Get counts in MultiLabelBinarizer

Question · Votes: 0 · Answers: 2

How can I get the count of each item with MultiLabelBinarizer, instead of just a 0/1 indicator?

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()

pd.DataFrame(mlb.fit_transform([(1,1,2), (3,3,2,5)]),columns=mlb.classes_)

Out[0]: 
   1  2  3  5
0  1  1  0  0
1  0  1  1  1

Instead of this, I want to get:

Out[0]: 
   1  2  3  5
0  2  1  0  0
1  0  1  2  1

because 1 appears twice in the first row, and 3 appears twice in the second row.

python-3.x machine-learning scikit-learn data-manipulation
2 Answers
1
vote
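from collections import Counter

import pandas as pd

data = [(1, 1, 2), (3, 3, 2, 5)]
# One Counter per row; labels missing from a row become NaN, so fill them with 0
pd.DataFrame([Counter(x) for x in data]).fillna(0)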

Output:

    1       2   3       5
0   2.0     1   0.0     0.0
1   0.0     1   2.0     1.0
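A small variant of the Counter answer, sketched under the assumption that you want integer counts and columns in the same sorted order that MultiLabelBinarizer would put in mlb.classes_ (the float columns above come from the NaN fill). It uses sorted(set().union(*data)) as a stand-in for mlb.classes_, so scikit-learn is not needed here:

```python
from collections import Counter

import pandas as pd

data = [(1, 1, 2), (3, 3, 2, 5)]

# Sorted set of all labels; MultiLabelBinarizer stores the same sorted set in classes_
labels = sorted(set().union(*data))

count_df = (
    pd.DataFrame([Counter(row) for row in data])
    .reindex(columns=labels)  # force the column order to match the sorted labels
    .fillna(0)                # labels absent from a row become NaN, so count them as 0
    .astype(int)              # back to integer counts instead of floats
)
print(count_df)
```

If you already have a fitted MultiLabelBinarizer, you can pass mlb.classes_ to reindex instead of labels.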

0
votes
import pandas as pd
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


data = [(1, 1, 2), (3, 3, 2, 5)]

# Initialize MultiLabelBinarizer and fit_transform the data
mlb = MultiLabelBinarizer()
binary_matrix = mlb.fit_transform(data)
# Get the labels (columns) from MultiLabelBinarizer
labels = mlb.classes_
print(labels)  # [1 2 3 5]
# Count occurrences in each row
count_matrix = np.array([[row.count(label) for label in labels] for row in data])
# Create DataFrame from the count matrix
count_df = pd.DataFrame(count_matrix, columns=labels)
print(count_df)
'''
   1  2  3  5
0  2  1  0  0
1  0  1  2  1
'''
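The nested list comprehension above calls row.count once per label per row. For larger inputs, a vectorized NumPy sketch can build the same count matrix in one pass. Note the swap: this relies on np.unique (which returns the labels sorted, like mlb.classes_) and np.add.at instead of MultiLabelBinarizer, and assumes every label is numeric so the rows can be concatenated into one array:

```python
import numpy as np
import pandas as pd

data = [(1, 1, 2), (3, 3, 2, 5)]

# Flatten all labels and record which row each label came from
row_ids = np.repeat(np.arange(len(data)), [len(row) for row in data])
flat = np.concatenate([np.asarray(row) for row in data])

# Sorted unique labels plus, for every flattened item, its column index
labels, col_ids = np.unique(flat, return_inverse=True)

# Unbuffered add: repeated (row, col) pairs are each counted
counts = np.zeros((len(data), len(labels)), dtype=int)
np.add.at(counts, (row_ids, col_ids), 1)

count_df = pd.DataFrame(counts, columns=labels)
print(count_df)
```

np.add.at is used rather than counts[row_ids, col_ids] += 1 because plain fancy-index assignment would count a duplicated label in the same row only once.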
© www.soinside.com 2019 - 2024. All rights reserved.