How can I get item counts from MultiLabelBinarizer?
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform([(1,1,2), (3,3,2,5)]),columns=mlb.classes_)
Out[0]:
   1  2  3  5
0  1  1  0  0
1  0  1  1  1
Instead of this, I want to get
Out[0]:
   1  2  3  5
0  2  1  0  0
1  0  1  2  1
because 1 appears twice in the first row and 3 appears twice in the second row.
from collections import Counter
data = [(1,1,2), (3,3,2,5)]
pd.DataFrame([Counter(x) for x in data]).fillna(0)
Output:
     1  2    3    5
0  2.0  1  0.0  0.0
1  0.0  1  2.0  1.0
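The floats in that output come from the NaN values that pandas inserts for labels missing from a row; `fillna(0)` replaces them but leaves the columns as floats. A minimal sketch of one way to get integer counts back, assuming every label is hashable and that sorting the label columns is acceptable (the variable name `counts` is only illustrative):

```python
from collections import Counter

import pandas as pd

data = [(1, 1, 2), (3, 3, 2, 5)]

# One Counter per row; labels absent from a row show up as NaN.
counts = pd.DataFrame([Counter(row) for row in data])

# Replace NaN with 0, cast back to integers, and sort the label columns.
counts = counts.fillna(0).astype(int).sort_index(axis=1)
print(counts)
```

The `sort_index(axis=1)` call is there because the Counter-based frame orders columns by first appearance, which only happens to match the sorted order for this particular input.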
import pandas as pd
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
data = [(1, 1, 2), (3, 3, 2, 5)]
# Initialize MultiLabelBinarizer and fit_transform the data
mlb = MultiLabelBinarizer()
binary_matrix = mlb.fit_transform(data)
# Get the labels (columns) from MultiLabelBinarizer
labels = mlb.classes_
print(labels)  # [1 2 3 5]
# Count occurrences in each row
count_matrix = np.array([[row.count(label) for label in labels] for row in data])
# Create DataFrame from the count matrix
count_df = pd.DataFrame(count_matrix, columns=labels)
print(count_df)
'''
   1  2  3  5
0  2  1  0  0
1  0  1  2  1
'''