I have a CSV with a column like this:
LABEL
a
b
a
a
c
n o
ye s
I want to split it into the following columns:
LABEL_a LABEL_b LABEL_c LABEL_n_o LABEL_ye_s
1 0 0 0 0
0 1 0 0 0
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
How can I do this with pandas?
Let's use pd.get_dummies with the prefix parameter:
# Using @Lambda's setup
import pandas as pd
label = ["a", "b", "a", "a", "c", "n o", "ye s"]
s = pd.Series(label)
pd.get_dummies(s, prefix='label')
Output:
label_a label_b label_c label_n o label_ye s
0 1 0 0 0 0
1 0 1 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 0 0 1 0 0
5 0 0 0 1 0
6 0 0 0 0 1
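Note that the column names above still contain spaces (label_n o, label_ye s), while the question asks for LABEL_n_o and LABEL_ye_s. A minimal sketch of one way to close that gap, assuming it is acceptable to rename the columns after encoding. The dtype=int argument is my addition: newer pandas versions return boolean columns by default, so it is passed here to get 0/1 integers.

```python
import pandas as pd

label = ["a", "b", "a", "a", "c", "n o", "ye s"]
s = pd.Series(label)

# dtype=int asks for 0/1 integers instead of the boolean default in newer pandas
dummies = pd.get_dummies(s, prefix="LABEL", dtype=int)

# Replace spaces in the generated column names with underscores
dummies.columns = dummies.columns.str.replace(" ", "_", regex=False)

print(dummies)
```

The rename runs once on the column index, so it should add little to the timings of get_dummies itself.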
> %%timeit
> for key in keys:
>     df[("label_%s" % key).replace(" ", "_")] = (s == key).astype(int)
100 loops, best of 3: 6.7 ms per loop
> %timeit s.str.get_dummies().add_prefix('label_')
100 loops, best of 3: 6.03 ms per loop
> %timeit pd.get_dummies(s, prefix='label')
1000 loops, best of 3: 1.77 ms per loop
Using str.get_dummies with add_prefix:
s.str.get_dummies().add_prefix('label_')
Out[19]:
label_a label_b label_c label_n o label_ye s
0 1 0 0 0 0
1 0 1 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 0 0 1 0 0
5 0 0 0 1 0
6 0 0 0 0 1
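One difference worth knowing before choosing between the two: Series.str.get_dummies splits each string on a separator (the default is "|") before encoding, whereas pd.get_dummies treats every value as a single category. The labels in the question contain no "|", so both give the same result there. The sketch below uses a made-up series t, which is not from the question, purely to show the difference.

```python
import pandas as pd

# Hypothetical labels, one of which contains the default separator "|"
t = pd.Series(["a|b", "c"])

# str.get_dummies splits "a|b" into two separate indicators, a and b
split = t.str.get_dummies()

# pd.get_dummies keeps "a|b" as one category
whole = pd.get_dummies(t, dtype=int)

print(split)
print(whole)
```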
import pandas as pd
label = ["a", "b", "a", "a", "c", "n o", "ye s"]
s = pd.Series(label)
keys = s.unique()
df = pd.DataFrame()
for key in keys:
    df[("label_%s" % key).replace(" ", "_")] = (s == key).astype(int)
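Since the question starts from a CSV file rather than a list, here is a rough end-to-end sketch. It assumes the file has a single header row named LABEL. io.StringIO stands in for the real file, so the path you would pass to read_csv is not shown, and dtype=int is again my addition to get 0/1 integers on newer pandas versions.

```python
import io
import pandas as pd

# Stand-in for the real CSV file; pass a file path to read_csv instead
csv_text = "LABEL\na\nb\na\na\nc\nn o\nye s\n"
df = pd.read_csv(io.StringIO(csv_text))

# columns= encodes only the listed columns and uses the column name as prefix
encoded = pd.get_dummies(df, columns=["LABEL"], dtype=int)

# Replace spaces in the generated column names with underscores
encoded.columns = encoded.columns.str.replace(" ", "_", regex=False)

print(encoded)
```

Passing the whole DataFrame with columns= leaves any other columns in the file untouched, which may be convenient if LABEL is not the only column.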