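I'm working through some analysis exercises with pandas. I want to create a new column whose values are the sum of two rows. The original dataset looks like this...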
Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317
I want to use the following dataframe to create the new column...
Dept Gender Freq
0 A Female 108
1 A Male 825
2 B Female 25
3 B Male 560
4 C Female 593
5 C Male 325
6 D Female 375
7 D Male 417
8 E Female 393
9 E Male 191
10 F Female 341
11 F Male 373
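I want to use the Freq column of the second dataframe to create a new column in the first dataframe. I need to insert the matching value (108, for example) wherever Dept and Gender are the same in both dataframes. The new dataframe should look like this...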
Admit Gender Dept Freq Total
0 Admitted Male A 512 825
1 Rejected Male A 313 825
2 Admitted Female A 89 108
3 Rejected Female A 19 108
4 Admitted Male B 353 560
5 Rejected Male B 207 560
6 Admitted Female B 17 25
7 Rejected Female B 8 25
I tried the following code...
for i in data.iterrows():
    for j in total_freq.iterrows():
        if i[1].Gender == total_freq.Gender & i[1].Dept == total_freq.Dept:
            data['Total'] = total_freq.Freq
I get the following error... TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
Can anyone help me create the column with the correct values?
You can use transform:
df['Total'] = df.groupby(['Dept', 'Gender']).Freq.transform('sum')
You get
Admit Gender Dept Freq Total
0 Admitted Male A 512 825
1 Rejected Male A 313 825
2 Admitted Female A 89 108
3 Rejected Female A 19 108
4 Admitted Male B 353 560
5 Rejected Male B 207 560
6 Admitted Female B 17 25
7 Rejected Female B 8 25
8 Admitted Male C 120 325
9 Rejected Male C 205 325
10 Admitted Female C 202 593
11 Rejected Female C 391 593
12 Admitted Male D 138 417
13 Rejected Male D 279 417
14 Admitted Female D 131 375
15 Rejected Female D 244 375
16 Admitted Male E 53 191
17 Rejected Male E 138 191
18 Admitted Female E 94 393
19 Rejected Female E 299 393
20 Admitted Male F 22 373
21 Rejected Male F 351 373
22 Admitted Female F 24 341
23 Rejected Female F 317 341
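To try the transform approach without the full dataset, here is a minimal sketch that rebuilds only the first eight rows from the question (Dept A and B). The frame name `df` follows the answer above; the shortened data is just for illustration.

```python
import pandas as pd

# First eight rows of the question's data (Dept A and B only)
df = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# transform("sum") computes the sum per (Dept, Gender) group and
# returns one value per original row, aligned to df's index
df["Total"] = df.groupby(["Dept", "Gender"])["Freq"].transform("sum")
print(df)
```

Because transform returns a result with the same length and index as the original frame, the second dataframe is not needed at all with this approach.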
You can use pandas.DataFrame.merge() to add the totals from the second dataframe to the first. First, rename Freq in the totals df.
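# df1 is the totals dataframe; rename Freq so it does not clash with df's Freq
df1 = df1.rename(columns={'Freq':'Total'})
# pass the whole totals frame: the key columns must be present for the join
df_totals = pd.merge(df, df1, how='left', on=['Gender', 'Dept'])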
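Here is a self-contained sketch of the merge approach, again using only the Dept A and B rows from the question. The names `totals` and `merged` are made up for this example, and building the totals with groupby is an assumption about how the second dataframe was produced. The `validate` argument is optional; it is included only to guard against duplicate keys in the totals frame.

```python
import pandas as pd

# First eight rows of the question's data (Dept A and B only)
df = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# Stand-in for the second dataframe: one row per (Dept, Gender) pair
totals = df.groupby(["Dept", "Gender"], as_index=False)["Freq"].sum()
totals = totals.rename(columns={"Freq": "Total"})

# Left merge keeps every row of df and looks up its Total by both keys
merged = pd.merge(df, totals, how="left", on=["Gender", "Dept"],
                  validate="many_to_one")
print(merged)
```

A left merge keeps the rows of the first dataframe in their original order, so the result lines up with the expected output in the question.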