Creating a column based on two conditions with pandas

Question    Votes: 2    Answers: 2

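I am using pandas for some analysis practice. I want to create a new column whose value is the sum of two rows. The original dataset looks like this...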

    Admit      Gender   Dept    Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted    Female  A   89
3   Rejected    Female  A   19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted    Female  B   17
7   Rejected    Female  B   8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted    Female  C   202
11  Rejected    Female  C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted    Female  D   131
15  Rejected    Female  D   244
16  Admitted    Male    E   53
17  Rejected    Male    E   138
18  Admitted    Female  E   94
19  Rejected    Female  E   299
20  Admitted    Male    F   22
21  Rejected    Male    F   351
22  Admitted    Female  F   24
23  Rejected    Female  F   317

I want to create the new column from the following DataFrame...

    Dept    Gender  Freq
0   A   Female  108
1   A   Male    825
2   B   Female  25
3   B   Male    560
4   C   Female  593
5   C   Male    325
6   D   Female  375
7   D   Male    417
8   E   Female  393
9   E   Male    191
10  F   Female  341
11  F   Male    373

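I want to use the Freq column of the second DataFrame to create a new column in the first DataFrame. For example, I need to insert 108 wherever Dept and Gender are the same in both DataFrames (here Dept A, Gender Female). The new DataFrame should look like this...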

    Admit      Gender   Dept    Freq   Total
0   Admitted    Male    A   512        825
1   Rejected    Male    A   313        825
2   Admitted    Female  A   89         108
3   Rejected    Female  A   19         108
4   Admitted    Male    B   353        560
5   Rejected    Male    B   207        560
6   Admitted    Female  B   17         25
7   Rejected    Female  B   8          25 

I tried the following code...

for i in data.iterrows():
    for j in total_freq.iterrows():
        if i[1].Gender == total_freq.Gender & i[1].Dept == total_freq.Dept:
            data['Total'] = total_freq.Freq

I get the following error... TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]

Can anyone help me create the column with the correct values?

python-3.x pandas conditional
2 Answers
2
votes

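You can use groupby with transform, which computes each (Dept, Gender) group's sum and repeats it on every row of that group, so the second DataFrame is not needed: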

df['Total'] = df.groupby(['Dept', 'Gender']).Freq.transform('sum')

You get

    Admit   Gender  Dept    Freq    Total
0   Admitted    Male    A   512 825
1   Rejected    Male    A   313 825
2   Admitted    Female  A   89  108
3   Rejected    Female  A   19  108
4   Admitted    Male    B   353 560
5   Rejected    Male    B   207 560
6   Admitted    Female  B   17  25
7   Rejected    Female  B   8   25
8   Admitted    Male    C   120 325
9   Rejected    Male    C   205 325
10  Admitted    Female  C   202 593
11  Rejected    Female  C   391 593
12  Admitted    Male    D   138 417
13  Rejected    Male    D   279 417
14  Admitted    Female  D   131 375
15  Rejected    Female  D   244 375
16  Admitted    Male    E   53  191
17  Rejected    Male    E   138 191
18  Admitted    Female  E   94  393
19  Rejected    Female  E   299 393
20  Admitted    Male    F   22  373
21  Rejected    Male    F   351 373
22  Admitted    Female  F   24  341
23  Rejected    Female  F   317 341
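
As a self-contained check of the transform approach, here is a minimal sketch that rebuilds only departments A and B from the question's table (the rows are copied from the question; the subset is just an assumption to keep the example short) and adds the Total column:

```python
import pandas as pd

# Subset of the question's data: departments A and B only.
df = pd.DataFrame({
    "Admit":  ["Admitted", "Rejected", "Admitted", "Rejected",
               "Admitted", "Rejected", "Admitted", "Rejected"],
    "Gender": ["Male", "Male", "Female", "Female",
               "Male", "Male", "Female", "Female"],
    "Dept":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Freq":   [512, 313, 89, 19, 353, 207, 17, 8],
})

# transform("sum") returns a Series aligned with df's index, so every row
# receives the sum of Freq for its own (Dept, Gender) group.
df["Total"] = df.groupby(["Dept", "Gender"])["Freq"].transform("sum")

print(df)
```

Because the result keeps the original row order and length, it can be assigned directly as a column; an aggregation such as .sum() would instead return one row per group.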

0
votes

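You can use pandas.DataFrame.merge() to bring the totals from the second DataFrame into the first. First, rename Freq in the totals DataFrame so it does not clash with the Freq column of the first one.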

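import pandas as pd
# df1 is the totals DataFrame; pass it whole so Dept and Gender are available as merge keys
df1 = df1.rename(columns={'Freq':'Total'})
df_totals = pd.merge(df, df1, how='left', on=['Gender', 'Dept'])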
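
A runnable sketch of the merge approach on departments A and B, using the names data and total_freq from the question's code. The validate argument is an optional addition not in the original answer; it raises an error if a (Dept, Gender) pair appears more than once in the totals table:

```python
import pandas as pd

# First DataFrame from the question (departments A and B only).
data = pd.DataFrame({
    "Admit":  ["Admitted", "Rejected", "Admitted", "Rejected",
               "Admitted", "Rejected", "Admitted", "Rejected"],
    "Gender": ["Male", "Male", "Female", "Female",
               "Male", "Male", "Female", "Female"],
    "Dept":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Freq":   [512, 313, 89, 19, 353, 207, 17, 8],
})

# Second DataFrame from the question: one total per (Dept, Gender).
total_freq = pd.DataFrame({
    "Dept":   ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Freq":   [108, 825, 25, 560],
})

# Rename Freq so it does not collide with data["Freq"] after the merge.
totals = total_freq.rename(columns={"Freq": "Total"})

# A left merge keeps every row of data, in its original order, and looks up
# the matching Total by the two key columns.
merged = data.merge(totals, how="left", on=["Dept", "Gender"],
                    validate="many_to_one")

print(merged)
```

Unlike the nested iterrows loop in the question, the merge matches both keys for all rows at once and never compares a scalar against a whole column.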