How to efficiently compute the share of an aggregated column


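I have the following DataFrame and want to compute a "share" column: each row's amount divided by the "end_amount" value of the same col1 group.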

    import pandas as pd

    d = {
        "col1": ["A", "A", "A", "B", "B", "B"],
        "col2": ["start_amount", "mid_amount", "end_amount", "start_amount", "mid_amount", "end_amount"],
        "amount": [0, 2, 8, 1, 2, 3],
    }
    df_test = pd.DataFrame(d)

    df_test["share"] = 0.0  # float column, so fractional shares fit without a dtype change
    for i in range(len(df_test)):
        # Find the "end_amount" row that belongs to the same col1 group as row i
        is_group_end = (df_test["col1"] == df_test.loc[i, "col1"]) & (df_test["col2"] == "end_amount")
        df_test.loc[i, "share"] = df_test.loc[i, "amount"] / df_test.loc[is_group_end, "amount"].iloc[0]

This works, but it is far from efficient. Is there a better way to compute it?

python pandas dataframe
1 Answer

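This amounts to selecting the rows whose col2 is "end_amount", using map to look up that end amount for each col1 value, and then dividing amount by the result: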

    s = df_test.loc[df_test['col2'].eq('end_amount')].set_index('col1')['amount']
    df_test['share'] = df_test['amount']/df_test['col1'].map(s)

Output:

      col1          col2  amount     share
    0    A  start_amount       0  0.000000
    1    A    mid_amount       2  0.250000
    2    A    end_amount       8  1.000000
    3    B  start_amount       1  0.333333
    4    B    mid_amount       2  0.666667
    5    B    end_amount       3  1.000000
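
For reuse, the same lookup-and-divide idea can be wrapped in a small function. The following is only a sketch: the function name add_share and its parameters are hypothetical and not part of pandas, and it assumes each col1 group has exactly one "end_amount" row (the explicit duplicate check is there because the lookup Series needs a unique index).

```python
import pandas as pd


def add_share(df, group_col="col1", kind_col="col2",
              value_col="amount", total_label="end_amount"):
    """Hypothetical helper: return a copy of df with a 'share' column.

    share = value_col / (value_col of the total_label row in the same group)
    """
    # One lookup value per group: the amount on the total_label row
    totals = df.loc[df[kind_col].eq(total_label)].set_index(group_col)[value_col]

    # Assumption made explicit: at most one total_label row per group
    if totals.index.has_duplicates:
        raise ValueError(f"more than one {total_label!r} row in some group")

    out = df.copy()
    # Vectorised lookup of each row's group total, then elementwise division
    out["share"] = out[value_col] / out[group_col].map(totals)
    return out


df_test = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": ["start_amount", "mid_amount", "end_amount",
             "start_amount", "mid_amount", "end_amount"],
    "amount": [0, 2, 8, 1, 2, 3],
})

result = add_share(df_test)
print(result)
```

With this sketch, a group that has no "end_amount" row gets NaN in the share column, because map finds no matching key for it. The input DataFrame is left unchanged since the function works on a copy.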