I have the following DataFrame and want to compute a "share" column.
import pandas as pd

d = {"col1": ["A", "A", "A", "B", "B", "B"], "col2": ["start_amount", "mid_amount", "end_amount", "start_amount", "mid_amount", "end_amount"], "amount": [0, 2, 8, 1, 2, 3]}
df_test = pd.DataFrame(d)
df_test["share"] = 0.0  # float column, since the shares are fractions
for i in range(len(df_test)):
    # Find the end_amount row in the same col1 group as row i, then divide by its amount.
    is_group_end = (df_test["col1"] == df_test.loc[i, "col1"]) & (df_test["col2"] == "end_amount")
    df_test.loc[i, "share"] = df_test.loc[i, "amount"] / df_test.loc[is_group_end, "amount"].iloc[0]
This works, but it is far from efficient. Is there a better way to compute it?
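This amounts to selecting the rows where col2 is "end_amount", indexing them by col1, using map to look up the end amount for each row's col1, and then dividing amount by that value: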
s = df_test.loc[df_test['col2'].eq('end_amount')].set_index('col1')['amount']
df_test['share'] = df_test['amount']/df_test['col1'].map(s)
Output:
  col1          col2  amount     share
0    A  start_amount       0  0.000000
1    A    mid_amount       2  0.250000
2    A    end_amount       8  1.000000
3    B  start_amount       1  0.333333
4    B    mid_amount       2  0.666667
5    B    end_amount       3  1.000000
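
For comparison, here is a sketch of the same calculation done with groupby and transform instead of map. It is an alternative, not the approach shown above, and it assumes every col1 group has at least one "end_amount" row. The column name share_alt and the intermediate names end_only and end_per_row are made up for this illustration.

```python
import pandas as pd

df_test = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": ["start_amount", "mid_amount", "end_amount"] * 2,
    "amount": [0, 2, 8, 1, 2, 3],
})

# Keep amount only on the end_amount rows; every other row becomes NaN.
end_only = df_test["amount"].where(df_test["col2"].eq("end_amount"))

# "first" skips NaN, so each row receives the end amount of its col1 group.
end_per_row = end_only.groupby(df_test["col1"]).transform("first")

# share_alt is a hypothetical column name used only in this sketch.
df_test["share_alt"] = df_test["amount"] / end_per_row
print(df_test)
```

One possible reason to prefer this form: as far as I know, the map lookup needs a unique index on s, so it would likely fail if a group had more than one "end_amount" row, whereas this version would simply take the first one. For data shaped like the example, both should give the same shares.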