A scalable alternative to apply in Python

Question  Votes: 0  Answers: 1

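I use apply to loop over the rows and collect the names of the feat1, feat2, and feat3 columns whose value is 1, but only for rows where scored equals 0. The collected column names are then written to a new column called reason.

This solution does not scale to larger datasets, so I am looking for a faster approach. How can I do this?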

import pandas as pd

df = pd.DataFrame({'ID':[1,2,3],
             'feat1_tax':[1,0,0],
             'feat2_move':[1,0,0],
             'feat3_coffee': [0,1,0],
             'scored':[0,0,1]})

def get_not_scored_reason(row):
    exclusions_list = [col for col in df.columns if col.startswith('feat')]
    reasons = [col for col in exclusions_list if row[col] == 1]
    return ', '.join(reasons) if reasons else None

df['reason'] = df.apply(lambda row: get_not_scored_reason(row) if row['scored'] == 0 else None, axis=1)

print(df)
   ID  feat1_tax  feat2_move  feat3_coffee  scored                 reason
0   1          1           1             0       0  feat1_tax, feat2_move
1   2          0           0             1       0           feat3_coffee
2   3          0           0             0       1                   None
python pandas numpy
1 Answer
0 votes

You can try:

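columns = df.filter(regex=r"^feat")
# Multiplying a 0/1 flag by its column name yields the name or "".
df["reason"] = (columns * columns.columns).agg(
    lambda x: ", ".join(x[x.ne("")]) or None, axis=1
)
# Rows that were scored get no reason, as the question requires.
df.loc[df["scored"].ne(0), "reason"] = None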

print(df)

This prints:

   ID  feat1_tax  feat2_move  feat3_coffee  scored                 reason
0   1          1           1             0       0  feat1_tax, feat2_move
1   2          0           0             1       0           feat3_coffee
2   3          0           0             0       1                   None
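
Note that agg with axis=1 still calls a Python function once per row, so it may not be much faster than apply on a large frame. As a rough sketch of another option (not part of the answer above, and not benchmarked here), you could convert the flags to a NumPy boolean array once and build the strings with a plain list comprehension, which avoids creating a pandas Series for every row. The names feat_cols, names, flags, not_scored, and reasons are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "feat1_tax": [1, 0, 0],
    "feat2_move": [1, 0, 0],
    "feat3_coffee": [0, 1, 0],
    "scored": [0, 0, 1],
})

# Select the feature columns once, outside any per-row work.
feat_cols = [col for col in df.columns if col.startswith("feat")]
names = np.array(feat_cols, dtype=object)

# Boolean matrix: True where a feature flag equals 1.
flags = df[feat_cols].to_numpy() == 1
# Boolean vector: True where the row was not scored.
not_scored = df["scored"].to_numpy() == 0

# names[row] keeps only the flagged column names for that row.
# An empty join gives "", which "or None" turns into None.
reasons = [
    (", ".join(names[row]) or None) if keep else None
    for row, keep in zip(flags, not_scored)
]
df["reason"] = reasons
print(df)
```

On the sample frame this gives the same reason values as the table above. Unlike the unmasked multiplication, it also leaves reason empty for any row where scored is not 0, even if that row has feature flags set.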