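I use apply to loop over the rows and collect the names of the feat1, feat2, or feat3 columns whose value is 1 when scored is 0. The collected column names are then written to a new column called reason.
This solution does not scale to larger datasets, and I am looking for a faster approach. How can I do that?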
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'feat1_tax': [1, 0, 0],
                   'feat2_move': [1, 0, 0],
                   'feat3_coffee': [0, 1, 0],
                   'scored': [0, 0, 1]})

def get_not_scored_reason(row):
    # Collect every feat* column that is set to 1 in this row.
    exclusions_list = [col for col in df.columns if col.startswith('feat')]
    reasons = [col for col in exclusions_list if row[col] == 1]
    return ', '.join(reasons) if reasons else None

# Only rows with scored == 0 get a reason; all others get None.
df['reason'] = df.apply(lambda row: get_not_scored_reason(row) if row['scored'] == 0 else None, axis=1)
print(df)
   ID  feat1_tax  feat2_move  feat3_coffee  scored                 reason
0   1          1           1             0       0  feat1_tax, feat2_move
1   2          0           0             1       0           feat3_coffee
2   3          0           0             0       1                   None
You can try:
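# Select only the feat* columns.
columns = df.filter(regex=r"^feat")
# Multiplying each 0/1 flag by its column label gives the label for 1 and "" for 0;
# the non-empty labels in each row are then joined into one string.
df["reason"] = (columns * columns.columns).agg(
    lambda x: ", ".join(x[x.ne("")]) or None, axis=1
)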
print(df)
Prints:
   ID  feat1_tax  feat2_move  feat3_coffee  scored                 reason
0   1          1           1             0       0  feat1_tax, feat2_move
1   2          0           0             1       0           feat3_coffee
2   3          0           0             0       1                   None
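
Note that the snippet above never looks at scored. It matches the expected output here only because the one row with scored equal to 1 has no feature flags set. Below is a minimal sketch that applies the scored == 0 condition explicitly and builds the strings from the underlying NumPy arrays instead of calling apply or agg with axis=1. The names feat_cols, names, flagged, and reasons are my own, and I have not benchmarked this on a large dataset, so treat it as a starting point rather than a definitive solution:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "feat1_tax": [1, 0, 0],
    "feat2_move": [1, 0, 0],
    "feat3_coffee": [0, 1, 0],
    "scored": [0, 0, 1],
})

# Illustrative names (not from the original post): feat_cols, names, flagged, reasons.
feat_cols = [col for col in df.columns if col.startswith("feat")]
names = np.array(feat_cols, dtype=object)

# Boolean matrix: True where a feature flag equals 1 ...
flagged = df[feat_cols].to_numpy() == 1
# ... and the row has scored == 0 (broadcast the row condition across columns).
flagged &= (df["scored"].to_numpy() == 0)[:, None]

# Pick the flagged column names per row; an empty selection becomes None.
reasons = [", ".join(names[row]) or None for row in flagged]
df["reason"] = reasons
print(df)
```

The loop here runs over plain NumPy boolean rows, so it does not build a pandas Series for every row the way apply with axis=1 does.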