Explode (un-aggregate) a DataFrame of count data into one row per item

Question

Context: data transformation for a logistic regression problem. I have the following data structure:

import pandas as pd

df = pd.DataFrame({"group": ["A", "B"], "total": [3, 5], "occurrence": [2, 1]})

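I want to do something similar to DataFrame.explode, but create one row for each item counted in total, i.e. 3+5 = 8 rows, where occurrence of those rows hold 1 and the rest hold 0 (either in the occurrence column or in a new target column).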
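As a quick check of the expected size, here is a minimal sketch using the example frame above: the exploded result should have sum(total) rows, of which sum(occurrence) are 1 and the remainder are 0. The names n_rows, n_positive and n_negative are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "B"], "total": [3, 5], "occurrence": [2, 1]})

# Each input row expands into `total` output rows.
n_rows = int(df["total"].sum())            # 3 + 5
# Of those, the first `occurrence` rows per group are positives (y = 1).
n_positive = int(df["occurrence"].sum())   # 2 + 1
n_negative = n_rows - n_positive

print(n_rows, n_positive, n_negative)      # 8 3 5
```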

Currently I am doing it iteratively, which is very slow for large data:

expanded = []
for ix, row in df.iterrows():
    # one output row per item in "total"; the first "occurrence" items get y = 1
    for i in range(row["total"]):
        row["y"] = 1 if i < row["occurrence"] else 0
        expanded.append(row.copy())
df_out = pd.DataFrame(expanded).reset_index(drop=True)
df_out.drop(["total", "occurrence"], axis=1, inplace=True)
df_out


  group  y
0     A  1
1     A  1
2     A  0
3     B  1
4     B  0
5     B  0
6     B  0
7     B  0
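For comparison with the row-by-row loop, here is a minimal vectorized sketch that works on NumPy arrays instead of iterrows. This is a swapped-in technique, not the asker's code: it assumes total and occurrence are non-negative integers with occurrence <= total, and the intermediate names (starts, pos) are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": ["A", "B"], "total": [3, 5], "occurrence": [2, 1]})

totals = df["total"].to_numpy()
occ = df["occurrence"].to_numpy()

# Offset where each original row's block starts in the exploded output: [0, 3]
starts = np.cumsum(totals) - totals
# Position of each output row within its own block: [0, 1, 2, 0, 1, 2, 3, 4]
pos = np.arange(totals.sum()) - np.repeat(starts, totals)
# 1 for the first `occurrence` items of each block, 0 otherwise
y = (pos < np.repeat(occ, totals)).astype(int)

df_out = pd.DataFrame({"group": np.repeat(df["group"].to_numpy(), totals), "y": y})
print(df_out)
```

The trick is that np.repeat does the explosion and a single arange minus the repeated block offsets gives the within-block counter, so no Python-level loop runs per item.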

Answer 1

You can repeat the rows, then assign a new column based on the output of groupby.cumcount, compared against occurrence:

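out = (df.loc[df.index.repeat(df['total']), ['group', 'occurrence']]  # one row per item
         .assign(y=lambda x: x.groupby(level=0).cumcount()  # position within each original row
                              .lt(x.pop('occurrence'))      # first `occurrence` positions -> True
                              .astype(int))                 # True/False -> 1/0
         .reset_index(drop=True)                            # 0..n-1 index, as in the desired output
      )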
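To see what each step of the chained expression does, the sketch below unrolls it into named intermediate results on the example frame. The names rep, pos and y are illustrative; the final reset_index(drop=True) is only there so the index matches the 0..7 numbering of the desired output.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "B"], "total": [3, 5], "occurrence": [2, 1]})

# Step 1: repeat each index label `total` times -> index [0, 0, 0, 1, 1, 1, 1, 1]
rep = df.loc[df.index.repeat(df["total"]), ["group", "occurrence"]]

# Step 2: number the copies within each original row (grouping on the repeated label)
pos = rep.groupby(level=0).cumcount()        # [0, 1, 2, 0, 1, 2, 3, 4]

# Step 3: copies numbered below `occurrence` are positives
y = pos.lt(rep["occurrence"]).astype(int)    # [1, 1, 0, 1, 0, 0, 0, 0]

out = rep.drop(columns="occurrence").assign(y=y).reset_index(drop=True)
print(out)
```

Grouping by level=0 works because every copy of an input row keeps that row's original index label, so cumcount restarts at 0 for each original row.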
