I have two pandas DataFrames:
import pandas as pd

object_1df = pd.DataFrame([['a', 1], ['b', 2]],
                          columns=['letter', 'number'])
object_2df = pd.DataFrame([['b', 3, 'cat'], ['c', 4, 'dog']],
                          columns=['letter', 'number', 'animal'])
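I need to build a catalog with one row per DataFrame, where the number of columns equals the number of elements in that DataFrame. The final result should have one row per df, with the following columns: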
I tried this very ugly approach:
objects = [object_1df, object_2df]
catalog = pd.DataFrame()
for objectdf in objects:
    object_row = pd.DataFrame()
    for letter in objectdf['letter']:
        for column in objectdf.columns:
            object_row[f'{letter}_{column}'] = objectdf[column].loc[objectdf['letter']==letter]
    catalog = pd.concat([catalog, object_row], ignore_index=True)
display(catalog)
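which outputs an undesired result:
This result essentially fills in only the first row of each df and gives NaN everywhere else. What is the right way to do this?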
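A likely explanation for the NaN values is index alignment: each filtered Series keeps its original row label, and assigning it as a new column aligns it to the index that `object_row` picked up from the first assignment. The snippet below is a simplified reproduction, not the original loop; the column names `a_number` and `b_number` are chosen only to mirror the naming pattern above.

```python
import pandas as pd

object_row = pd.DataFrame()

# The first assignment into an empty frame adopts the Series' index, here [0].
object_row['a_number'] = pd.Series([1], index=[0])

# This Series is labelled 1, so nothing aligns with row label 0 and the
# new column is filled with NaN.
object_row['b_number'] = pd.Series([2], index=[1])

print(object_row)
```

Calling `.reset_index(drop=True)` on each filtered Series (or taking the scalar value instead of a Series) would avoid the misalignment.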
Answering my own question: flattening each df like this gives the desired result:
import pandas as pd

object_1df = pd.DataFrame([['a', 1], ['b', 2]],
                          columns=['letter', 'number'])
object_2df = pd.DataFrame([['b', 3, 'cat'], ['c', 4, 'dog']],
                          columns=['letter', 'number', 'animal'])
objects = [object_1df, object_2df]
catalog = pd.DataFrame()
for df in objects:
    # Index by letter without modifying the original DataFrame
    indexed = df.set_index('letter')
    # One key per (letter, column) pair, e.g. 'a_number'
    flattened_data = {f'{index}_{col}': indexed.loc[index, col]
                      for index in indexed.index for col in indexed.columns}
    flattened_df = pd.DataFrame([flattened_data])
    display(flattened_df)
    catalog = pd.concat([catalog, flattened_df], ignore_index=True)
display(catalog)
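A possible variation on the same idea is to flatten each df into a plain dict and build the catalog in a single `pd.DataFrame` call, which avoids calling `pd.concat` inside the loop. This is only a sketch: the helper name `flatten` and its `key` parameter are hypothetical, and it assumes the `letter` values are unique within each df.

```python
import pandas as pd

object_1df = pd.DataFrame([['a', 1], ['b', 2]],
                          columns=['letter', 'number'])
object_2df = pd.DataFrame([['b', 3, 'cat'], ['c', 4, 'dog']],
                          columns=['letter', 'number', 'animal'])


def flatten(df, key='letter'):
    """Return one dict per DataFrame, with keys like 'a_number'."""
    indexed = df.set_index(key)  # returns a copy; df itself is unchanged
    return {f'{idx}_{col}': val
            for idx, row in indexed.iterrows()
            for col, val in row.items()}


# A list of dicts becomes one row per dict; keys missing from a dict
# show up as NaN in that row.
catalog = pd.DataFrame([flatten(df) for df in (object_1df, object_2df)])
print(catalog)
```

Each row then holds only the columns that exist for that df, and the remaining cells are NaN, as in the loop version.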