How to merge DataFrames in a for loop into a new DataFrame - Pandas

Question (votes: 0, answers: 1)

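I want to load new files from a folder, clean them, and merge them into one DataFrame. I have tried pd.concat, df.join and merge(), but I can't seem to get it to work for two separate files/DataFrames. How can I write each one out under a new DataFrame name?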
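The three functions mentioned do different jobs: `pd.concat` stacks frames on top of each other, while `pd.merge` and `DataFrame.join` match rows on key columns or on the index. A minimal sketch with two hypothetical frames that share the same columns, as two cleaned files with the same sheet layout would:

```python
import pandas as pd

# Two hypothetical frames with identical columns, standing in for two cleaned files.
a = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
b = pd.DataFrame({"id": [3, 4], "value": [30, 40]})

# pd.concat stacks the rows (axis=0); ignore_index=True renumbers them 0..n-1.
stacked = pd.concat([a, b], ignore_index=True)
print(stacked)

# pd.merge performs a key-based join (inner by default). No "id" appears in
# both frames, so the result is empty; the shared "value" column gets _x/_y suffixes.
joined = pd.merge(a, b, on="id")
print(joined)

# DataFrame.join aligns on the index and refuses overlapping column names
# unless lsuffix/rsuffix are supplied.
try:
    a.join(b)
    join_failed = False
except ValueError as exc:
    join_failed = True
    print("join failed:", exc)
```

For appending files that share one layout, `pd.concat` is usually the appropriate tool; `merge`/`join` are for combining different columns side by side.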

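I tried a few functions but can't seem to merge them, because I don't know the individual DataFrame names, and I want to pick up new files every time I run the script.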
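Rather than generating a new variable name per file (possible via `globals()`, but awkward to work with), a common alternative is to keep the frames in a dict keyed by file name. This is a hypothetical sketch: it writes two small CSV files to a temporary folder and reads them with `pd.read_csv` only so it runs without an Excel engine; with the real folder you would call `pd.read_excel(file, sheet_name='Option')` instead. Because the folder is re-scanned on each run, newly added files are picked up automatically.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: two small CSV files in a temporary folder.
# CSV stands in for .xlsx here; swap in pd.read_excel for real Excel files.
folder = Path(tempfile.mkdtemp())
pd.DataFrame({"item": ["a"], "qty": [1]}).to_csv(folder / "jan.csv", index=False)
pd.DataFrame({"item": ["b"], "qty": [2]}).to_csv(folder / "feb.csv", index=False)

# Key each DataFrame by its file stem instead of inventing variable names.
frames = {}
for file in sorted(folder.glob("*.csv")):
    frames[file.stem] = pd.read_csv(file)

print(list(frames))  # ['feb', 'jan'] (sorted by file name)
print(frames["jan"])

# All frames can still be combined in one call; the dict keys become an
# outer index level that records which file each row came from.
combined = pd.concat(frames, names=["source", "row"])
print(combined)
```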

```python
import pandas as pd
from pathlib import Path

# assign directory
directory = "/lakehouse/default/Files/SP/SP_Xlsx/"

# iterate over files in that directory
files = Path(directory).glob('*')
for file in files:
    df = pd.read_excel(file, sheet_name='Option')

    print(file)

    # drop rows with index labels 0, 1, 2, 3
    idx = [0, 1, 2, 3]
    df = df.query("index != @idx")

    # use the first remaining row as the column header, then drop that row
    df.columns = df.iloc[0]
    df = df[1:]

    # keep only the first data row
    df = df.drop(df.index.to_list()[1:], axis=0)
    display(df)  # display() is provided by Jupyter/Fabric notebooks
```

The result of this is two DataFrames (both named df). How do I merge them into one?
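Because `df` is reassigned on every loop iteration, only the last file's frame is still bound to `df` after the loop; earlier frames must be stored somewhere (a list or dict) before they can be combined. The cleaning steps themselves can be checked on a small in-memory frame. This sketch assumes a hypothetical sheet layout (four junk rows, a header row, then data) in `raw`; note that the final `drop` keeps only the first data row of each file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(...): junk rows 0-3,
# a row holding the real column names, then two data rows.
raw = pd.DataFrame({
    "c1": ["junk", "junk", "junk", "junk", "Name", "alpha", "beta"],
    "c2": ["junk", "junk", "junk", "junk", "Qty", 1, 2],
})

df = raw.copy()

# Step 1: drop rows whose index label is 0-3. Inside query(), "!=" against
# a list behaves like ~df.index.isin(idx).
idx = [0, 1, 2, 3]
df = df.query("index != @idx")
assert df.equals(raw[~raw.index.isin(idx)])

# Step 2: promote the first remaining row to the header, then drop it.
df.columns = df.iloc[0]
df = df[1:]

# Step 3: drop every row after the first, so only one data row survives.
df = df.drop(df.index.to_list()[1:], axis=0)
print(df)
```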

Tags: python pandas helper fabric
1 Answer

Votes: 0

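You can declare an empty list to store the DataFrames as you loop, then concatenate them once the loop finishes. Something like this might help; hopefully I understood your question.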

```python
import pandas as pd
from pathlib import Path

# assign directory
directory = "/lakehouse/default/Files/SP/SP_Xlsx/"

# create an empty list to store the cleaned data frames
data_frames = []

# iterate over files in that directory
files = Path(directory).glob('*')
for file in files:
    df = pd.read_excel(file, sheet_name='Option')

    print(file)

    # drop rows with index labels 0, 1, 2, 3
    idx = [0, 1, 2, 3]
    df = df.query("index != @idx")

    # use the first remaining row as the column header, then drop that row
    df.columns = df.iloc[0]
    df = df[1:]

    # keep only the first data row
    df = df.drop(df.index.to_list()[1:], axis=0)

    # append the data frame to the list
    data_frames.append(df)

# concatenate all data frames in the list; ignore_index=True renumbers the
# rows so labels repeated across files do not collide
merged_df = pd.concat(data_frames, axis=0, ignore_index=True)

# display the merged data frame (display() is provided by Jupyter/Fabric notebooks)
display(merged_df)
```
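Two details of `pd.concat` matter once more files are added over time: it aligns columns by name (a column missing from one file becomes NaN in that file's rows), and it raises `ValueError` when given an empty list, for example when the folder has no files. A minimal sketch wrapping the final step in a hypothetical helper, `combine`, with made-up frames standing in for the cleaned files:

```python
import pandas as pd


def combine(frames):
    """Stack cleaned per-file frames; return an empty frame if there were none.

    `combine` is a hypothetical helper name used only for illustration.
    """
    # pd.concat([]) raises ValueError("No objects to concatenate").
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers rows 0..n-1 instead of keeping each
    # file's original row labels, which can repeat across files.
    return pd.concat(frames, axis=0, ignore_index=True)


# Hypothetical cleaned frames: the second file has an extra "Note" column,
# and both happen to carry the same row label (5).
f1 = pd.DataFrame({"Name": ["alpha"], "Qty": [1]}, index=[5])
f2 = pd.DataFrame({"Name": ["beta"], "Qty": [2], "Note": ["x"]}, index=[5])

merged = combine([f1, f2])
print(merged)  # f1 has no "Note", so its cell in that column is NaN

empty = combine([])
print(empty.empty)
```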