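I want to load new files from a folder, clean them, and merge them into one DataFrame. I have tried pd.concat, df.join, and merge(), but I can't get any of them to work on the two separate files/DataFrames. How can I write each one to a new DataFrame name?
I tried a few functions but can't merge the results, because I don't know the individual DataFrame names, and I want to pick up new files each time the script runs.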
import pandas as pd
import os
import glob
from pathlib import Path
# assign directory
directory = "/lakehouse/default/Files/SP/SP_Xlsx/"
# iterate over files in
# that directory
files = Path(directory).glob('*')
for file in files:
    df = pd.read_excel(file,sheet_name='Option')
    print(file)
    # drops rows with ID: 0,1,2,3
    idx = [0,1,2,3]
    df = df.query("index != @idx")
    #resets Row ID 0 as index/header
    df.columns = df.iloc[0]
    df = df[1:]
    df = df.drop(df.index.to_list()[1:], axis=0)
    display(df)
The result of this is 2 data frames (both named df). How do I merge them into one?
You can declare an empty list to store the data frames as you loop, so something like this might help. I hope I understood your question.
import pandas as pd
import os
import glob
from pathlib import Path
# assign directory
directory = "/lakehouse/default/Files/SP/SP_Xlsx/"
# create an empty list to store data frames
data_frames = []
# iterate over files in that directory
files = Path(directory).glob('*')
for file in files:
    df = pd.read_excel(file, sheet_name='Option')
    print(file)
    # drops rows with ID: 0,1,2,3
    idx = [0,1,2,3]
    df = df.query("index != @idx")
    # resets Row ID 0 as index/header
    df.columns = df.iloc[0]
    df = df[1:]
    df = df.drop(df.index.to_list()[1:], axis=0)
    # append the data frame to the list
    data_frames.append(df)
# concatenate all data frames in the list
merged_df = pd.concat(data_frames, axis=0)
# display the merged data frame
display(merged_df)
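If it helps, here is a slightly more defensive sketch of the same list-then-concat idea. A few things in it are my assumptions and not part of your script: the helper names clean_frame and load_folder, the "*.xlsx" filter (so stray non-Excel files in the folder are skipped), the extra source_file column, and the n_rows parameter. By default it keeps every row under the promoted header; passing n_rows=1 should mimic the single-row behaviour of the drop(...) line above.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def clean_frame(raw: pd.DataFrame, skip_rows: int = 4,
                n_rows: Optional[int] = None) -> pd.DataFrame:
    """Drop the first `skip_rows` rows and promote the next row to the header."""
    if len(raw) <= skip_rows:
        raise ValueError("not enough rows to find a header row")
    body = raw.iloc[skip_rows:]
    cleaned = body.iloc[1:].copy()             # rows below the header row
    cleaned.columns = body.iloc[0].to_list()   # header row becomes column names
    if n_rows is not None:
        cleaned = cleaned.head(n_rows)         # n_rows=1 keeps only the first row
    return cleaned.reset_index(drop=True)


def load_folder(directory: str, sheet_name: str = "Option") -> pd.DataFrame:
    """Read every .xlsx file in `directory`, clean it, and stack the results."""
    frames = []
    for path in sorted(Path(directory).glob("*.xlsx")):
        cleaned = clean_frame(pd.read_excel(path, sheet_name=sheet_name))
        cleaned["source_file"] = path.name     # hypothetical column: row origin
        frames.append(cleaned)
    if not frames:
        return pd.DataFrame()                  # pd.concat raises on an empty list
    return pd.concat(frames, ignore_index=True)
```

You would then call merged_df = load_folder("/lakehouse/default/Files/SP/SP_Xlsx/") on each run. Because the folder is re-scanned every time, any newly added files are picked up without needing to know a name for each DataFrame.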