How do I use pandas to append specific columns from every sheet of multiple Excel files?


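I have many Excel files, each with many sheets (every sheet has a different name) and various columns.

Every sheet in every Excel file has three columns that I want to extract (Name, Title, Database) and put into a single file with just one sheet. The output file should therefore have three columns made up of the Name, Title, and Database data from all the other Excel files and sheets.

With the code below, only the first sheet of each file is read.

When I add sheet_name=None to read_excel(), I get: TypeError: Can only append a Series if ignore_index=True or if the Series has a name.

When I also add ignore_index=True to append(), the output file contains no data.

Thanks for your time! I'm a programming beginner.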

import pandas as pd

#Setting SourceFiles
my_files = [(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\a.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\b.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\c.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\d.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\e.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\f.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\g.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\h.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\i.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\j.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\k.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\l.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\m.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\n.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\o.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\p.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\q.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\r.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\s.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\t.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\u.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\v.xls'),
            (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\w,y,z.xls')]

#Combining the SourceFiles
df = pd.DataFrame()

for f in my_files:
    data = pd.read_excel(f) #Here I set the sheet_name=None
    df = df.append(data)    #Here I set the ignore_index=True

#Defining the columns I want from the Source file and how I want them (Sorted and with Dropped duplicates)

column_i_want = pd.DataFrame(df, columns=['Name', 'Title', 'Database'])
sorted_data = column_i_want.sort_values('Name', ascending=True)

#Transferring the data to another excel
sorted_data.to_excel(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\output.xlsx', index=False, header=True)
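
A note on the two symptoms described in the question: with sheet_name=None, read_excel() returns a dict that maps each sheet name to its own DataFrame, not a single DataFrame. Passing that dict to append() is the likely source of both the TypeError and the empty output. The sketch below uses a hand-built dict as a stand-in for the return value, so the sheet names and rows are made up for illustration; it shows one way to stack the sheets with pd.concat and then pick the three columns.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None): a dict keyed by sheet name.
# The sheet names and rows are hypothetical, for illustration only.
sheets = {
    "Sheet A": pd.DataFrame(
        {"Name": ["Ann"], "Title": ["T1"], "Database": ["db1"], "Extra": [1]}
    ),
    "Sheet B": pd.DataFrame(
        {"Name": ["Bob"], "Title": ["T2"], "Database": ["db2"], "Extra": [2]}
    ),
}

# Stack the per-sheet DataFrames row-wise into one DataFrame.
combined = pd.concat(list(sheets.values()), ignore_index=True)

# Keep only the three wanted columns.
wanted = combined[["Name", "Title", "Database"]]
print(wanted)
```

With a real file, the dict would come from pd.read_excel(f, sheet_name=None) and the rest would stay the same.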
python excel pandas append
1 Answer

So, I found a way! I don't think it's the best or the simplest way, but it works for me.

import pandas as pd

test = [(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\a.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\b.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\c.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\d.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\e.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\f.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\g.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\h.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\i.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\j.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\k.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\l.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\m.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\n.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\o.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\p.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\q.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\r.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\s.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\t.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\u.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\v.xls'),
        (r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\w,y,z.xls')]

def merge(data):
    # sheet_name=None returns a dict of {sheet name: DataFrame}; concat stacks all sheets
    df = pd.concat(pd.read_excel(data, sheet_name=None), ignore_index=True)
    # Keep only the three wanted columns
    df = pd.DataFrame(df, columns=['Name', 'Title', 'Database'])
    column_i_want = df.drop_duplicates(subset=["Title"])
    sorted_data = column_i_want.sort_values(['Name', 'Title'], ascending=True)
    return sorted_data

# DataFrame.append was removed in pandas 2.0, so build the per-file results and concatenate once
df = pd.concat([merge(i) for i in test], ignore_index=True)

df.to_excel(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\output.xlsx', index=False, header=True)
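
For comparison, here is a minimal sketch of the same idea with the combining logic separated from the file reading. The names combine_sheets, combine_folder, and WANTED are hypothetical and not part of the answer. One deliberate difference: merge() above drops duplicate titles within each file, whereas this sketch drops them once across all files, which is an assumption about what the output should look like.

```python
from pathlib import Path

import pandas as pd

WANTED = ["Name", "Title", "Database"]  # columns named in the question


def combine_sheets(workbooks):
    """Stack every sheet of every workbook, keeping only the WANTED columns.

    `workbooks` is an iterable of dicts shaped like the return value of
    pd.read_excel(path, sheet_name=None), i.e. {sheet name: DataFrame}.
    """
    frames = [
        sheet.reindex(columns=WANTED)  # a missing column becomes NaN
        for book in workbooks
        for sheet in book.values()
    ]
    if not frames:
        return pd.DataFrame(columns=WANTED)
    combined = pd.concat(frames, ignore_index=True)
    # Assumption: duplicates are removed across all files, not per file.
    combined = combined.drop_duplicates(subset=["Title"])
    return combined.sort_values(["Name", "Title"]).reset_index(drop=True)


def combine_folder(folder, pattern="*.xls"):
    """Hypothetical helper: read all sheets of every matching file in `folder`."""
    paths = sorted(Path(folder).glob(pattern))
    return combine_sheets(pd.read_excel(p, sheet_name=None) for p in paths)
```

Usage would be along the lines of combine_folder(some_folder).to_excel(output_path, index=False), which avoids listing every file by hand. Reading .xls files with pandas typically needs the xlrd package installed, and writing .xlsx needs an engine such as openpyxl.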