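I have many Excel files, each containing many worksheets (every worksheet has a different name) and various columns.
Every sheet in every file has three columns that I want to extract (Name, Title, Database) and put into a file with a single sheet. The output file should therefore have three columns made up of the Name, Title, and Database data from all the other Excel files and worksheets.
With the code below, data is grabbed only from the first sheet of each file.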
After adding sheet_name=None to read_excel(), I get: TypeError: Can only append a Series if ignore_index=True or if the Series has a name.
After also adding ignore_index=True to append(), the output file contains no data.
Thank you for your time! I am a beginner at programming.
import pandas as pd
#Setting SourceFiles
my_files = [(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\a.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\b.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\c.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\d.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\e.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\f.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\g.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\h.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\i.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\j.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\k.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\l.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\m.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\n.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\o.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\p.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\q.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\r.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\s.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\t.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\u.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\v.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\w,y,z.xls')]
#Combining the SourceFiles
df = pd.DataFrame()
for f in my_files:
    data = pd.read_excel(f)  #Here I set the sheet_name=None
    df = df.append(data)  #Here I set the ignore_index=True
#Defining the columns I want from the Source file and how I want them (Sorted and with Dropped duplicates)
columns_i_want = pd.DataFrame(df, columns=['Name', 'Title', 'Database'])
sorted_data = columns_i_want.sort_values('Name', ascending=True)
#Transferring the data to another excel
sorted_data.to_excel(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\output.xlsx', index=False, header=True)
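A likely explanation for both failures: with sheet_name=None, pd.read_excel returns a dict that maps each sheet name to its own DataFrame, not a single DataFrame. Passing that dict to append() means it no longer receives a table with Name, Title, and Database columns, which would account for the TypeError and then for the empty output. The sketch below imitates that return value with an in-memory dict (the sheet names and cell values are invented for illustration) and stacks the sheets with pd.concat instead.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None): a dict of {sheet name: DataFrame}.
# The sheet names and values are made up purely for illustration.
sheets = {
    "Sheet A": pd.DataFrame({"Name": ["n1"], "Title": ["t1"], "Database": ["d1"], "Other": [1]}),
    "Sheet B": pd.DataFrame({"Name": ["n2"], "Title": ["t2"], "Database": ["d2"]}),
}

# pd.concat accepts a dict of DataFrames and stacks their rows into one DataFrame.
combined = pd.concat(sheets, ignore_index=True)

# Keep only the three columns of interest.
result = combined[["Name", "Title", "Database"]]
print(result)
```

With real files, the dict would come from pd.read_excel(f, sheet_name=None) for each path in my_files.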
So, I found a way! I don't think it is the best or the simplest approach, but it works for me.
import pandas as pd
test = [(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\a.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\b.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\c.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\d.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\e.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\f.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\g.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\h.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\i.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\j.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\k.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\l.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\m.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\n.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\o.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\p.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\q.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\r.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\s.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\t.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\u.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\v.xls'),
(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\REAL\w,y,z.xls')]
def merge(path):
    # sheet_name=None makes read_excel return a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(path, sheet_name=None)
    # Stack all sheets of this workbook into one DataFrame
    df = pd.concat(sheets, ignore_index=True)
    # Keep only the wanted columns (a missing column is filled with NaN)
    df = df.reindex(columns=['Name', 'Title', 'Database'])
    df = df.drop_duplicates(subset=['Title'])
    return df.sort_values(['Name', 'Title'], ascending=True)

# DataFrame.append was removed in pandas 2.0, so the per-file results are combined with pd.concat
df = pd.concat([merge(path) for path in test])
df.to_excel(r'C:\Users\John\Dev\Mini_Personal_scripts\The_Excel_Project\Excels\output.xlsx', index=False, header=True)
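One possible variation, offered only as a sketch: collect every sheet in a list and concatenate once at the end. Note that this removes duplicate titles across all files, whereas the code above removes them per file, so the results can differ. The helper name combine_sheets and the constant WANTED are hypothetical, and the two "workbooks" are in-memory dicts with made-up values standing in for what pd.read_excel(path, sheet_name=None) returns.

```python
import pandas as pd

WANTED = ["Name", "Title", "Database"]

def combine_sheets(workbooks):
    """Stack every sheet of every workbook into one DataFrame.

    workbooks is an iterable of dicts {sheet name: DataFrame}, the shape
    that pd.read_excel(path, sheet_name=None) returns.
    """
    frames = []
    for sheets in workbooks:
        for frame in sheets.values():
            # reindex keeps the wanted columns and fills a missing one with NaN
            frames.append(frame.reindex(columns=WANTED))
    if not frames:
        return pd.DataFrame(columns=WANTED)
    combined = pd.concat(frames, ignore_index=True)
    # Duplicates are dropped across all files here, not per file
    combined = combined.drop_duplicates(subset=["Title"])
    return combined.sort_values(["Name", "Title"]).reset_index(drop=True)

# Made-up in-memory data standing in for two workbooks.
book_1 = {"s1": pd.DataFrame({"Name": ["b"], "Title": ["t2"], "Database": ["d2"]})}
book_2 = {
    "s1": pd.DataFrame({"Name": ["a"], "Title": ["t1"], "Database": ["d1"], "Extra": [0]}),
    "s2": pd.DataFrame({"Name": ["b"], "Title": ["t2"], "Database": ["d9"]}),
}
output = combine_sheets([book_1, book_2])
print(output)
```

With the real files, the call would look like combine_sheets(pd.read_excel(p, sheet_name=None) for p in test), followed by the same to_excel call as above.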