I'm trying to merge two or more CSV files, for example these two:
MetaData1
Sample_name  TITLE
Cody         Chicken Pox
Claudia      Chicken Pox
Alex         Chicken Pox
Steven       Chicken Pox
Mom          Chicken Pox
Dad
MetaData2
Sample_name  TITLE        Geo_Loc  DESCRIPTION
Dad          Chicken Pox  Earth    people
Me           Chicken Pox  Earth    people
Roger        Chicken Pox  Earth    people
Ben          Chicken Pox  Earth    people
so that the merged result looks like this:
Merged Metadata
Sample_name  TITLE        Geo_Loc                 DESCRIPTION
Cody         Chicken Pox  Missing:Not Applicable  Missing:Not Applicable
Claudia      Chicken Pox  Missing:Not Applicable  Missing:Not Applicable
Alex         Chicken Pox  Missing:Not Applicable  Missing:Not Applicable
Steven       Chicken Pox  Missing:Not Applicable  Missing:Not Applicable
Mom          Chicken Pox  Missing:Not Applicable  Missing:Not Applicable
Dad          Chicken Pox  Earth                   people
Me           Chicken Pox  Earth                   people
Roger        Chicken Pox  Earth                   people
Ben          Chicken Pox  Earth                   people
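My code so far is below. However, it just stacks the two CSV files on top of each other instead of actually updating and overlaying them as shown above.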
import pandas as panda
import numpy as numpy
File_one = panda.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = panda.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
Concat_File = panda.concat([File_one, File_two])
for column_header in Concat_File:
    for entry in Concat_File[column_header]:
        if str(entry) == 'nan':
            print(entry)
            # Rebinding 'entry' does not write anything back into Concat_File
            entry = 'not applicable'
            print("changed to: " + entry)
Concat_File.to_csv(path_or_buf='/Users/c1carpenter/Desktop/' + 'diditwork.txt', sep='\t', na_rep='not applicable', index=False)
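Assigning entry = 'not applicable' does not modify the DataFrame; it only rebinds the loop variable. Instead, you should iterate over the frame and assign through Concat_File.at or Concat_File.iat.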
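A minimal sketch of the difference, using a tiny hypothetical DataFrame `df` in place of the asker's files. It writes through `.iat` (row position, column position) rather than `.at`, because `pd.concat` keeps each file's original row labels, so label-based `.at` can run into duplicate labels unless you concatenate with `ignore_index=True`.

```python
import pandas as pd

# Hypothetical stand-in for Concat_File (object/str columns, as with dtype=str)
df = pd.DataFrame({
    'Sample_name': ['Cody', 'Dad'],
    'Geo_Loc': [None, 'Earth'],
})

# Rebinding the loop variable leaves the DataFrame untouched
for entry in df['Geo_Loc']:
    if pd.isna(entry):
        entry = 'not applicable'
print(df['Geo_Loc'].isna().sum())  # still 1 missing value

# Writing through .iat with integer positions does change the frame
for col_pos, column_header in enumerate(df.columns):
    for row_pos, entry in enumerate(df[column_header]):
        if pd.isna(entry):
            df.iat[row_pos, col_pos] = 'not applicable'

print(df['Geo_Loc'].tolist())
```

In practice a vectorised `fillna` is simpler and faster than a Python double loop; the loop is shown only to make the assignment mechanics visible.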
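But in your case, you can simply fill the NaN values (fillna returns a new DataFrame, so assign the result back):
Concat_File = Concat_File.fillna('Missing:Not Applicable')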
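To illustrate the fillna point with small hypothetical frames (`file_one`, `file_two` stand in for the asker's files): `fillna` returns a copy, and the `na_rep` argument the question already passes to `to_csv` performs the same substitution at write time. Note that this still stacks rows; it does not overlay matching samples.

```python
import io

import pandas as pd

# Hypothetical stand-ins for File_one and File_two
file_one = pd.DataFrame({'Sample_name': ['Cody', 'Dad'],
                         'TITLE': ['Chicken Pox', None]})
file_two = pd.DataFrame({'Sample_name': ['Me'],
                         'TITLE': ['Chicken Pox'],
                         'Geo_Loc': ['Earth'],
                         'DESCRIPTION': ['people']})

concat_file = pd.concat([file_one, file_two], ignore_index=True)

# fillna does not modify concat_file in place; keep the returned copy
filled = concat_file.fillna('Missing:Not Applicable')

# Alternatively, keep the NaNs and substitute them only when writing out
buffer = io.StringIO()
concat_file.to_csv(buffer, sep='\t', na_rep='Missing:Not Applicable', index=False)
print(buffer.getvalue())
```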
Have a look at the excellent tutorial 10 Minutes to pandas.
If I understand your question correctly, it looks like you want to merge() the contents of your CSV files rather than concat() them together.
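To see the difference between the two calls, here is a sketch on small hypothetical frames modeled on MetaData1 and MetaData2: `concat()` stacks rows, so Dad appears twice, while an outer `merge()` on `Sample_name` gives one row per sample. Because both inputs have a TITLE column, `merge()` keeps both and renames them TITLE_x and TITLE_y, which is why the code that follows collapses them afterwards.

```python
import pandas as pd

# Hypothetical, trimmed-down versions of MetaData1 and MetaData2
m1 = pd.DataFrame({'Sample_name': ['Cody', 'Mom', 'Dad'],
                   'TITLE': ['Chicken Pox', 'Chicken Pox', None]})
m2 = pd.DataFrame({'Sample_name': ['Dad', 'Me'],
                   'TITLE': ['Chicken Pox', 'Chicken Pox'],
                   'Geo_Loc': ['Earth', 'Earth'],
                   'DESCRIPTION': ['people', 'people']})

stacked = pd.concat([m1, m2], ignore_index=True)          # 5 rows: Dad appears twice
joined = pd.merge(m1, m2, how='outer', on='Sample_name')  # 4 rows: one per Sample_name

print(len(stacked), len(joined))
print(joined.columns.tolist())
```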
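import pandas as pd

# Add sep='\t' if your files are tab-separated like Test.txt / Test2.txt
m1 = pd.read_csv('metadata1.csv')
m2 = pd.read_csv('metadata2.csv')

# An outer join keeps every Sample_name found in either file
mm = pd.merge(m1, m2, how='outer', on='Sample_name')

# TITLE exists in both files, so merge() produced TITLE_x and TITLE_y.
# Collapse them into one column, taking the first file's value when present.
mm['TITLE'] = mm['TITLE_x'].combine_first(mm['TITLE_y'])
mm.drop(['TITLE_x', 'TITLE_y'], axis=1, inplace=True)
mm = mm[['Sample_name', 'TITLE', 'Geo_Loc', 'DESCRIPTION']]  # restore column order

# Replace NaN if you desire (fillna returns a new DataFrame)
mm = mm.fillna('Missing:Not Applicable')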
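The question mentions two or more files. One way to extend the same idea, sketched here under the assumption that every file has a Sample_name column and that earlier files should win when two files both have a value, is to fold the merge over a list with `functools.reduce`. The `overlay` helper and the in-memory sample text are hypothetical; with real files you would pass paths such as the Test.txt / Test2.txt ones to `pd.read_csv(..., sep='\t', dtype=str)` instead of `io.StringIO`.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical in-memory stand-ins for the tab-separated input files
sources = [
    "Sample_name\tTITLE\nCody\tChicken Pox\nDad\t\n",
    "Sample_name\tTITLE\tGeo_Loc\tDESCRIPTION\n"
    "Dad\tChicken Pox\tEarth\tpeople\nMe\tChicken Pox\tEarth\tpeople\n",
]
frames = [pd.read_csv(io.StringIO(text), sep='\t', dtype=str) for text in sources]


def overlay(left, right):
    """Outer-merge on Sample_name; for shared columns keep left's value, else right's."""
    merged = pd.merge(left, right, how='outer', on='Sample_name', suffixes=('', '_new'))
    for col in right.columns:
        if col != 'Sample_name' and col in left.columns:
            # Fill gaps in the existing column from the newer file, then drop the copy
            merged[col] = merged[col].combine_first(merged[col + '_new'])
            merged = merged.drop(columns=col + '_new')
    return merged


result = reduce(overlay, frames).fillna('Missing:Not Applicable')
print(result.to_csv(sep='\t', index=False))
```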
If you want to fine-tune the result, you can find more about merge() and how to perform joins here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
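As one example of tuning the join, the `how` parameter controls which samples survive, and `indicator=True` adds a `_merge` column recording whether each row came from the left frame, the right frame, or both. The frames below are small hypothetical examples.

```python
import pandas as pd

# Hypothetical minimal inputs
m1 = pd.DataFrame({'Sample_name': ['Cody', 'Dad']})
m2 = pd.DataFrame({'Sample_name': ['Dad', 'Me'], 'Geo_Loc': ['Earth', 'Earth']})

results = {}
for how in ('outer', 'left', 'inner'):
    # indicator=True adds a categorical '_merge' column: left_only / right_only / both
    results[how] = pd.merge(m1, m2, how=how, on='Sample_name', indicator=True)
    print(how, results[how]['Sample_name'].tolist(),
          results[how]['_merge'].astype(str).tolist())
```

Filtering on `_merge == 'right_only'`, for instance, would list samples that only appear in the second file.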
Hope that helps.