How to merge two or more CSV files with pandas and replace values that cannot be found with "not applicable"

Question  Votes: -1  Answers: 2

I am trying to merge two or more CSV files so that

MetaData1
Sample_name   TITLE
Cody        Chicken Pox
Claudia     Chicken Pox
Alex        Chicken Pox
Steven      Chicken Pox
Mom         Chicken Pox
Dad     

MetaData2
Sample_name    TITLE       Geo_Loc    DESCRIPTION
Dad         Chicken Pox     Earth       people
Me          Chicken Pox     Earth       people
Roger       Chicken Pox     Earth       people
Ben         Chicken Pox     Earth       people

merged together look like this:

Merged Metadata 
Sample_name    TITLE             Geo_Loc                 DESCRIPTION
Cody        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Claudia     Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Alex        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Steven      Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Mom         Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Dad         Chicken Pox     Earth                   people
Me          Chicken Pox     Earth                   people
Roger       Chicken Pox     Earth                   people
Ben         Chicken Pox     Earth                   people

My code so far is below. However, it only concatenates the two CSV files; it does not actually modify or overlay them as shown above.

import pandas as panda
import numpy as numpy

File_one = panda.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = panda.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
Concat_File = panda.concat([File_one, File_two])

for column_header in Final_File:
    for entry in Final_File[column_header]:
        if str(entry) == 'nan':
            print(entry)
            entry = 'not applicable'
            print("changed to: " + entry)

Concat_File.to_csv(path_or_buf='/Users/c1carpenter/Desktop/' + 'diditwork.txt', sep='\t', na_rep='not applicable',index=False)
python pandas csv numpy merge
2 Answers
0
votes

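Assigning entry = 'not applicable' does not modify the DataFrame; it only rebinds the loop variable. Instead, you would have to iterate over the frame and assign with Final_File.at or Final_File.iat.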

But in your case you can simply fill the NaN values with:

# fillna returns a new DataFrame, so assign the result back
Final_File = Final_File.fillna('Missing:Not Applicable')
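
To make the difference concrete, here is a minimal sketch. The small `frame` below is a hypothetical stand-in for Final_File, not the asker's real data. It shows that rebinding the loop variable leaves the frame untouched, while `.at` assignment and `fillna` both replace the missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Final_File with one missing value
frame = pd.DataFrame(
    {"Sample_name": ["Cody", "Dad"], "Geo_Loc": [np.nan, "Earth"]}
)

# Rebinding the loop variable does not write anything back to the frame
for entry in frame["Geo_Loc"]:
    if pd.isna(entry):
        entry = "not applicable"
unchanged_missing = int(frame["Geo_Loc"].isna().sum())

# Label-based assignment with .at does write into the frame (a copy here)
by_at = frame.copy()
for row_label in by_at.index:
    for column in by_at.columns:
        if pd.isna(by_at.at[row_label, column]):
            by_at.at[row_label, column] = "Missing:Not Applicable"

# fillna does the same in one call and returns a new DataFrame
by_fillna = frame.fillna("Missing:Not Applicable")
```

The explicit loop is only there for comparison; for whole-frame replacement `fillna` is usually the simpler choice.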

Please also take a look at the great tutorial 10 Minutes to pandas.


0
votes

If I understand your question, it looks like you may want to merge() the contents of your CSV files rather than concat() them together.

import pandas as pd
m1 = pd.read_csv('metadata1.csv')
m2 = pd.read_csv('metadata2.csv')
mm = pd.merge(m1, m2, how='outer', on='Sample_name')  

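# Clean up the duplicated non-key column: take TITLE_x and fall back to
# TITLE_y where it is missing (summing the strings would double the title
# whenever both files supply one)
mm['TITLE'] = mm['TITLE_x'].combine_first(mm['TITLE_y'])
mm = mm.drop(['TITLE_x', 'TITLE_y'], axis=1)

# Replace NaN if you desire; fillna returns a new DataFrame, so assign it
mm = mm.fillna('Missing:Not Applicable')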
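
For reference, a self-contained sketch of the same outer merge on the sample data from the question. It assumes tab-separated input as in the asker's read_csv calls, reads from in-memory strings instead of files, and uses `combine_first` to collapse TITLE_x and TITLE_y, which is one possible choice. The helper name `merge_metadata` is made up for this example.

```python
import io

import pandas as pd

META1_ROWS = [
    "Sample_name\tTITLE",
    "Cody\tChicken Pox", "Claudia\tChicken Pox", "Alex\tChicken Pox",
    "Steven\tChicken Pox", "Mom\tChicken Pox", "Dad\t",
]
META2_ROWS = [
    "Sample_name\tTITLE\tGeo_Loc\tDESCRIPTION",
    "Dad\tChicken Pox\tEarth\tpeople", "Me\tChicken Pox\tEarth\tpeople",
    "Roger\tChicken Pox\tEarth\tpeople", "Ben\tChicken Pox\tEarth\tpeople",
]


def merge_metadata(left, right, key="Sample_name"):
    """Outer-merge two frames on key and fill whatever is still missing."""
    merged = pd.merge(left, right, how="outer", on=key)
    # Collapse the suffixed TITLE columns, preferring the left-hand value
    merged["TITLE"] = merged["TITLE_x"].combine_first(merged["TITLE_y"])
    merged = merged.drop(columns=["TITLE_x", "TITLE_y"])
    # Put the key and TITLE first, then the remaining columns
    ordered = [key, "TITLE"] + [
        c for c in merged.columns if c not in (key, "TITLE")
    ]
    return merged[ordered].fillna("Missing:Not Applicable")


meta1 = pd.read_csv(io.StringIO("\n".join(META1_ROWS)), sep="\t", dtype=str)
meta2 = pd.read_csv(io.StringIO("\n".join(META2_ROWS)), sep="\t", dtype=str)
merged = merge_metadata(meta1, meta2)
print(merged.to_string(index=False))
```

Row order after an outer merge can differ between pandas versions, so sort explicitly if the order matters.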

If you want to fine-tune your results, you can find more information about merge() and how the joins are performed here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
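
Since the question mentions two or more files, here is one possible way to extend the idea to any number of frames. This is a different technique from merge(): each frame is indexed by Sample_name and overlaid with `DataFrame.combine_first`, so earlier frames win and later ones only fill gaps. The helper `overlay_frames` and the three small frames are hypothetical, and the sketch assumes Sample_name is unique within each frame.

```python
from functools import reduce

import pandas as pd


def overlay_frames(frames, key="Sample_name"):
    """Overlay frames on key; earlier frames take priority over later ones."""
    indexed = [frame.set_index(key) for frame in frames]
    combined = reduce(lambda left, right: left.combine_first(right), indexed)
    filled = combined.fillna("Missing:Not Applicable")
    return filled.rename_axis(key).reset_index()


# Hypothetical example frames with partially overlapping rows and columns
first = pd.DataFrame(
    {"Sample_name": ["Cody", "Dad"], "TITLE": ["Chicken Pox", None]}
)
second = pd.DataFrame(
    {
        "Sample_name": ["Dad", "Me"],
        "TITLE": ["Chicken Pox", "Chicken Pox"],
        "Geo_Loc": ["Earth", "Earth"],
    }
)
third = pd.DataFrame(
    {"Sample_name": ["Me", "Ben"], "DESCRIPTION": ["people", "people"]}
)

overlaid = overlay_frames([first, second, third])
```

With this approach no TITLE_x/TITLE_y suffix columns appear, but the row and column order of the result is determined by pandas, so reorder afterwards if needed.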

Hope that helps.
