Python Pandas formatting

Question  Votes: 0  Answers: 3

I am using pandas to do a comparison, and I ran into the following problem:

I have two tables like this:

  DESCRIPTION   EXTRAS   ADDRESS  AVAILABLE
1   House        WiFi     CP 432     1
2   Farm         NONE     CP 345     1
3   House        Wifi     CP 315     1

  DESCRIPTION   EXTRAS   ADDRESS  AVAILABLE
1   House        WiFi     CP 437     0
2   House        Wifi     CP 315     0 

I get the following output:

ID   DESCRIPTION   EXTRAS   ADDRESS  AVAILABLE
1,1   House        WiFi     CP 432     1
2,2   Farm         NONE     CP 345     1
3,3   House        Wifi     CP 315     1
4,1   House        WiFi     CP 437     0

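It looks as if pandas is mixing the two IDs together.

On the other hand, in another CSV I found that some rows look fine, while other rows have all of their information crammed into the "ID" column. The strange part is that before merging the two CSVs, all of the information sat in the correct columns. It looks like this: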

ID   DESCRIPTION   EXTRAS   ADDRESS  AVAILABLE
1   House        WiFi     CP 432     1
2;Farm NONE CP 345 1
3   House        Wifi     CP 315     1
1   House        WiFi     CP 437     0

The code that merges the two CSVs is the same in both cases:

df1 = pd.read_csv(get_work_folder_path(args.processName) + "/" + args.processName + "EnAlquiler" + ".csv", error_bad_lines=False)
df2 = pd.read_csv(get_work_folder_path(args.processName) + "/" + args.processName + ".csv", error_bad_lines=False)

frames = [df1, df2]
result = pd.concat(frames)

df5 = pd.DataFrame(result)
df5.drop_duplicates(keep='first', inplace=True)

df5.to_csv(get_work_folder_path(args.processName) + "/" + args.processName + "HomeAwayComparacion" + ".csv")

print(df5)
python pandas csv
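3 Answers
0
votes

I suspect that one of your CSV inputs is malformed. The fact that reading fails without error_bad_lines=False more or less proves it. Try opening and re-exporting each CSV file on its own before combining them. If I am right, you will see the same problem there.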
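One way to follow this advice without pandas is to count the fields on every line and report the ones that do not match the header. The snippet below is only a minimal sketch: the helper name `find_irregular_rows` and the sample text are hypothetical, and the comma delimiter is an assumption based on the rows shown in the question.

```python
import csv
import io


def find_irregular_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    irregular = []
    for row in reader:
        # Blank lines come back as empty lists and are skipped
        if row and len(row) != expected:
            # reader.line_num is the 1-based number of the line just read
            irregular.append((reader.line_num, len(row)))
    return irregular


# Hypothetical sample modelled on the question: one row uses ";" and spaces
sample = (
    "ID,DESCRIPTION,EXTRAS,ADDRESS,AVAILABLE\n"
    "1,House,WiFi,CP 432,1\n"
    "2;Farm NONE CP 345 1\n"
    "3,House,Wifi,CP 315,1\n"
)

bad_rows = find_irregular_rows(sample)
print(bad_rows)
```

For a file on disk, the same helper could be fed the file's contents; any line it reports is a candidate for the rows that end up entirely inside the "ID" column.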


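0
votes

Try appending one frame to the other:

import io
import pandas as pd

# The header has four names but each row has five fields,
# so pandas uses the first field as the index.
str1 = io.StringIO('''
DESCRIPTION;EXTRAS;ADDRESS;AVAILABLE
1;House;WiFi;CP 432;1
2;Farm;NONE;CP 345;1
3;House;Wifi;CP 315;1
''')
df1 = pd.read_csv(str1, sep=";")

str2 = io.StringIO('''
DESCRIPTION;EXTRAS;ADDRESS;AVAILABLE
1;House;WiFi;CP 437;0
2;House;Wifi;CP 325;0
''')
df2 = pd.read_csv(str2, sep=";")

# Older pandas: ddf = df1.append(df2). DataFrame.append was removed in pandas 2.0;
# pd.concat gives the same result.
ddf = pd.concat([df1, df2])
print(ddf)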

Output:

  DESCRIPTION EXTRAS ADDRESS  AVAILABLE
1       House   WiFi  CP 432          1
2        Farm   NONE  CP 345          1
3       House   Wifi  CP 315          1
1       House   WiFi  CP 437          0
2       House   Wifi  CP 325          0

If you want fresh index numbers, use the ignore_index=True option:

# Older pandas: ddf = df1.append(df2, ignore_index=True)
ddf = pd.concat([df1, df2], ignore_index=True)
print(ddf)

  DESCRIPTION EXTRAS ADDRESS  AVAILABLE
0       House   WiFi  CP 432          1
1        Farm   NONE  CP 345          1
2       House   Wifi  CP 315          1
3       House   WiFi  CP 437          0
4       House   Wifi  CP 325          0
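To see why the renumbering matters, it can help to check whether the index labels are still unique after combining. The following is a small sketch with hypothetical frames modelled on the tables in the question, reduced to two columns for brevity.

```python
import pandas as pd

# Hypothetical frames modelled on the tables in the question
df1 = pd.DataFrame(
    {"DESCRIPTION": ["House", "Farm", "House"], "AVAILABLE": [1, 1, 1]},
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {"DESCRIPTION": ["House", "House"], "AVAILABLE": [0, 0]},
    index=[1, 2],
)

kept = pd.concat([df1, df2])                           # original labels survive
renumbered = pd.concat([df1, df2], ignore_index=True)  # labels become 0..4

print(kept.index.is_unique)        # False: labels 1 and 2 appear twice
print(len(kept.loc[1]))            # 2: one label selects two rows
print(renumbered.index.is_unique)  # True
```

With duplicated labels, a lookup such as `kept.loc[1]` returns rows from both source frames, which is easy to mistake for pandas "mixing" the IDs.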

0
votes

Check the inputs and dtypes of df1 and df2. Check the index of each DataFrame and, if necessary, use "df.reset_index()".

import pandas as pd

# df1 is indexed with strings, df2 with integers
df1 = pd.DataFrame({"ID": ["1", "2", "3"],
                    "DESCRIPTION": ["House", "Farm", "House"],
                    "EXTRAS": ["Wifi", None, "Wifi"],
                    "ADDRESS": ["CP 432", "CP 345", "CP 315"],
                    "AVAILABLE": [1, 1, 1]},
                   index=["1", "2", "3"])
df2 = pd.DataFrame({"ID": ["1", "2"],
                    "DESCRIPTION": ["House", "House"],
                    "EXTRAS": ["Wifi", "Wifi"],
                    "ADDRESS": ["CP 432", "CP 315"],
                    "AVAILABLE": [0, 0]},
                   index=[1, 2])

frames = [df1, df2]
result = pd.concat(frames)
print(result)

# pd.concat already returns a DataFrame, so duplicates can be dropped on it directly
df5 = result.drop_duplicates(keep='first')
print(df5)

Result:

   ADDRESS  AVAILABLE DESCRIPTION EXTRAS ID
1  CP 432          1       House   Wifi  1
2  CP 345          1        Farm   None  2
3  CP 315          1       House   Wifi  3
1  CP 432          0       House   Wifi  1
2  CP 315          0       House   Wifi  2
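One possible explanation for ID values such as "1,1" and "4,1" in the question's output is that `to_csv` writes the DataFrame index as an extra unnamed first column next to the existing ID column. This is an assumption, not something the asker confirmed. The sketch below uses a hypothetical stand-in frame to show resetting the index and writing the CSV without it.

```python
import io
import pandas as pd

# Hypothetical stand-in for the concatenated frame in the question
result = pd.DataFrame(
    {
        "ID": [1, 2, 3, 1],
        "DESCRIPTION": ["House", "Farm", "House", "House"],
        "AVAILABLE": [1, 1, 1, 0],
    },
    index=[0, 1, 2, 0],
)

# Drop exact duplicate rows, then replace the repeated labels with 0..n-1
clean = result.drop_duplicates(keep="first").reset_index(drop=True)

buffer = io.StringIO()
clean.to_csv(buffer, index=False)  # index=False keeps the row labels out of the file
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer is only for illustration; passing a file path to `to_csv` with `index=False` works the same way.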