我想合并两个CSV文件如下:
第一个CSV文件:
df = pd.DataFrame()
df["ticket_number"] = ['AAA', 'AAA', 'AAA', 'ABC', 'ABA','ADC','ABA','BBB']
df["train_board_station"] = ['Tokyo', 'LA', 'Paris', 'New_York', 'Delhi','Phoenix', 'London','LA']
df["train_off_station"] = ['Phoenix', 'London', 'Sydney', 'Berlin', 'Shanghai','LA', 'Paris', 'New_York']
第二个CSV文件:
rec = pd.DataFrame()
rec["code"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi']
rec["count_A"] = ['1.2','7.8','4','8','7.8','3','8','5','2','10']
rec["count_B"] = ['12','78','4','8','78','36','88','51','25','10']
我使用以下代码:
for x in ["board", "off"]:
df["station"] = df["train_" + x + "_station"]
df["code"] = df["train_" + x + "_station"]
df = pd.concat([df,rec], axis=1, join_axes=[df.index])
df[x + "_count_A"] = df["count_A"]
df[x + "_count_B"] = df["count_B"]
df = df.drop(["station", "code","count_A","count_B"], axis=1)
我得到以下不正确的输出:
ticket_number,train_board_station,train_off_station,board_count_A,board_count_B,off_count_A,off_count_B
AAA,Tokyo,Phoenix,1.2,12,1.2,12
AAA,LA,London,7.8,78,7.8,78
AAA,Paris,Sydney,4,4,4,4
ABC,New_York,Berlin,8,8,8,8
ABA,Delhi,Shanghai,7.8,78,7.8,78
ADC,Phoenix,LA,3,36,3,36
ABA,London,Paris,8,88,8,88
BBB,LA,New_York,5,51,5,51
我注意到,不是count_A和count_B与同一行的train_board station和train_off_station合并,第一行与train_board_station合并,第二行与train_off_station合并两次。
预期的产出是:
ticket_number,train_board_station,train_off_station,board_count_A,board_count_B,off_count_A,off_count_B
AAA,Tokyo,Phoenix,1.2,12,2,25
AAA,LA,London,3,36,7.8,78
AAA,Paris,Sydney,4,4,8,88
ABC,New_York,Berlin,8,8,5,51
ABA,Delhi,Shanghai,10,10,7.8,78
ADC,Phoenix,LA,2,26,3,36
ABA,London,Paris,7.7,78,4,4
BBB,LA,New_York,36,36,8,8
有重复的问题,我使用join
与左连接:
for x in ["board", "off"]:
df["code"] = df["station"] = df["train_" + x + "_station"]
df = df.join(rec.set_index('code'), on='code')
df[x + "_count_A"] = df["count_A"]
df[x + "_count_B"] = df["count_B"]
df = df.drop(["station", "code","count_A","count_B"], axis=1)
print (df)
ticket_number train_board_station train_off_station board_count_A \
0 AAA Tokyo Phoenix 1.2
1 AAA LA London 3
2 AAA Paris Sydney 4
3 ABC New_York Berlin 8
4 ABA Delhi Shanghai 10
5 ADC Phoenix LA 2
6 ABA London Paris 7.8
7 BBB LA New_York 3
board_count_B off_count_A off_count_B
0 12 2 25
1 36 7.8 78
2 4 8 88
3 8 5 51
4 10 7.8 78
5 25 3 36
6 78 4 4
7 36 8 8