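I have two DataFrames, df1 and df2, with two key columns, key1 and key2. I'm looking for a way to join them while ignoring missing values (NaN) in the keys, producing df3 as the output.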
Input 1 (df1)
  key1 key2  val1
0    A    E     1
1    B    F     2
2    C    G     3
3    D    H     4
Input 2 (df2)
  key1 key2  val2
0    A    E     5
1    B    F     6
2  NaN    G     7
3    D  NaN     8
4    A  NaN     9
Expected output (df3)
  key1 key2  val1  val2
0    A    E     1     5
1    B    F     2     6
2  NaN    G     3     7
3    D  NaN     4     8
To make this reproducible, here is code that builds the three frames:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D'],
    'key2': ['E', 'F', 'G', 'H'],
    'val1': [1, 2, 3, 4]})
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df2 = pd.DataFrame({
    'key1': ['A', 'B', np.nan, 'D', 'A'],
    'key2': ['E', 'F', 'G', np.nan, np.nan],
    'val2': [5, 6, 7, 8, 9]})
df3 = pd.DataFrame({
    'key1': ['A', 'B', np.nan, 'D'],
    'key2': ['E', 'F', 'G', np.nan],
    'val1': [1, 2, 3, 4],
    'val2': [5, 6, 7, 8]})
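If you don't want to use the approach I suggested in your previous question, you can run merge three times (once on both keys, then once on each single key), concat the results, and use drop_duplicates to keep only the first match for each row of df1: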
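# keep df1's row labels in an 'index' column so duplicate matches can be dropped later
tmp = df1.reset_index()
out = (pd.concat([
            # 1) rows of df2 with both keys present, matched on both keys
            df2.dropna(subset=['key1', 'key2'])
               .merge(tmp, on=['key1', 'key2']),
            # 2) matched on key2 only
            df2.drop(columns='key1').dropna(subset='key2')
               .merge(tmp.drop(columns='key1'), on='key2'),
            # 3) matched on key1 only
            df2.drop(columns='key2').dropna(subset='key1')
               .merge(tmp.drop(columns='key2'), on='key1')])
         # the first match per df1 row wins (both keys take priority)
         .drop_duplicates('index').drop(columns='index')
      )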
Output:
  key1 key2  val2  val1
0    A    E     5     1
1    B    F     6     2
2  NaN    G     7     3
2    D  NaN     8     4
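Note that the output above repeats the row label 2 and has val2 before val1, so it does not line up exactly with df3. The following is a minimal, self-contained sketch of the same merge/concat/drop_duplicates idea with a few tidy-up steps appended. The names both, only_key2, and only_key1 are illustrative only, and the final sort_values, reset_index, and column selection are my additions under the assumption that you want the layout of df3:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D'],
    'key2': ['E', 'F', 'G', 'H'],
    'val1': [1, 2, 3, 4]})
df2 = pd.DataFrame({
    'key1': ['A', 'B', np.nan, 'D', 'A'],
    'key2': ['E', 'F', 'G', np.nan, np.nan],
    'val2': [5, 6, 7, 8, 9]})

# Keep df1's original row labels in an 'index' column.
tmp = df1.reset_index()

# Match on both keys, using only df2 rows where both keys are present.
both = df2.dropna(subset=['key1', 'key2']).merge(tmp, on=['key1', 'key2'])

# Match on key2 alone (key1 is ignored on both sides).
only_key2 = (df2.drop(columns='key1').dropna(subset=['key2'])
                .merge(tmp.drop(columns='key1'), on='key2'))

# Match on key1 alone (key2 is ignored on both sides).
only_key1 = (df2.drop(columns='key2').dropna(subset=['key1'])
                .merge(tmp.drop(columns='key2'), on='key1'))

out = (pd.concat([both, only_key2, only_key1])
         .drop_duplicates('index')      # first match per df1 row wins
         .sort_values('index')          # restore df1's row order (added step)
         .drop(columns='index')
         .reset_index(drop=True)        # clean 0..n-1 labels (added step)
         [['key1', 'key2', 'val1', 'val2']])  # df3's column order (added step)

print(out)
```

On the sample data this should give four rows labelled 0 to 3, with columns in the same order as df3. The order of the frames passed to concat sets the priority: a match on both keys beats a key2-only match, which in turn beats a key1-only match.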