我有两个数据帧(A和B)。我想删除B中的所有行,其中列Month,Year,Type,Name的值完全匹配。
数据帧A.
Name Type Month Year country Amount Expiration Paid
0 EXTRON GOLD March 2019 CA 20000 2019-09-07 yes
0 LEAF SILVER March 2019 PL 4893 2019-02-02 yes
0 JMC GOLD March 2019 IN 7000 2020-01-16 no
数据帧B.
Name Type Month Year country Amount Expiration Paid
0 JONS GOLD March 2018 PL 500 2019-10-17 yes
0 ABBY BRONZE March 2019 AU 60000 2019-02-02 yes
0 BUYT GOLD March 2018 BR 50 2018-03-22 no
0 EXTRON GOLD March 2019 CA 90000 2019-09-07 yes
0 JAYB PURPLE March 2019 PL 9.90 2018-04-20 yes
0 JMC GOLD March 2019 IN 6000 2020-01-16 no
0 JMC GOLD April 2019 IN 1000 2020-01-16 no
期望的输出:
数据帧B.
Name Type Month Year country Amount Expiration Paid
0 JONS GOLD March 2018 PL 500 2019-10-17 yes
0 ABBY BRONZE March 2019 AU 60000 2019-02-02 yes
0 BUYT GOLD March 2018 BR 50 2018-03-22 no
0 JAYB PURPLE March 2019 PL 9.90 2018-04-20 yes
0 JMC GOLD April 2019 IN 1000 2020-01-16 no
我们可以在这里使用merge
l=['Month', 'Year','Type', 'Name']
B=B.merge(A[l],on=l,indicator=True,how='outer').loc[lambda x : x['_merge']=='left_only'].copy()
# you can add drop here like B=B.drop('_merge',1)
Name Type Month Year country Amount Expiration Paid _merge
0 JONS GOLD March 2018 PL 500.0 2019-10-17 yes left_only
1 ABBY BRONZE March 2019 AU 60000.0 2019-02-02 yes left_only
2 BUYT GOLD March 2018 BR 50.0 2018-03-22 no left_only
4 JAYB PURPLE March 2019 PL 9.9 2018-04-20 yes left_only
6 JMC GOLD April 2019 IN 1000.0 2020-01-16 no left_only
我尝试使用MultiIndex。
cols =['Month', 'Year','Type', 'Name']
index1 = pd.MultiIndex.from_arrays([df1[col] for col in cols])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in cols])
df2 = df2.loc[~index2.isin(index1)]