How do I use np.where between DataFrames of different sizes? 'operands could not be broadcast together'

Question · Votes: 0 · Answers: 3

I have two DataFrames of different sizes. df1 has addresses but no zip codes. df2 has both addresses and zip codes.

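I am trying to use np.where to match the addresses in df1 against the addresses in df2 and, where they match, bring the corresponding zip code over to df1.

However, I have just realized that this does not work with DataFrames of different sizes.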

The first DataFrame, without zip codes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'address1':['1 o\'toole st','2 main st','3 high street','5 foo street','10 foo street'],
                   'address2':['town1',np.nan,np.nan,'Bartown',np.nan],
                   'address3':[np.nan,'village','city','county2','county3']})
df1['zipcode']=''
print(df1)

        address1 address2 address3 zipcode
0   1 o'toole st    town1      NaN        
1      2 main st      NaN  village        
2  3 high street      NaN     city        
3   5 foo street  Bartown  county2        
4  10 foo street      NaN  county3       

The second DataFrame, from which I want to take the zip codes:

df2 = pd.DataFrame({'address1':['1 o\'toole st','2 main st','7 mill street','5 foo street','10 foo street','asda'],
                   'address2':['town1','village','city','Bartown','county3','efsefs'],
                   'address3':[np.nan,np.nan,np.nan,'county2','USA','asdasd'],
                   'zipcode': ['er45','qw23','rt67','yu89','yu83','aedsa']})
print(df2)

        address1 address2 address3 zipcode
0   1 o'toole st    town1      NaN    er45
1      2 main st  village      NaN    qw23
2  7 mill street     city      NaN    rt67
3   5 foo street  Bartown  county2    yu89
4  10 foo street  county3      USA    yu83
5           asda   efsefs   asdasd   aedsa

I use np.where to fill the df1['zipcode'] column: if the two addresses match, return df2['zipcode'], otherwise 'no_match':

df1['zipcode'] = np.where(df1['address1'].isin(df2['address1']), df2['zipcode'], 'no_match')



ValueError                                Traceback (most recent call last)
<ipython-input-176-499624d43d5c> in <module>
----> 1 df1['zipcode'] = np.where(df1['address1'].isin(df2['address1']), df2['zipcode'], 'no_match')
      2 df1

ValueError: operands could not be broadcast together with shapes (5,) (6,) ()

Is it possible to do this with np.where and DataFrames of different sizes? Or is there a better way to search for matches and fill in the zip codes?
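For context, the shapes in the error message can be reproduced without pandas. The sketch below is only an illustration: the array names `cond` and `zips` are made up here, and the values stand in for the 5-element mask from `isin` and the 6-element `df2['zipcode']` column. np.where pairs its arguments element by element, so a length-5 condition cannot be combined with a length-6 array.

```python
import numpy as np

# Stand-in for df1['address1'].isin(df2['address1']): one flag per df1 row, shape (5,)
cond = np.array([True, True, False, True, True])

# Stand-in for df2['zipcode']: one value per df2 row, shape (6,)
zips = np.array(['er45', 'qw23', 'rt67', 'yu89', 'yu83', 'aedsa'])

try:
    np.where(cond, zips, 'no_match')
    error_message = None
except ValueError as exc:
    # NumPy cannot broadcast shapes (5,) and (6,) against each other
    error_message = str(exc)

print(cond.shape, zips.shape)
print(error_message)
```

Even when the lengths happen to agree, np.where picks the value at the same position as the flag, so it does not look anything up by address.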

python pandas numpy dataframe
3 Answers
1
vote

Use Series.map with a new key column built with fillna. Addresses that have no match come back as missing values, so add fillna('no_match') at the end:

df1['key'] = df1['address1'] + df1['address2'].fillna(df1['address3'])
df2['key'] = df2['address1'] + df2['address2'].fillna(df2['address3'])

df1['zipcode'] =  df1['key'].map(df2.set_index('key')['zipcode']).fillna('no_match')

print (df1)
        address1 address2 address3                   key   zipcode
0   1 o'toole st    town1      NaN     1 o'toole sttown1      er45
1      2 main st      NaN  village      2 main stvillage      qw23
2  3 high street      NaN     city     3 high streetcity  no_match
3   5 foo street  Bartown  county2   5 foo streetBartown      yu89
4  10 foo street      NaN  county3  10 foo streetcounty3      yu83
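If matching on address1 alone is enough, which is what the np.where attempt in the question compares, the same Series.map idea can be sketched without the combined key. This is a minimal sketch under assumptions: the name `lookup` is illustrative, and keeping the first zip code when an address appears more than once in df2 is a choice made here, not something the question specifies.

```python
import pandas as pd

df1 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '3 high street',
                                 '5 foo street', '10 foo street']})
df2 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '7 mill street',
                                 '5 foo street', '10 foo street', 'asda'],
                    'zipcode': ['er45', 'qw23', 'rt67', 'yu89', 'yu83', 'aedsa']})

# Build an address1 -> zipcode lookup Series.
# drop_duplicates keeps the first row per address so the index is unique (assumption).
lookup = df2.drop_duplicates('address1').set_index('address1')['zipcode']

# map looks each df1 address up by value; addresses missing from df2 become NaN
df1['zipcode'] = df1['address1'].map(lookup).fillna('no_match')
print(df1)
```

Because the lookup is by value, the two DataFrames can have different lengths and different row orders.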

1
vote

You can use merge:

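# Drop df1's empty placeholder column first, otherwise merge produces zipcode_x / zipcode_y
df_new = df1.drop(columns='zipcode').merge(df2[['address1', 'zipcode']], on='address1', how='left')
# Fill only the zipcode column; a frame-wide fillna would also overwrite NaN in address2/address3
df_new['zipcode'] = df_new['zipcode'].fillna('no_match')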
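A left merge always returns a frame aligned with df1, so np.where can be applied afterwards without the shape problem. The following is a rough sketch, not the answerer's code: it assumes address1 is unique in df2 and uses merge's `indicator=True` option, which adds a `_merge` column saying whether each row was found in both frames. The name `merged` is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '3 high street',
                                 '5 foo street', '10 foo street']})
df2 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '7 mill street',
                                 '5 foo street', '10 foo street', 'asda'],
                    'zipcode': ['er45', 'qw23', 'rt67', 'yu89', 'yu83', 'aedsa']})

# Left merge keeps every df1 row; _merge is 'both' for matches and 'left_only' otherwise
merged = df1.merge(df2, on='address1', how='left', indicator=True)

# Condition and values now come from the same frame, so their shapes agree
merged['zipcode'] = np.where(merged['_merge'] == 'both', merged['zipcode'], 'no_match')
merged = merged.drop(columns='_merge')
print(merged)
```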

0
votes
df2=df2.reindex(df1.index)
df1['zipcode'] = np.where(df1['address1'].isin(df2['address1']), df2['zipcode'], 'no_match')
df1

        address1 address2 address3   zipcode
0   1 o'toole st    town1      NaN      er45
1      2 main st      NaN  village      qw23
2  3 high street      NaN     city  no_match
3   5 foo street  Bartown  county2      yu89
4  10 foo street      NaN  county3      yu83
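One caveat about the reindex approach: np.where still pairs rows by position, so the result above is correct only because matching addresses sit at the same index in both frames. The small sketch below uses made-up two-row data (the names `trimmed`, `positional` and `by_value` are illustrative) to show what can happen when the row order differs, with a value-based lookup alongside for comparison.

```python
import numpy as np
import pandas as pd

# df1 lists the same two addresses as df2, but in the opposite order
df1 = pd.DataFrame({'address1': ['2 main st', "1 o'toole st"]})
df2 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', 'asda'],
                    'zipcode': ['er45', 'qw23', 'aedsa']})

# reindex trims df2 to df1's index labels (rows 0 and 1)
trimmed = df2.reindex(df1.index)

# Both addresses pass the isin test, but each row takes the zip code at its own position
positional = np.where(df1['address1'].isin(trimmed['address1']),
                      trimmed['zipcode'], 'no_match')

# Looking the zip code up by address value keeps each address with its own code
by_value = df1['address1'].map(df2.set_index('address1')['zipcode']).fillna('no_match')

print(list(positional))   # zip codes paired by row position
print(list(by_value))     # zip codes paired by address
```

In this example the positional result assigns 'er45' to '2 main st', while the value-based lookup assigns 'qw23'.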