Suppose I have a pandas DataFrame that looks like this:
import pandas as pd

df1 = pd.DataFrame({"Item ID": ["A", "B", "C", "D", "E"], "Value1": [1, 2, 3, 4, 0],
                    "Value2": [4, 5, 1, 8, 7], "Value3": [3, 8, 1, 2, 0], "Value4": [4, 5, 7, 9, 4]})
print(df1)
  Item ID  Value1  Value2  Value3  Value4
0       A       1       4       3       4
1       B       2       5       8       5
2       C       3       1       1       7
3       D       4       8       2       9
4       E       0       7       0       4
Now I have a second DataFrame that looks like this:
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})
print(df2)
  Item ID  Value5
0       A       4
1       C       5
2       D       7
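What I want to do is find where the Item IDs match between my two DataFrames, and then add the "Value5" value to the matching rows of df1, but only in the Value1 and Value2 columns.
My output should look like this (for example, 7 has been added to the "Value1" and "Value2" columns of row D):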
  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
Of course, my real data has thousands of rows. I could do this with a for loop, but that takes too long. I'd like to vectorize it somehow. Any ideas?
Try this:
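# merge the dataframes; a left join keeps every df1 row in its original order
(df1.merge(df2, on='Item ID', how='left')
    # add Value5 to both columns; fill_value=0 leaves unmatched rows unchanged
    .assign(Value1=lambda x: x.Value1.add(x.Value5, fill_value=0),
            Value2=lambda x: x.Value2.add(x.Value5, fill_value=0))
    # remove the helper column
    .drop('Value5', axis=1)
    # not really needed: the NaNs introduced by the merge turn the sums into floats,
    # so cast back if you are keen on having only integers
    .astype({'Value1': 'Int64', 'Value2': 'Int64'})
)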
  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
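If you would rather skip the merge, a lookup with Series.map might also do the job. This is only a sketch, assuming each Item ID appears at most once in df2; the names lookup, increment and out are mine, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"Item ID": ["A", "B", "C", "D", "E"], "Value1": [1, 2, 3, 4, 0],
                    "Value2": [4, 5, 1, 8, 7], "Value3": [3, 8, 1, 2, 0],
                    "Value4": [4, 5, 7, 9, 4]})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Series keyed by Item ID (assumption: IDs in df2 are unique)
lookup = df2.set_index("Item ID")["Value5"]

# Per-row increment for df1; IDs missing from df2 map to NaN, which becomes 0
increment = df1["Item ID"].map(lookup).fillna(0).astype("int64")

# Add the increment row-wise to the two target columns only
out = df1.copy()
out[["Value1", "Value2"]] = out[["Value1", "Value2"]].add(increment, axis=0)
print(out)
```

Because the increment is cast back to int64 before the addition, Value1 and Value2 stay plain integers and no astype step is needed afterwards.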
As suggested in @mmm's comment, here is an alternative solution using Python dictionaries:
#create dictionaries
dict1 = (df1
         #create temporary column
         #and set as index
         #the index lets us merge the dictionaries
         #while still keeping Item ID column
         .assign(temp=df1['Item ID'])
         .set_index('temp')
         .to_dict('index')
        )
dict2 = (df2
         .assign(temp=df2['Item ID'])
         .set_index('temp')
         .to_dict('index')
        )
#check for keys that are in both dict1 and 2 i.e intersection
#print(dict1.keys() & dict2.keys())
#{'A', 'C', 'D'}
#loop through dict 1 and add values from dict2
for key in dict1.keys() & dict2.keys():
    dict1[key]['Value1'] = dict1[key]['Value1'] + dict2[key]['Value5']
    dict1[key]['Value2'] = dict1[key]['Value2'] + dict2[key]['Value5']
#print(dict1)
{'A': {'Item ID': 'A', 'Value1': 5, 'Value2': 8, 'Value3': 3, 'Value4': 4},
'B': {'Item ID': 'B', 'Value1': 2, 'Value2': 5, 'Value3': 8, 'Value4': 5},
'C': {'Item ID': 'C', 'Value1': 8, 'Value2': 6, 'Value3': 1, 'Value4': 7},
'D': {'Item ID': 'D', 'Value1': 11, 'Value2': 15, 'Value3': 2, 'Value4': 9},
'E': {'Item ID': 'E', 'Value1': 0, 'Value2': 7, 'Value3': 0, 'Value4': 4}}
#create dataframe
pd.DataFrame.from_dict(dict1,orient='index').reset_index(drop=True)
  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
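The key intersection used in the loop can also be expressed with index alignment, which keeps the work inside pandas instead of a Python loop. A minimal sketch, assuming Item ID is unique in both frames; the names left, right, common, cols and result are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Item ID": ["A", "B", "C", "D", "E"], "Value1": [1, 2, 3, 4, 0],
                    "Value2": [4, 5, 1, 8, 7], "Value3": [3, 8, 1, 2, 0],
                    "Value4": [4, 5, 7, 9, 4]})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Key both frames by Item ID (assumption: IDs are unique in each frame)
left = df1.set_index("Item ID")
right = df2.set_index("Item ID")["Value5"]

# Same idea as dict1.keys() & dict2.keys(): IDs present in both frames
common = left.index.intersection(right.index)

# Add Value5 to the two target columns, only on the shared rows
cols = ["Value1", "Value2"]
left.loc[common, cols] = left.loc[common, cols].add(right.loc[common], axis=0)

# Restore Item ID as a regular column
result = left.reset_index()
print(result)
```

Rows whose ID is not in df2 are never touched, so there are no NaNs to fill and the columns keep their integer dtype.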