How to select rows and columns that meet a condition from a list

Question  Votes: 2  Answers: 1

Suppose I have a pandas DataFrame that looks like this:

import pandas as pd

df1 = pd.DataFrame({"Item ID":["A", "B", "C", "D", "E"], "Value1":[1, 2, 3, 4, 0], 
        "Value2":[4, 5, 1, 8, 7], "Value3":[3, 8, 1, 2, 0],"Value4":[4, 5, 7, 9, 4]})
print(df1)
  Item ID  Value1  Value2  Value3  Value4
0       A       1       4       3       4
1       B       2       5       8       5
2       C       3       1       1       7
3       D       4       8       2       9
4       E       0       7       0       4

Now I have a second DataFrame that looks like this:

df2 = pd.DataFrame({"Item ID":["A", "C", "D"], "Value5":[4, 5, 7]})
print(df2)

  Item ID  Value5
0       A       4
1       C       5
2       D       7

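What I want to do is find where the Item IDs match between my two DataFrames, and then add the "Value5" value to the matching rows, but only in the Value1 and Value2 columns of df1.

My output should show:

  • 4 added to row A, in columns "Value1" and "Value2"
  • 5 added to row C, in columns "Value1" and "Value2"
  • 7 added to row D, in columns "Value1" and "Value2"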

      Item ID  Value1  Value2  Value3  Value4
    0       A       5       8       3       4
    1       B       2       5       8       5
    2       C       8       6       1       7
    3       D      11      15       2       9
    4       E       0       7       0       4
    

Of course, my real data has thousands of rows. I can do this with a for loop, but it takes far too long. I'd like to vectorize it somehow. Any ideas?

python-3.x pandas dataframe slice
1 Answer
0
votes

Try this:

 # merge the DataFrames; a left merge keeps exactly the rows of df1
 (df1.merge(df2, on='Item ID', how='left')
     # add Value5 to both columns; fill_value=0 leaves unmatched rows unchanged
     .assign(Value1=lambda x: x.Value1.add(x.Value5, fill_value=0),
             Value2=lambda x: x.Value2.add(x.Value5, fill_value=0))
     # remove the helper column
     .drop('Value5', axis=1)
     # optional: the NaNs introduced by the merge turn the sums into floats,
     # so cast back if you want integers
     .astype({'Value1': 'Int64', 'Value2': 'Int64'})
 )

  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
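
If you would rather avoid the merge and the temporary Value5 column, one possible alternative (not part of the original answer) is to look up the increment for each row with Series.map and add it to both columns at once. This is only a sketch: it assumes "Item ID" is unique in df2, and the names increment, target_cols and result are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Item ID": ["A", "B", "C", "D", "E"],
    "Value1": [1, 2, 3, 4, 0],
    "Value2": [4, 5, 1, 8, 7],
    "Value3": [3, 8, 1, 2, 0],
    "Value4": [4, 5, 7, 9, 4],
})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Look up Value5 for every row of df1 by Item ID.
# IDs that do not appear in df2 map to NaN, which we treat as "add 0".
# Assumption: "Item ID" is unique in df2 (otherwise the lookup is ambiguous).
increment = (
    df1["Item ID"]
    .map(df2.set_index("Item ID")["Value5"])
    .fillna(0)
    .astype("int64")
)

# Hypothetical name for the columns that should receive the increment.
target_cols = ["Value1", "Value2"]

result = df1.copy()
# axis=0 aligns the increment with the rows, so each row gets its own value
# added to every target column.
result[target_cols] = result[target_cols].add(increment, axis=0)
print(result)
```

Because the NaNs are filled before the addition, the columns stay as ordinary integers and no cast back is needed.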

As suggested in @mmm's comment, here is an alternative solution using Python dictionaries:

#create dictionaries
dict1 = (df1
         #create a temporary column and set it as the index;
         #the index becomes the dictionary key, which lets us
         #match up the two dictionaries
         #while still keeping the Item ID column
         .assign(temp=df1['Item ID'])
         .set_index('temp')
         .to_dict('index')
        )

dict2 = (df2
         .assign(temp=df2['Item ID'])
         .set_index('temp')
         .to_dict('index')
        )

#check for keys that are in both dict1 and 2 i.e intersection

#print(dict1.keys() & dict2.keys())
#{'A', 'C', 'D'}

#loop through dict 1 and add values from dict2
for key in dict1.keys() & dict2.keys():
    dict1[key]['Value1'] = dict1[key]['Value1'] + dict2[key]['Value5']
    dict1[key]['Value2'] = dict1[key]['Value2'] + dict2[key]['Value5']

#print(dict1)
{'A': {'Item ID': 'A', 'Value1': 5, 'Value2': 8, 'Value3': 3, 'Value4': 4},
 'B': {'Item ID': 'B', 'Value1': 2, 'Value2': 5, 'Value3': 8, 'Value4': 5},
 'C': {'Item ID': 'C', 'Value1': 8, 'Value2': 6, 'Value3': 1, 'Value4': 7},
 'D': {'Item ID': 'D', 'Value1': 11, 'Value2': 15, 'Value3': 2, 'Value4': 9},
 'E': {'Item ID': 'E', 'Value1': 0, 'Value2': 7, 'Value3': 0, 'Value4': 4}}

#create dataframe
pd.DataFrame.from_dict(dict1,orient='index').reset_index(drop=True)

  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
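
The dictionary version still loops in Python, which may be slow on thousands of rows. The same key-intersection idea can probably be expressed with pandas indexes instead. The sketch below is one way to do that, not the original answerer's code; it assumes "Item ID" is unique in both frames, and the names indexed, bonus, common, cols and out are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Item ID": ["A", "B", "C", "D", "E"],
    "Value1": [1, 2, 3, 4, 0],
    "Value2": [4, 5, 1, 8, 7],
    "Value3": [3, 8, 1, 2, 0],
    "Value4": [4, 5, 7, 9, 4],
})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Use Item ID as the index so rows can be selected and aligned by ID.
# Assumption: Item ID values are unique in both frames.
indexed = df1.set_index("Item ID")
bonus = df2.set_index("Item ID")["Value5"]

# IDs present in both frames (the counterpart of dict1.keys() & dict2.keys()).
common = indexed.index.intersection(bonus.index)

cols = ["Value1", "Value2"]
# Add each ID's Value5 to Value1 and Value2, only for the matching rows.
indexed.loc[common, cols] = indexed.loc[common, cols].add(bonus.loc[common], axis=0)

# Turn Item ID back into a regular column.
out = indexed.reset_index()
print(out)
```

Rows whose ID is missing from df2 are never touched, so no NaN handling is required here.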