How to select rows and columns that satisfy a condition from a list

Question

Suppose I have a pandas DataFrame that looks like this:

import pandas as pd

df1 = pd.DataFrame({"Item ID":["A", "B", "C", "D", "E"], "Value1":[1, 2, 3, 4, 0], 
        "Value2":[4, 5, 1, 8, 7], "Value3":[3, 8, 1, 2, 0],"Value4":[4, 5, 7, 9, 4]})
print(df1)
  Item ID  Value1  Value2  Value3  Value4
0       A       1       4       3       4
1       B       2       5       8       5
2       C       3       1       1       7
3       D       4       8       2       9
4       E       0       7       0       4

Now I have a second DataFrame that looks like this:

df2 = pd.DataFrame({"Item ID":["A", "C", "D"], "Value5":[4, 5, 7]})
print(df2)

  Item ID  Value5
0       A       4
1       C       5
2       D       7

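What I want to do is find where the Item IDs match between my two DataFrames, then add the "Value5" value to the matching rows, but only in the Value1 and Value2 columns of df1.

My output should show:

4 added to row A, columns "Value1" and "Value2"
5 added to row C, columns "Value1" and "Value2"
7 added to row D, columns "Value1" and "Value2"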

  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4

Of course, my real data has thousands of rows. I could do this with a for loop, but that takes far too long. I would like to vectorize it somehow. Any ideas?

Answer 1

Try this:

# merge the dataframes; a left merge keeps exactly the rows of df1
(df1.merge(df2, on='Item ID', how='left')
    # add Value5 to the two target columns (a missing Value5 counts as 0)
    .assign(Value1=lambda x: x.Value1.add(x.Value5, fill_value=0),
            Value2=lambda x: x.Value2.add(x.Value5, fill_value=0))
    # remove the helper column
    .drop('Value5', axis=1)
    # optional: the NaNs introduced by the merge turn the sums into floats,
    # so cast back if you are keen on having only integers
    .astype({'Value1': 'Int64', 'Value2': 'Int64'})
)

  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
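The same result can probably be reached without a merge, by looking up each row's increment directly. The following is only a sketch, not part of the original answer: it assumes every Item ID appears at most once in df2 (otherwise `Series.map` cannot do the lookup), and the names `increment`, `cols` and `result` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Item ID": ["A", "B", "C", "D", "E"],
    "Value1": [1, 2, 3, 4, 0],
    "Value2": [4, 5, 1, 8, 7],
    "Value3": [3, 8, 1, 2, 0],
    "Value4": [4, 5, 7, 9, 4],
})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Look up Value5 for every row of df1 by its Item ID.
# IDs that are missing from df2 map to NaN, which we treat as "add 0".
# Assumption: Item ID values are unique in df2.
increment = (
    df1["Item ID"]
    .map(df2.set_index("Item ID")["Value5"])
    .fillna(0)
    .astype("int64")
)

# Add the per-row increment to the two target columns only.
# axis=0 aligns the Series with the rows (not the columns) of the frame.
cols = ["Value1", "Value2"]
result = df1.copy()
result[cols] = result[cols].add(increment, axis=0)

print(result)
```

Because the increment is cast back to integers before the addition, Value1 and Value2 keep their integer dtype and no final `astype` is needed.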

As suggested in @mmm's comment, here is an alternative solution using Python dictionaries:

#create dictionaries
dict1 = (df1
         #create temporary column
         #and set as index
         #the index allows us merge the dictionaries
         #while still keeping Item ID column
         .assign(temp=df1['Item ID'])
         .set_index('temp')
         .to_dict('index')
        )

dict2 = (df2
         .assign(temp=df2['Item ID'])
         .set_index('temp')
         .to_dict('index')
        )

#check for keys that are in both dict1 and 2 i.e intersection

#print(dict1.keys() & dict2.keys())
#{'A', 'C', 'D'}

#loop through dict 1 and add values from dict2
for key in dict1.keys() & dict2.keys():
    dict1[key]['Value1'] = dict1[key]['Value1'] + dict2[key]['Value5']
    dict1[key]['Value2'] = dict1[key]['Value2'] + dict2[key]['Value5']

#print(dict1)
{'A': {'Item ID': 'A', 'Value1': 5, 'Value2': 8, 'Value3': 3, 'Value4': 4},
 'B': {'Item ID': 'B', 'Value1': 2, 'Value2': 5, 'Value3': 8, 'Value4': 5},
 'C': {'Item ID': 'C', 'Value1': 8, 'Value2': 6, 'Value3': 1, 'Value4': 7},
 'D': {'Item ID': 'D', 'Value1': 11, 'Value2': 15, 'Value3': 2, 'Value4': 9},
 'E': {'Item ID': 'E', 'Value1': 0, 'Value2': 7, 'Value3': 0, 'Value4': 4}}

#create dataframe
pd.DataFrame.from_dict(dict1,orient='index').reset_index(drop=True)

  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
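The key-intersection idea from the dictionary version can also be expressed with pandas indexes, which avoids the Python-level loop. This is a rough sketch rather than the answerer's code: it assumes Item ID is unique in both frames, and the names `left`, `right`, `common` and `result_idx` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Item ID": ["A", "B", "C", "D", "E"],
    "Value1": [1, 2, 3, 4, 0],
    "Value2": [4, 5, 1, 8, 7],
    "Value3": [3, 8, 1, 2, 0],
    "Value4": [4, 5, 7, 9, 4],
})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Index both sides by Item ID so rows can be addressed by label.
# Assumption: Item ID values are unique in df1 and in df2.
left = df1.set_index("Item ID")
right = df2.set_index("Item ID")["Value5"]

# Labels present in both frames (the counterpart of dict1.keys() & dict2.keys()).
common = left.index.intersection(right.index)

# Update only the matching rows and only the two target columns.
cols = ["Value1", "Value2"]
left.loc[common, cols] = left.loc[common, cols].add(right.loc[common], axis=0)

# Restore Item ID as an ordinary column.
result_idx = left.reset_index()
print(result_idx)
```

Rows B and E are never touched here, so their values cannot pick up NaN along the way.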
