Subtracting attribute values in one Pandas DataFrame from another DataFrame

Question  Votes: 3  Answers: 4

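This question involves three separate DataFrames:

  • df1 holds the 'Total' for products 1, 2, 3, with columns 'Value1' and 'Value2'.
  • df2 holds 'Customer1' for products 1, 2, 3, with columns 'Value1' and 'Value2'.
  • df3 holds 'Customer2' for products 1, 2, 3, with columns 'Value1' and 'Value2'.

df2 and df3 are essentially subsets of df1.

I want to create another DataFrame, df4, by subtracting df2 and df3 from df1. The rows of df4 should be labelled 'Remaining Customers' in the 'Market' column.

This is what I have so far: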

import pandas as pd


d1 = {'Market': ['Total', 'Total','Total'], 'Product Code': [1, 2, 3], 
'Value1':[10, 20, 30], 'Value2':[5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1



d2 = {'Market': ['Customer1', 'Customer1','Customer1'], 'Product Code': [1, 
2, 3], 'Value1':[3, 14, 10], 'Value2':[2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2


d3 = {'Market': ['Customer2', 'Customer2','Customer2'], 'Product Code': [1, 
2, 3], 'Value1':[3, 3, 4], 'Value2':[2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following output:

  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25
  Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6
  Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4 I tried the code below and got the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?

df4 = df1-(df2+df3)

print(df4)
python python-3.x pandas
4 Answers
2
votes

Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

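Explanation

  • df1[cols].subtract(df2[cols].add(df3[cols])) performs the calculation on the specified columns only, so the string column Market never takes part in the arithmetic.
  • assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds the extra columns the resulting DataFrame needs.
  • sort_index(axis=1) sorts the columns by name, which here happens to give the desired column order.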
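The three steps above can be traced one at a time. The following is a minimal sketch that rebuilds the question's data and prints the column layout after each step; the intermediate names `diff` and `with_labels` are introduced here for illustration, and taking 'Product Code' from df1 instead of hard-coding [1, 2, 3] is a small variation that assumes all three frames list the products in the same row order.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']

# Step 1: arithmetic on the numeric value columns only
diff = df1[cols].subtract(df2[cols].add(df3[cols]))
print(list(diff.columns))         # ['Value1', 'Value2']

# Step 2: add the label columns back (Product Code copied from df1,
# which assumes identical row order in all three frames)
with_labels = diff.assign(**{'Market': 'RemainingCustomers',
                             'Product Code': df1['Product Code']})
print(list(with_labels.columns))  # ['Value1', 'Value2', 'Market', 'Product Code']

# Step 3: sort the columns alphabetically
df4 = with_labels.sort_index(axis=1)
print(list(df4.columns))          # ['Market', 'Product Code', 'Value1', 'Value2']
```

Note that the `**{...}` form is needed because 'Product Code' contains a space and so cannot be written as a keyword argument.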

3
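votes

Drop Market, set Product Code as the index, and let pandas perform index-aligned arithmetic on the product codes. Afterwards, just reset the index and insert Market into the result.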

df1, df2, df3 = [
      # columns= replaces the positional axis argument, which pandas 2.0 no longer accepts
      df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
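To see why aligning on Product Code matters, here is a small sketch comparing positional and label-based subtraction. The frames `total` and `customer` are hypothetical, cut-down versions of the question's data, with the customer rows deliberately shuffled; that shuffling is an assumption made for illustration and does not occur in the original data.

```python
import pandas as pd

total = pd.DataFrame({'Product Code': [1, 2, 3], 'Value1': [10, 20, 30]})
# Hypothetical customer frame whose rows are in a different order
customer = pd.DataFrame({'Product Code': [3, 1, 2], 'Value1': [10, 3, 14]})

# With the default RangeIndex, row 0 is paired with row 0 and so on,
# regardless of which product each row describes
by_position = total['Value1'] - customer['Value1']

# With Product Code as the index, rows are paired by product code
by_label = (total.set_index('Product Code')
            - customer.set_index('Product Code'))['Value1'].sort_index()

print(by_position.tolist())  # [0, 17, 16]
print(by_label.to_dict())    # {1: 7, 2: 6, 3: 20}
```

Only the label-based result subtracts each product's customer value from the matching total.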

3
votes

Not exactly what the OP asked for, but in my view this may be a better way to manage the data.

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

df['RemainingCustomers']

              Value1  Value2
Product Code                
1                  4       1
2                  3       5
3                 16       9

If we insist on the requested output:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

Or

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
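The idea behind this answer (one column per market, then a formula over those columns) may be easier to follow on a single value column. The sketch below is a simplified variation, not the answer's exact code: it uses `pivot` instead of `unstack`/`stack`, keeps only Value1, and the names `long_form` and `wide` are introduced here for illustration.

```python
import pandas as pd

# The question's three frames stacked into one long table (Value1 only)
long_form = pd.DataFrame({
    'Market': ['Total'] * 3 + ['Customer1'] * 3 + ['Customer2'] * 3,
    'Product Code': [1, 2, 3] * 3,
    'Value1': [10, 20, 30, 3, 14, 10, 3, 3, 4],
})

# One row per product, one column per market
wide = long_form.pivot(index='Product Code', columns='Market', values='Value1')

# eval with an assignment returns a copy that has the new column appended
wide = wide.eval('RemainingCustomers = Total - Customer1 - Customer2')

print(wide['RemainingCustomers'].tolist())  # [4, 3, 16]
```

Because the formula refers to markets by name, adding another customer column only requires extending the formula string.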

2
votes

Maybe we can use select_dtypes:

(df1.select_dtypes(exclude = 'object')
     -df2.select_dtypes(exclude = 'object')
       -df3.select_dtypes(exclude = 'object')).\
            drop(columns='Product Code').\
              combine_first(df1).\
               assign(Market='remaining customers')
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9
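As a quick check of what the column selection keeps, here is a minimal sketch on df1 from the question. It uses include='number' rather than the answer's exclude='object'; that substitution is a choice made here for illustration, since it selects numeric columns directly instead of relying on how the string column happens to be stored.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})

# Keep only the numeric columns; the string column Market is left out
numeric = df1.select_dtypes(include='number')

print(list(numeric.columns))  # ['Product Code', 'Value1', 'Value2']
```

Product Code is numeric too, so it is subtracted along with the value columns. That is why the answer drops it after the arithmetic and restores it from df1 with combine_first.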