Analytic sliding window functions in Python pandas

Question (votes: 0, answers: 1)

I have a table:

import pandas as pd

# One row per account action: date, customer id, customer name, amount
list_1 = [['2016-01-01', 1, 'King', 1000],
          ['2016-01-02', 1, 'King', -200],
          ['2016-01-03', 1, 'King', 100],
          ['2016-01-04', 1, 'King', -400],
          ['2016-01-05', 1, 'King', 200],
          ['2016-01-06', 1, 'King', -200],
          ['2016-01-01', 2, 'Smith', 1000],
          ['2016-01-02', 2, 'Smith', -300],
          ['2016-01-03', 2, 'Smith', -600],
          ['2016-01-04', 2, 'Smith', 100],
          ['2016-01-05', 2, 'Smith', -100]]
labels = ['a_date', 'c_id', 'c_name', 'c_action']
df = pd.DataFrame(list_1, columns=labels)
df

OUT:

    a_date       c_id   c_name  c_action
0   2016-01-01     1    King    1000
1   2016-01-02     1    King    -200
2   2016-01-03     1    King    100
3   2016-01-04     1    King    -400
4   2016-01-05     1    King    200
5   2016-01-06     1    King    -200
6   2016-01-01     2    Smith   1000
7   2016-01-02     2    Smith   -300
8   2016-01-03     2    Smith   -600
9   2016-01-04     2    Smith   100
10  2016-01-05     2    Smith   -100

I need to get this table:

a_date      c_id    c_name  c_amount    Balance
2016-01-01     1    King    1000        1000
2016-01-02     1    King    -200        800
2016-01-03     1    King    100         900
2016-01-04     1    King    -400        500
2016-01-05     1    King    200         700
2016-01-06     1    King    -200        500
2016-01-01     2    Smith   1000        1000
2016-01-02     2    Smith   -300        700
2016-01-03     2    Smith   -600        100
2016-01-04     2    Smith   100         200
2016-01-05     2    Smith   -100        100

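So after each action I need to add each customer's running total in a Balance column (c_amount here is the c_action column of the DataFrame above). This is equivalent to the SQL query: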

SELECT *,
        SUM(c_amount) OVER (PARTITION BY c_id ORDER BY a_date) AS 'Balance'
FROM account_actions

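For two customers the solution is not difficult: split the table by c_id, compute the running total for each part, and concatenate the parts. But this needs to be a dynamic solution for 10,000 customers...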
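For illustration, the manual approach described above might look like the sketch below. It uses a shortened copy of the question's data, and the variable names (king, smith, manual) are hypothetical. It works, but one filter per customer has to be written by hand, which is what stops it from scaling.

```python
import pandas as pd

# Shortened copy of the question's data (two rows per customer).
df = pd.DataFrame(
    [['2016-01-01', 1, 'King', 1000],
     ['2016-01-02', 1, 'King', -200],
     ['2016-01-01', 2, 'Smith', 1000],
     ['2016-01-02', 2, 'Smith', -300]],
    columns=['a_date', 'c_id', 'c_name', 'c_action'])

# One hand-written filter per customer (hypothetical variable names).
king = df[df['c_id'] == 1].copy()
smith = df[df['c_id'] == 2].copy()

# Running total inside each piece, then glue the pieces back together.
king['Balance'] = king['c_action'].cumsum()
smith['Balance'] = smith['c_action'].cumsum()
manual = pd.concat([king, smith])
print(manual)
```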

python sql pandas analytics window-functions
1 answer
1 vote

As @Vaishali said, this is groupby followed by cumsum. You may want to call sort_values first to make sure the data is in order, although it looks like it already is:

# sort by `c_id` and `a_date`
df = df.sort_values(['c_id','a_date'])

df['balance'] = df.groupby('c_id')['c_action'].cumsum()

Output:

        a_date  c_id c_name  c_action  balance
0   2016-01-01     1   King      1000     1000
1   2016-01-02     1   King      -200      800
2   2016-01-03     1   King       100      900
3   2016-01-04     1   King      -400      500
4   2016-01-05     1   King       200      700
5   2016-01-06     1   King      -200      500
6   2016-01-01     2  Smith      1000     1000
7   2016-01-02     2  Smith      -300      700
8   2016-01-03     2  Smith      -600      100
9   2016-01-04     2  Smith       100      200
10  2016-01-05     2  Smith      -100      100
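As a rough check, the sketch below rebuilds the question's data, shuffles the rows, and confirms that the same groupby and cumsum still reproduce the Balance column from the question. It assumes a_date holds ISO-formatted strings (converted with pd.to_datetime here so that sorting is chronological); the names rows, shuffled, expected and result are made up for the example.

```python
import pandas as pd

rows = [['2016-01-01', 1, 'King', 1000],
        ['2016-01-02', 1, 'King', -200],
        ['2016-01-03', 1, 'King', 100],
        ['2016-01-04', 1, 'King', -400],
        ['2016-01-05', 1, 'King', 200],
        ['2016-01-06', 1, 'King', -200],
        ['2016-01-01', 2, 'Smith', 1000],
        ['2016-01-02', 2, 'Smith', -300],
        ['2016-01-03', 2, 'Smith', -600],
        ['2016-01-04', 2, 'Smith', 100],
        ['2016-01-05', 2, 'Smith', -100]]
df = pd.DataFrame(rows, columns=['a_date', 'c_id', 'c_name', 'c_action'])
df['a_date'] = pd.to_datetime(df['a_date'])

# Shuffle the rows to simulate input that does not arrive in order.
shuffled = df.sample(frac=1, random_state=0)

# Sort only for the computation. The cumulative sum comes back as a Series
# carrying the original index labels, so the assignment aligns on the index
# and the frame itself keeps its (shuffled) row order.
shuffled['balance'] = (
    shuffled.sort_values(['c_id', 'a_date'])
            .groupby('c_id')['c_action']
            .cumsum()
)

# Balance column from the question, in the original row order.
expected = [1000, 800, 900, 500, 700, 500, 1000, 700, 100, 200, 100]
result = shuffled.sort_index()
assert result['balance'].tolist() == expected
```

No per-customer code is involved, so the same lines should apply unchanged to 10,000 customers.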