I have a table:
import pandas as pd

list_1 = [['2016-01-01', 1, 'King', 1000],
          ['2016-01-02', 1, 'King', -200],
          ['2016-01-03', 1, 'King', 100],
          ['2016-01-04', 1, 'King', -400],
          ['2016-01-05', 1, 'King', 200],
          ['2016-01-06', 1, 'King', -200],
          ['2016-01-01', 2, 'Smith', 1000],
          ['2016-01-02', 2, 'Smith', -300],
          ['2016-01-03', 2, 'Smith', -600],
          ['2016-01-04', 2, 'Smith', 100],
          ['2016-01-05', 2, 'Smith', -100]]
labels = ['a_date', 'c_id', 'c_name', 'c_action']
df = pd.DataFrame(list_1, columns=labels)
df
Output:
a_date c_id c_name c_action
0 2016-01-01 1 King 1000
1 2016-01-02 1 King -200
2 2016-01-03 1 King 100
3 2016-01-04 1 King -400
4 2016-01-05 1 King 200
5 2016-01-06 1 King -200
6 2016-01-01 2 Smith 1000
7 2016-01-02 2 Smith -300
8 2016-01-03 2 Smith -600
9 2016-01-04 2 Smith 100
10 2016-01-05 2 Smith -100
I need to get this table:
a_date c_id c_name c_amount Balance
2016-01-01 1 King 1000 1000
2016-01-02 1 King -200 800
2016-01-03 1 King 100 900
2016-01-04 1 King -400 500
2016-01-05 1 King 200 700
2016-01-06 1 King -200 500
2016-01-01 2 Smith 1000 1000
2016-01-02 2 Smith -300 700
2016-01-03 2 Smith -600 100
2016-01-04 2 Smith 100 200
2016-01-05 2 Smith -100 100
So, after each action, I need to add each customer's running total in a "Balance" column. This is equivalent to the SQL query:
SELECT *,
SUM(c_amount) OVER (PARTITION BY c_id ORDER BY a_date) AS 'Balance'
FROM account_actions
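For two customers the solution is not hard: split the table by c_id, then aggregate and merge the results. But this needs to be a dynamic solution that works for 10000 customers...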
As @Vaishali said, this is a groupby followed by cumsum. You may want to call sort_values first to make sure the data is in order, although it looks like it already is:
# sort by `c_id` and `a_date`
df = df.sort_values(['c_id','a_date'])
df['balance'] = df.groupby('c_id')['c_action'].cumsum()
Output:
a_date c_id c_name c_action balance
0 2016-01-01 1 King 1000 1000
1 2016-01-02 1 King -200 800
2 2016-01-03 1 King 100 900
3 2016-01-04 1 King -400 500
4 2016-01-05 1 King 200 700
5 2016-01-06 1 King -200 500
6 2016-01-01 2 Smith 1000 1000
7 2016-01-02 2 Smith -300 700
8 2016-01-03 2 Smith -600 100
9 2016-01-04 2 Smith 100 200
10 2016-01-05 2 Smith -100 100
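For reference, here is a minimal self-contained sketch of the same groupby/cumsum approach on rows that are not already in order. It uses a shuffled subset of the question's rows. The pd.to_datetime conversion is an addition that is not in the answer above: ISO-formatted date strings happen to sort correctly as text, but other date formats may not, so parsing them first is the safer assumption. The reset_index call and the balances variable are only there for illustration.

```python
import pandas as pd

# A subset of the question's rows, deliberately shuffled
# to show why sorting before cumsum matters.
rows = [
    ['2016-01-03', 1, 'King', 100],
    ['2016-01-01', 2, 'Smith', 1000],
    ['2016-01-01', 1, 'King', 1000],
    ['2016-01-02', 2, 'Smith', -300],
    ['2016-01-02', 1, 'King', -200],
]
df = pd.DataFrame(rows, columns=['a_date', 'c_id', 'c_name', 'c_action'])

# Assumption: parse the dates so the sort is chronological
# rather than a plain string comparison.
df['a_date'] = pd.to_datetime(df['a_date'])

# Order rows within each customer by date, then take the
# cumulative sum of c_action separately for every c_id.
df = df.sort_values(['c_id', 'a_date']).reset_index(drop=True)
df['balance'] = df.groupby('c_id')['c_action'].cumsum()

balances = df['balance'].tolist()
print(df)
```

Because groupby handles every distinct c_id in one pass, the same two lines should work unchanged whether there are 2 customers or 10000.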