I have this DataFrame:
lst=[['01012021','A',10],['01012021','B',20],['02012021','A',12],['02012021','B',23]]
df2=pd.DataFrame(lst,columns=['Date','FN','AuM'])
I want to get a two-period rolling sum of AuM by date for each FN. The expected result looks like this:
lst=[['01012021','A',10,''],['01012021','B',20,''],['02012021','A',12,22],['02012021','B',23,43]]
df2=pd.DataFrame(lst,columns=['Date','FN','AuM','Roll2PeriodSum'])
Can you help me?
Thank you
If the datetimes are consecutive, here is a solution that counts rows per group and does not use the Date column:
df2['Roll2PeriodSum'] = (df2.groupby('FN').AuM
                            .rolling(2)
                            .sum()
                            .reset_index(level=0, drop=True))
print(df2)
Date FN AuM Roll2PeriodSum
0 01012021 A 10 NaN
1 01012021 B 20 NaN
2 02012021 A 12 22.0
3 02012021 B 23 43.0
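For what it's worth, here is a minimal sketch of why the reset_index(level=0, drop=True) step is there, assuming the same sample data as in the question. The names rolled and aligned are only illustrative and are not part of the answer above.

```python
import pandas as pd

lst = [['01012021', 'A', 10], ['01012021', 'B', 20],
       ['02012021', 'A', 12], ['02012021', 'B', 23]]
df2 = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])

# groupby(...).rolling(...) returns a Series with a two-level index:
# the group key (FN) and the original row label
rolled = df2.groupby('FN').AuM.rolling(2).sum()
print(rolled.index.nlevels)  # 2

# Dropping the FN level leaves only the original row labels,
# so the assignment below aligns each sum with its source row
aligned = rolled.reset_index(level=0, drop=True)
df2['Roll2PeriodSum'] = aligned
print(df2)
```

Without dropping the group level, the result would still carry the FN level in its index and would not line up with the plain row labels of df2.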
A solution for datetimes that uses the Date column to define the window:
df2['Date'] = pd.to_datetime(df2['Date'], format='%d%m%Y')
df = (df2.join(df2.set_index('Date')
                  .groupby('FN').AuM
                  .rolling('2D')
                  .sum()
                  .rename('Roll2PeriodSum'), on=['FN','Date']))
print(df)
Date FN AuM Roll2PeriodSum
0 2021-01-01 A 10 10.0
1 2021-01-01 B 20 20.0
2 2021-01-02 A 12 22.0
3 2021-01-02 B 23 43.0
To get NaN for the first period, as in the expected output, add min_periods=2:
df = (df2.join(df2.set_index('Date')
                  .groupby('FN').AuM
                  .rolling('2D', min_periods=2)
                  .sum()
                  .rename('Roll2PeriodSum'), on=['FN','Date']))
print(df)
Date FN AuM Roll2PeriodSum
0 2021-01-01 A 10 NaN
1 2021-01-01 B 20 NaN
2 2021-01-02 A 12 22.0
3 2021-01-02 B 23 43.0
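The two approaches only differ when a group has missing dates. Below is a small sketch of that difference; the gap DataFrame is hypothetical data (fund A has no row for 2021-01-02), and by_rows and by_days are illustrative names.

```python
import pandas as pd

# Hypothetical data: fund A has no row for 2021-01-02
gap = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-04']),
    'FN': ['A', 'A', 'A'],
    'AuM': [10, 12, 15],
})

# Row-based window: always sums the current row and the previous row
by_rows = (gap.groupby('FN').AuM
              .rolling(2)
              .sum()
              .reset_index(level=0, drop=True))

# Time-based window: sums the rows whose Date falls within
# the 2 days ending at the current row (start excluded by default)
by_days = (gap.set_index('Date')
              .groupby('FN').AuM
              .rolling('2D')
              .sum())

print(by_rows.tolist())  # [nan, 22.0, 27.0]
print(by_days.tolist())  # [10.0, 12.0, 27.0]
```

For 2021-01-03 the row-based window reaches back across the gap to 2021-01-01, while the 2-day window only contains the row for 2021-01-03 itself.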
import pandas as pd
import numpy as np
# Create the DataFrame
lst = [['01012021','A',10], ['01012021','B',20], ['02012021','A',12], ['02012021','B',23]]
df = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])
# Convert 'Date' to datetime format
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%Y')
# Sort the DataFrame by 'Date' and 'FN' to ensure the correct order
df = df.sort_values(by=['Date', 'FN'])
print(df)
"""
Date FN AuM
0 2021-01-01 A 10
1 2021-01-01 B 20
2 2021-01-02 A 12
3 2021-01-02 B 23
"""
# Convert columns to NumPy arrays for faster computation
values = df['AuM'].to_numpy()
fn = df['FN'].to_numpy()
# Group the rows by FN; a stable sort keeps the date order within each group
order = np.argsort(fn, kind='stable')
sorted_values = values[order]
sorted_fn = fn[order]
# Create an array to store the rolling sums ('' where no full window exists)
empty_store_aum = np.full(len(values), '', dtype=object)
# Two-row rolling sum via cumulative sums: cumsum[i] - cumsum[i - 2]
cumsum = np.cumsum(sorted_values)
start_indices = np.arange(len(values)) - 2
rolling_sums = cumsum - np.where(start_indices >= 0, cumsum[start_indices], 0)
print('rolling_sums : ')
print(rolling_sums)  # [10 22 32 43]
# Only keep sums whose previous row belongs to the same FN group
valid_indices = np.zeros(len(values), dtype=bool)
valid_indices[1:] = sorted_fn[1:] == sorted_fn[:-1]
# Write the valid sums back to their original row positions
empty_store_aum[order[valid_indices]] = rolling_sums[valid_indices]
df['Desired_Col'] = empty_store_aum
print(df)
"""
Date FN AuM Desired_Col
0 2021-01-01 A 10
1 2021-01-01 B 20
2 2021-01-02 A 12 22
3 2021-01-02 B 23 43
"""
Use groupby.rolling.sum:
df2['Roll2PeriodSum'] = (
df2.assign(Date=pd.to_datetime(df2['Date'], format='%d%m%Y'))
.groupby('FN').rolling(2)['AuM'].sum().droplevel(0)
)
print(df2)
# Output
Date FN AuM Roll2PeriodSum
0 01012021 A 10 NaN
1 01012021 B 20 NaN
2 02012021 A 12 22.0
3 02012021 B 23 43.0
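As a cross-check, and assuming the window is always exactly two rows, the same numbers can be obtained by adding each row's AuM to the previous AuM within its FN group. This is only a sketch of an equivalent formulation for this sample data; prev and expected are illustrative names, and the approach does not generalise to wider windows.

```python
import pandas as pd

lst = [['01012021', 'A', 10], ['01012021', 'B', 20],
       ['02012021', 'A', 12], ['02012021', 'B', 23]]
df2 = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])

# Previous AuM within each FN group (NaN for the first row of a group)
prev = df2.groupby('FN')['AuM'].shift()

# Current value plus previous value equals a 2-row rolling sum
df2['Roll2PeriodSum'] = df2['AuM'] + prev

# Compare against the groupby.rolling result
expected = df2.groupby('FN').rolling(2)['AuM'].sum().droplevel(0)
print(df2['Roll2PeriodSum'].equals(expected.sort_index()))
print(df2)
```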