Rolling sum per group over periods

Question  Votes: 0  Answers: 3

I have this DataFrame:

import pandas as pd

lst = [['01012021', 'A', 10], ['01012021', 'B', 20], ['02012021', 'A', 12], ['02012021', 'B', 23]]
df2 = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])

I want a rolling sum of AuM over two periods, computed per FN and ordered by Date. The expected result looks like this:

lst = [['01012021', 'A', 10, ''], ['01012021', 'B', 20, ''], ['02012021', 'A', 12, 22], ['02012021', 'B', 23, 43]]
df2 = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM', 'Roll2PeriodSum'])

Can you help me?

Thank you.

python pandas dataframe sum cumsum
3 Answers
1
vote

If the dates are consecutive, here is a solution that does not use the Date column and simply counts rows within each group:

df2['Roll2PeriodSum'] = (df2.groupby('FN').AuM
                            .rolling(2)
                            .sum() 
                            .reset_index(level=0, drop=True))
print (df2)
       Date FN  AuM  Roll2PeriodSum
0  01012021  A   10             NaN
1  01012021  B   20             NaN
2  02012021  A   12            22.0
3  02012021  B   23            43.0
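
The reset_index(level=0, drop=True) step above is easier to follow when the intermediate result is inspected. The sketch below is an illustration, assuming the same sample data as the question; the names rolled and aligned are introduced here only for the example. It shows that the grouped rolling sum comes back indexed by (FN, original row label), and that dropping the FN level leaves the original row labels so that assignment can align on them.

```python
import pandas as pd

lst = [['01012021', 'A', 10], ['01012021', 'B', 20],
       ['02012021', 'A', 12], ['02012021', 'B', 23]]
df2 = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])

# The grouped rolling sum carries a two-level index: FN, then the original row label.
rolled = df2.groupby('FN')['AuM'].rolling(2).sum()
print(list(rolled.index.names))

# Dropping the FN level leaves only the original row labels (in grouped order),
# which is what lets the assignment line up with df2's own index.
aligned = rolled.reset_index(level=0, drop=True)
print(aligned.index.tolist())

df2['Roll2PeriodSum'] = aligned
print(df2)
```

The rows come back in grouped order rather than in the original order, but pandas aligns on index labels during column assignment, so no explicit re-sorting is needed.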

A datetime-based solution that uses the Date column to define the window:

df2['Date'] = pd.to_datetime(df2['Date'], format='%d%m%Y')

df = (df2.join(df2.set_index('Date')
                  .groupby('FN').AuM
                  .rolling('2D')
                  .sum().rename('Roll2PeriodSum'), on=['FN','Date']))
print (df)
        Date FN  AuM  Roll2PeriodSum
0 2021-01-01  A   10            10.0
1 2021-01-01  B   20            20.0
2 2021-01-02  A   12            22.0
3 2021-01-02  B   23            43.0

To get NaN where a window contains fewer than two rows, add min_periods=2:

df = (df2.join(df2.set_index('Date')
                  .groupby('FN').AuM
                  .rolling('2D', min_periods=2)
                  .sum()
                  .rename('Roll2PeriodSum'), on=['FN','Date']))
print (df)
        Date FN  AuM  Roll2PeriodSum
0 2021-01-01  A   10             NaN
1 2021-01-01  B   20             NaN
2 2021-01-02  A   12            22.0
3 2021-01-02  B   23            43.0
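
The count-based and time-based windows only differ when a group has missing dates. The following is a small sketch of that difference, using hypothetical data in which group A has no row for 2021-01-02; the frame gap and the names by_count and by_time are made up for this illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical data: group A skips 2021-01-02, so its dates are not consecutive.
gap = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-01', '2021-01-02']),
    'FN': ['A', 'A', 'B', 'B'],
    'AuM': [10, 12, 20, 23],
})

# Count-based window: the current row plus the previous row of the same group,
# regardless of how far apart their dates are.
by_count = (gap.groupby('FN')['AuM']
               .rolling(2)
               .sum()
               .reset_index(level=0, drop=True))

# Time-based window: only rows of the same group dated within the last 2 days.
by_time = (gap.set_index('Date')
              .groupby('FN')['AuM']
              .rolling('2D', min_periods=2)
              .sum())

print(by_count)
print(by_time)
```

With this data the count-based version sums A's rows from 2021-01-01 and 2021-01-03 (giving 22), while the time-based version yields NaN for A on 2021-01-03 because 2021-01-01 falls outside the two-day window. Which behaviour is wanted depends on whether a "period" means a row or a calendar day.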

0
votes
import pandas as pd
import numpy as np

# Create the DataFrame
lst = [['01012021','A',10], ['01012021','B',20], ['02012021','A',12], ['02012021','B',23]]
df = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])

# Convert 'Date' to datetime format
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%Y')

# Sort the DataFrame by 'Date' and 'FN' to ensure the correct order
df = df.sort_values(by=['Date', 'FN'])
print(df)
"""
        Date FN  AuM
0 2021-01-01  A   10
1 2021-01-01  B   20
2 2021-01-02  A   12
3 2021-01-02  B   23
"""

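# Group-aware rolling sum over `window` rows, computed with NumPy
window = 2

# Order rows by 'FN' then 'Date' so each group is contiguous and in date order
ordered = df.sort_values(by=['FN', 'Date'], kind='stable')
values = ordered['AuM'].to_numpy()
fn = ordered['FN'].to_numpy()

# Windowed sum as a difference of cumulative sums: cumsum[i] - cumsum[i - window]
cumsum = np.cumsum(values)
start_indices = np.arange(len(values)) - window
rolling_sums = cumsum - np.where(start_indices >= 0, cumsum[np.maximum(start_indices, 0)], 0)
print('rolling_sums : ')
print(rolling_sums)  # [10 22 32 43]

# A window is valid only if it lies entirely inside one group, i.e. the row
# `window - 1` positions earlier has the same 'FN'
valid_indices = np.zeros(len(values), dtype=bool)
valid_indices[window - 1:] = fn[window - 1:] == fn[:len(fn) - (window - 1)]

# Leave invalid positions as empty strings, as in the expected result
store_aum = np.full(len(values), '', dtype=object)
store_aum[valid_indices] = rolling_sums[valid_indices]

# Assign back by index label so the values line up with the original row order
df['Desired_Col'] = pd.Series(store_aum, index=ordered.index)
print(df)
"""
        Date FN  AuM  Desired_Col
0 2021-01-01  A   10             
1 2021-01-01  B   20             
2 2021-01-02  A   12           22
3 2021-01-02  B   23           43
"""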

-1
votes

Use groupby.rolling.sum:

df2['Roll2PeriodSum'] = (
    df2.assign(Date=pd.to_datetime(df2['Date'], format='%d%m%Y'))
       .groupby('FN').rolling(2)['AuM'].sum().droplevel(0)
)
print(df2)

# Output
       Date FN  AuM  Roll2PeriodSum
0  01012021  A   10             NaN
1  01012021  B   20             NaN
2  02012021  A   12            22.0
3  02012021  B   23            43.0
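
For a window of exactly two rows, the same numbers can also be obtained without rolling at all. The sketch below is an alternative technique, not the method used in the answers above: it adds each row's AuM to the previous AuM of the same FN via groupby.shift, assuming the rows of each group are already in date order. The name prev_aum is introduced only for this example.

```python
import pandas as pd

lst = [['01012021', 'A', 10], ['01012021', 'B', 20],
       ['02012021', 'A', 12], ['02012021', 'B', 23]]
df2 = pd.DataFrame(lst, columns=['Date', 'FN', 'AuM'])

# Previous AuM within the same FN group; NaN for the first row of each group.
prev_aum = df2.groupby('FN')['AuM'].shift(1)

# NaN propagates through the addition, so each group's first row stays NaN.
df2['Roll2PeriodSum'] = df2['AuM'] + prev_aum
print(df2)
```

This only generalizes awkwardly to longer windows (one shift per extra period), so rolling remains the more natural choice when the window size may change.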