Resampling a pandas DataFrame backward from today


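I have a DataFrame that I want to resample and aggregate over average_period = 14 days. The hard part (for me) is that I want the aggregation bins to start at today and run backward: [today, today-14], [today-14, today-28], [today-28, today-42], and so on. Today's date is always present in df, but earlier dates are not necessarily there.

If I do the following, the latest bin label I get is 2024-01-23, but it should be 2024-01-13. How can I achieve this?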

import pandas as pd

df = pd.DataFrame({
    'date_time': ['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'],
    'col1': ['0.100', '0.100', '0.100', '0.100'],
    'col2': ['r', 'r', 'r', 'r'],
    'tot': [900, 800, 300, 400],
    'hit': [24, 56, 26, 40],
    'percent': [33, 23, 33, 31]})

df = df.assign(date_time=pd.to_datetime(df.date_time))

average_period = 14
(df
    .set_index('date_time')
    .groupby(['col1', 'col2'])
    .resample(f'{average_period}D', closed='right', label='right')
    .agg({'hit': 'sum', 'tot': 'sum', 'percent': 'mean'})
    .reset_index())
python pandas dataframe aggregate pandas-resample
1 Answer
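Pass origin=today to resample so that the 14-day bin edges are anchored on today's date instead of on the first day in the data (the origin argument is available from pandas 1.1):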

import pandas as pd
from datetime import datetime


df = pd.DataFrame({
    'date_time': ['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'],
    'col1': ['0.100', '0.100', '0.100', '0.100'],
    'col2': ['r', 'r', 'r', 'r'],
    'tot': [900, 800, 300, 400],
    'hit': [24, 56, 26, 40],
    'percent': [33, 23, 33, 31]
})

# Convert the date_time strings to datetime64 values
df['date_time'] = pd.to_datetime(df['date_time'])

# Define today's date (hard-coded here so it matches the sample data)
today = datetime(2024, 1, 13)

# The average period
average_period = 14

# resampling
result = (df
          .set_index('date_time')
          .groupby(['col1', 'col2'])
          .resample(f'{average_period}D', 
                    closed='right', 
                    label='right',
                    origin=today)  # Set the origin to today's date
          .agg({'hit': 'sum', 
                'tot': 'sum', 
                'percent': 'mean'})
          .reset_index())

print(result)
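
As a quick check of what origin changes, here is a minimal sketch on a single column, using the hit values from the question. The names hits, default_bins and anchored_bins are made up for illustration, and today is hard-coded to 2024-01-13 as in the answer above.

```python
import pandas as pd

# Same dates and 'hit' values as in the question, as a plain Series
dates = pd.to_datetime(['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'])
hits = pd.Series([24, 56, 26, 40], index=dates)

today = pd.Timestamp('2024-01-13')  # assumed "today", matching the sample data

# Default origin ('start_day'): bin edges are anchored at midnight of the first date
default_bins = hits.resample('14D', closed='right', label='right').sum()

# origin=today: bin edges fall on today, today-14, today-28, ...
anchored_bins = hits.resample('14D', closed='right', label='right',
                              origin=today).sum()

print(default_bins.index.max())   # 2024-01-23 00:00:00
print(anchored_bins.index.max())  # 2024-01-13 00:00:00
print(anchored_bins[anchored_bins > 0])  # only the bins that contain data
```

With the default origin the edges are counted forward from 2023-09-19, so the last right-hand label overshoots to 2024-01-23. With origin=today the last bin is (2023-12-30, 2024-01-13] and is labelled 2024-01-13. In real use, today would presumably come from something like pd.Timestamp.today().normalize() rather than a literal date.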