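I have a DataFrame that I want to resample and aggregate over `average_period = 14` days. The hard part (for me) is that I want my aggregation bins to start from today, i.e. [today, today-14], [today-14, today-28], [today-28, today-42], and so on. Today's date is always present in the df, but earlier dates are not necessarily there.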
If I do the following, the latest bin label I get is 2024-01-23, but it should be 2024-01-13. How can I achieve this?
import pandas as pd

df = pd.DataFrame({'date_time': ['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'],
                   'col1': ['0.100', '0.100', '0.100', '0.100'],
                   'col2': ['r', 'r', 'r', 'r'],
                   'tot': [900, 800, 300, 400],
                   'hit': [24, 56, 26, 40],
                   'percent': [33, 23, 33, 31]})
df = df.assign(date_time=pd.to_datetime(df.date_time))
average_period = 14

(df
 .set_index('date_time')
 .groupby(['col1', 'col2'])
 .resample(f'{average_period}D', closed='right', label='right')
 .agg({'hit': 'sum', 'tot': 'sum', 'percent': 'mean'})
 .reset_index())
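To see where 2024-01-23 comes from, here is a minimal sketch of the arithmetic. It assumes resample's default `origin='start_day'`, which places bin edges at midnight of the first date plus whole multiples of 14 days; the two dates are taken from the sample data, and `first_day`, `gap` and `last_edge` are illustrative names.

```python
import math

import pandas as pd

first_day = pd.Timestamp("2023-09-19")  # earliest date in the sample data
today = pd.Timestamp("2024-01-13")      # date the last bin should end on
period = pd.Timedelta(days=14)

# Default origin='start_day': edges sit at first_day.normalize() + k * period
gap = today - first_day.normalize()
print(gap.days, gap.days % 14)  # 116 4 -> today is not an edge (remainder != 0)

# With closed='right', a point falls in the bin whose right edge is the
# smallest edge >= that point, i.e. k = ceil(gap / period)
k = math.ceil(gap / period)
last_edge = first_day.normalize() + k * period
print(last_edge.date())  # 2024-01-23
```

Because the remainder is 4 rather than 0, the right-closed bin containing 2024-01-13 is pushed out to the next edge, 10 days later.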
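Try anchoring the bins on today's date with the `origin` argument of `resample`. By default (`origin='start_day'`), the 14-day bin edges are counted from midnight of the first timestamp, here 2023-09-19, and 2024-01-13 is not a whole number of 14-day periods after that, so the right-closed bin containing it ends on 2024-01-23. With `origin=today`, the edges fall at today plus or minus whole multiples of 14 days, so the last bin is (today - 14 days, today] and, with `label='right'`, it is labeled with today's date: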
import pandas as pd

df = pd.DataFrame({
    'date_time': ['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'],
    'col1': ['0.100', '0.100', '0.100', '0.100'],
    'col2': ['r', 'r', 'r', 'r'],
    'tot': [900, 800, 300, 400],
    'hit': [24, 56, 26, 40],
    'percent': [33, 23, 33, 31]
})

# Convert date_time to datetime64
df['date_time'] = pd.to_datetime(df['date_time'])

# Today's date: per the question it is always present in df, so it is the
# latest date (use pd.Timestamp.today().normalize() for the real current date)
today = df['date_time'].max()

# Length of each bin in days
average_period = 14

# Resample each (col1, col2) group into 14-day bins anchored on today:
# right-closed bins, labeled by their right edge
result = (df
          .set_index('date_time')
          .groupby(['col1', 'col2'])
          .resample(f'{average_period}D',
                    closed='right',
                    label='right',
                    origin=today)  # anchor the bin edges on today's date
          .agg({'hit': 'sum',
                'tot': 'sum',
                'percent': 'mean'})
          .reset_index())
print(result)
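To sanity-check the bins without `resample`, here is a minimal sketch that assigns each row to its (bin_end - 14 days, bin_end] bin by floor division on the number of days before today. It assumes, as stated in the question, that today's date is the latest date in `df`; `periods_back` and `bin_end` are illustrative names, not pandas API. Unlike `resample`, it only produces bins that actually contain data, so empty 14-day periods do not appear as rows.

```python
import pandas as pd

df = pd.DataFrame({
    "date_time": pd.to_datetime(["2023-09-19", "2023-09-29", "2023-11-10", "2024-01-13"]),
    "col1": ["0.100"] * 4,
    "col2": ["r"] * 4,
    "tot": [900, 800, 300, 400],
    "hit": [24, 56, 26, 40],
    "percent": [33, 23, 33, 31],
})

average_period = 14
today = df["date_time"].max()  # assumption: today's date is the latest date in df

# Whole 14-day periods between each date and today (0 for the bin ending today).
# A date exactly 14 days back gets 1, so it lands in the bin ending today - 14 days,
# which matches closed='right'.
periods_back = (today - df["date_time"]).dt.days // average_period

# Right edge of the bin each row falls into
df["bin_end"] = today - pd.to_timedelta(periods_back * average_period, unit="D")

manual = (df
          .groupby(["col1", "col2", "bin_end"])
          .agg({"hit": "sum", "tot": "sum", "percent": "mean"})
          .reset_index())
print(manual)
```

On the sample data the non-empty bins end on 2023-09-23, 2023-10-07, 2023-11-18 and 2024-01-13, so the latest bin ends on today's date as required.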