I have the following daily pricing data:
2017-06-01 15.00
2017-06-02 20.00
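I want to resample it to hourly prices covering 35 hours. Each sample in the first 24 hours should have the value 15.00, and from hour 24 up to hour 35 the price should be 20.00.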
2017-06-01 00:00 15.00
2017-06-01 01:00 15.00
2017-06-01 02:00 15.00
…
2017-06-01 23:00 15.00
2017-06-02 00:00 20.00
2017-06-02 01:00 20.00
2017-06-02 02:00 20.00
…
2017-06-02 10:00 20.00
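I tried resample('3600S').pad(), but it does not work. Is it possible to create the new date range separately and use it as input to the resampling function? resample() does not seem to do the job here.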
You can create a custom date range with hourly frequency and reindex onto it:
df.index = pd.to_datetime(df.index)
rng=pd.date_range(start=df.index.min(), periods=35, freq='H')
df.reindex(rng).ffill()
val
2017-06-01 00:00:00 15.0
2017-06-01 01:00:00 15.0
2017-06-01 02:00:00 15.0
2017-06-01 03:00:00 15.0
2017-06-01 04:00:00 15.0
2017-06-01 05:00:00 15.0
2017-06-01 06:00:00 15.0
2017-06-01 07:00:00 15.0
2017-06-01 08:00:00 15.0
2017-06-01 09:00:00 15.0
2017-06-01 10:00:00 15.0
2017-06-01 11:00:00 15.0
2017-06-01 12:00:00 15.0
2017-06-01 13:00:00 15.0
2017-06-01 14:00:00 15.0
2017-06-01 15:00:00 15.0
2017-06-01 16:00:00 15.0
2017-06-01 17:00:00 15.0
2017-06-01 18:00:00 15.0
2017-06-01 19:00:00 15.0
2017-06-01 20:00:00 15.0
2017-06-01 21:00:00 15.0
2017-06-01 22:00:00 15.0
2017-06-01 23:00:00 15.0
2017-06-02 00:00:00 20.0
2017-06-02 01:00:00 20.0
2017-06-02 02:00:00 20.0
2017-06-02 03:00:00 20.0
2017-06-02 04:00:00 20.0
2017-06-02 05:00:00 20.0
2017-06-02 06:00:00 20.0
2017-06-02 07:00:00 20.0
2017-06-02 08:00:00 20.0
2017-06-02 09:00:00 20.0
2017-06-02 10:00:00 20.0
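For reference, here is a self-contained sketch of the reindex approach above. It also shows a likely reason resample() alone falls short: it only spans the interval between the first and last existing timestamps, so two daily rows give 25 hourly rows rather than 35. The column name val is taken from the output above; the variable names daily, resampled and hourly are just illustrative, and freq="h" is used on the assumption of a recent pandas release, where the uppercase "H" alias is deprecated.

```python
import pandas as pd

# Daily prices from the question, indexed by date
daily = pd.DataFrame(
    {"val": [15.0, 20.0]},
    index=pd.to_datetime(["2017-06-01", "2017-06-02"]),
)

# resample() stops at the last existing timestamp (2017-06-02 00:00)
resampled = daily.resample("h").ffill()

# Build the 35-hour target index explicitly, then forward-fill onto it
rng = pd.date_range(start=daily.index.min(), periods=35, freq="h")
hourly = daily.reindex(rng).ffill()

print(len(resampled), len(hourly))
print(hourly.tail(12))
```

Because the target index is built independently of the data, the number of periods can extend past the last daily row, which resample() by itself does not do.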
Another approach is to (a) resample without aggregation, (b) compute the row-wise difference in hours, and then (c) use np.where to conditionally set the value column.
Sample data
import pandas as pd
d = {'date':['2017-06-01','2017-06-02', '2017-06-03'], 'value':[15,20,30]}
df = pd.DataFrame.from_dict(d)
print(df)
date value
0 2017-06-01 15
1 2017-06-02 20
2 2017-06-03 30
Code
from numpy import where, timedelta64
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').asfreq("H").iloc[:35,:]
# Get time difference in hours, relative to 1st row
df['hours'] = ((df.index - df.index[0])/timedelta64(1, 'h')).astype(int)
# Conditionally set 'value' column: 15 where hours < 35, otherwise 20
df['value'] = where(df['hours']<35, 15, 20)
print(df)
Output
value hours
date
2017-06-01 00:00:00 15 0
2017-06-01 01:00:00 15 1
2017-06-01 02:00:00 15 2
2017-06-01 03:00:00 15 3
2017-06-01 04:00:00 15 4
2017-06-01 05:00:00 15 5
2017-06-01 06:00:00 15 6
2017-06-01 07:00:00 15 7
2017-06-01 08:00:00 15 8
2017-06-01 09:00:00 15 9
2017-06-01 10:00:00 15 10
2017-06-01 11:00:00 15 11
2017-06-01 12:00:00 15 12
2017-06-01 13:00:00 15 13
2017-06-01 14:00:00 15 14
2017-06-01 15:00:00 15 15
2017-06-01 16:00:00 15 16
2017-06-01 17:00:00 15 17
2017-06-01 18:00:00 15 18
2017-06-01 19:00:00 15 19
2017-06-01 20:00:00 15 20
2017-06-01 21:00:00 15 21
2017-06-01 22:00:00 15 22
2017-06-01 23:00:00 15 23
2017-06-02 00:00:00 15 24
2017-06-02 01:00:00 15 25
2017-06-02 02:00:00 15 26
2017-06-02 03:00:00 15 27
2017-06-02 04:00:00 15 28
2017-06-02 05:00:00 15 29
2017-06-02 06:00:00 15 30
2017-06-02 07:00:00 15 31
2017-06-02 08:00:00 15 32
2017-06-02 09:00:00 15 33
2017-06-02 10:00:00 15 34
2017-06-02 11:00:00 20 35
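Note that the hours < 35 threshold above switches the value at hour 35, not at the day boundary described in the question, and that 15 and 20 are hard-coded. If the intent is for each hour to take its own day's price, one possible variation is to integer-divide the hour offset by 24 and use the result to look up the daily value. This is only a sketch: it assumes the daily data has exactly one row per consecutive day, and the names daily, hourly_index, day_pos and hourly are illustrative.

```python
import numpy as np
import pandas as pd

# Daily prices, one row per consecutive day (assumption)
daily = pd.DataFrame(
    {"date": pd.to_datetime(["2017-06-01", "2017-06-02", "2017-06-03"]),
     "value": [15, 20, 30]}
).set_index("date")

# 35 hourly timestamps starting at the first day
hourly_index = pd.date_range(start=daily.index[0], periods=35, freq="h")

# Hours elapsed since the first timestamp, as plain integers
hours = ((hourly_index - hourly_index[0]) / np.timedelta64(1, "h")).astype(int).to_numpy()

# Position of the day each hour belongs to (0 for the first day, 1 for the second, ...)
day_pos = hours // 24

hourly = pd.DataFrame(
    {"value": daily["value"].to_numpy()[day_pos], "hours": hours},
    index=hourly_index,
)
print(hourly)
```

With this lookup the first 24 rows take the first day's price and the remaining 11 rows take the second day's, without repeating the prices in the code.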
Edit
Instead of
df = df.set_index('date').asfreq("H").iloc[:35,:]
you can also use
df = df.set_index('date').asfreq("H")
df = df.loc[pd.date_range(start=df.index[0], periods=35, freq='H'),['value']]