我有这个数据帧
open high low close volume
TimeStamp
2017-12-22 13:15:00 12935.00 13200.00 12508.71 12514.91 244.728611
2017-12-22 13:30:00 12514.91 12999.99 12508.71 12666.34 150.457869
2017-12-22 13:45:00 12666.33 12899.97 12094.00 12094.00 198.680014
2017-12-22 14:00:00 12094.01 12354.99 11150.00 11150.00 256.812634
2017-12-22 14:15:00 11150.01 12510.00 10400.00 12276.33 262.217127
我想知道每行是否有15分钟的时间差异所以我建立了一个新列,其中第一列的移位
open high low close volume \
TimeStamp
2017-12-20 13:30:00 17503.98 17600.00 17100.57 17119.89 312.773644
2017-12-20 13:45:00 17119.89 17372.98 17049.00 17170.00 322.953671
2017-12-20 14:00:00 17170.00 17573.00 17170.00 17395.74 236.085829
2017-12-20 14:15:00 17395.74 17398.00 17200.01 17280.00 220.467382
2017-12-20 14:30:00 17280.00 17313.94 17150.00 17256.05 222.760598
new_time
TimeStamp
2017-12-20 13:30:00 2017-12-20 13:45:00
2017-12-20 13:45:00 2017-12-20 14:00:00
2017-12-20 14:00:00 2017-12-20 14:15:00
2017-12-20 14:15:00 2017-12-20 14:30:00
2017-12-20 14:30:00 2017-12-20 14:45:00
现在我想找到不遵守15分钟差异规则的每一行,所以我做了
dfh.loc[(dfh['new_time'].to_pydatetime()-dfh.index.to_pydatetime())>datetime.timedelta(0, 900)]
我收到这个错误,
Traceback (most recent call last):
File "<pyshell#252>", line 1, in <module>
dfh.loc[(dfh['new_time'].to_pydatetime()-dfh.index.to_pydatetime())>datetime.timedelta(0, 900)]
File "C:\Users\Araujo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'to_pydatetime'
有没有办法做到这一点?
编辑:
Shift只是周期性地工作,有没有办法用非周期性的做法?
这可行:
import pandas as pd
import numpy as np
import datetime as dt
data = [
['2017-12-22 13:15:00', 12935.00, 13200.00, 12508.71, 12514.91, 244.728611],
['2017-12-22 13:30:00', 12514.91, 12999.99, 12508.71, 12666.34, 150.457869],
['2017-12-22 13:45:00', 12666.33, 12899.97, 12094.00, 12094.00, 198.680014],
['2017-12-22 14:00:00', 12094.01, 12354.99, 11150.00, 11150.00, 256.812634],
['2017-12-22 14:15:00', 11150.01, 12510.00, 10400.00, 12276.33, 262.217127]
]
df = pd.DataFrame(data, columns = ['Timestamp', 'open', 'high', 'low', 'close', 'volume'])
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['plus_15'] = df['Timestamp'].shift(1) + dt.timedelta(minutes = 15)
df['valid_time'] = np.where((df['Timestamp'] == df['plus_15']) | (df.index == 0), 1, 0)
print(df[['Timestamp', 'valid_time']])
#output
Timestamp valid_time
0 2017-12-22 13:15:00 1
1 2017-12-22 13:30:00 1
2 2017-12-22 13:45:00 1
3 2017-12-22 14:00:00 1
4 2017-12-22 14:15:00 1
因此,创建一个新列,加上15,查看上一个时间戳并添加15分钟。然后创建另一个列,即有效时间,它将时间戳列与正15列进行比较,并在它们相等时标记为1,否则为0。
我们可以这样做吗?
import pandas as pd
import numpy as np
data = '''\
TimeStamp open high low close volume
2017-12-22T13:15:00 12935.00 13200.00 12508.71 12514.91 244.728611
2017-12-22T13:30:00 12514.91 12999.99 12508.71 12666.34 150.457869
2017-12-22T13:45:00 12666.33 12899.97 12094.00 12094.00 198.680014
2017-12-22T14:00:00 12094.01 12354.99 11150.00 11150.00 256.812634
2017-12-22T14:15:00 11150.01 12510.00 10400.00 12276.33 262.217127'''
df = pd.read_csv(pd.compat.StringIO(data),
sep='\s+', parse_dates=['TimeStamp'], index_col=['TimeStamp'])
df['new_time'] = df.index[1:].tolist()+[np.NaN]
# df['new_time'] = np.roll(df.index, -1) # if last is not first+15min
# use boolean indexing to filter away unwanted rows
df[[(dt2-dt1)/np.timedelta64(1, 's') == 900
for dt1,dt2 in zip(df.index.values,df.new_time.values)]]