我尝试使用
dropna
方法在重置索引之前删除缺少“日期”值的行,然后我得到一个 KeyError:
df.set_index('Date', inplace=True)
df = df.between_time(TIME_BINANCE_OPEN, TIME_BINANCE_CLOSE)
df = df.dropna(subset=['Date'])
df.reset_index(inplace=True)
Traceback (most recent call last):
File "/Users/anon/stocks-prediction-Machine-learning-RealTime-TensorFlow/0_API_alpaca_historical.py", line 53, in <module>
df.dropna(subset=['Date'], inplace=True)
File "/Users/anon/stocks-prediction-Machine-learning-RealTime-TensorFlow/MLT/lib/python3.11/site-packages/pandas/core/frame.py", line 6421, in dropna
raise KeyError(np.array(subset)[check].tolist())
KeyError: ['Date']
当前代码:
for symbol in stocks_list:
print("Starting data fetching process Stock: ", symbol)
df = get_binance_bars(symbol, interval, START_DATE, END_DATE)
print("Data fetching process completed df.shape: ", df.shape)
print(df)
print(df.columns)
if df is not None:
df['Date'] = pd.to_datetime(df['Date'])
TIME_BINANCE_OPEN = "00:01:00"
TIME_BINANCE_CLOSE = "08:19:00"
# Perform time-based filtering using the "Date" column
df = df[(df['Date'].dt.time >= pd.to_datetime(TIME_BINANCE_OPEN).time()) &
(df['Date'].dt.time <= pd.to_datetime(TIME_BINANCE_CLOSE).time())]
df = df.dropna(subset=['Date'])
if not df.empty:
max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
min_recent_date = df['Date'].min().strftime("%Y-%m-%d %H:%M:%S")
else:
min_recent_date = max_recent_date = None
directory = "d_price/RAW_binance/"
if not os.path.exists(directory):
os.makedirs(directory)
file_path = directory + "binance_" + symbol + '_' + interval + ".csv"
df.to_csv(file_path, sep="\t", index=None)
print("\tSTART: ", str(df['Date'].min()), " END: ", str(df['Date'].max()), " shape: ", df.shape, "\n")
追溯:
Starting data fetching process Stock: DASHUSDT
Data fetching process completed df.shape: (500, 6)
Date Open High Low Close Volume
0 2022-01-01 00:00:00 133.62 133.98 133.59 133.93 743.246
1 2022-01-01 00:01:00 133.93 134.79 133.85 134.65 835.018
2 2022-01-01 00:02:00 134.61 134.75 134.42 134.51 421.264
3 2022-01-01 00:03:00 134.56 134.63 134.37 134.49 209.346
4 2022-01-01 00:04:00 134.52 134.68 134.42 134.48 204.820
Index(['Date', 'Open', 'High', 'Low', 'Close', 'Volume'], dtype='object')
Traceback (most recent call last):
File "/Users/anon/stocks-prediction-Machine-learning-RealTime-TensorFlow/0_API_alpaca_historical.py", line 52, in <module>
max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "nattype.pyx", line 58, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support strftime
我发现,当您使用
df.reset_index(inplace=True)
重置索引时,如果“日期”列中存在任何缺失或 NaN 值,可能会导致空的 DataFrame。在重置索引之前,我通过使用“dropna”删除reset_index之前的NaN值来确保“Date”列中没有缺失值。但是我的解决方法不起作用。
此问题是您删除了
df = df.dropna(subset=['Date'])
中的所有行。因此,没有剩余的最小/最大日期可供使用。
您可能想在这里使用条件:
# ...
df = df.dropna(subset=['Date'])
if not df.empty:
max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
min_recent_date = df['Date'].min().strftime("%Y-%m-%d %H:%M:%S")
else:
min_recent_date = max_recent_date = None
请注意,使用有效(非空)切片可以按预期工作:
if df is not None:
df['Date'] = pd.to_datetime(df['Date'])
TIME_BINANCE_OPEN = "00:01:00" # this will keep 2 rows
TIME_BINANCE_CLOSE = "00:02:00" #
# Perform time-based filtering using the "Date" column
df = df[(df['Date'].dt.time >= pd.to_datetime(TIME_BINANCE_OPEN).time()) &
(df['Date'].dt.time <= pd.to_datetime(TIME_BINANCE_CLOSE).time())]
df = df.dropna(subset=['Date'])
if not df.empty:
max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
min_recent_date = df['Date'].min().strftime("%Y-%m-%d %H:%M:%S")
else:
min_recent_date = max_recent_date = None
print(min_recent_date)
print(max_recent_date)
输出:
2022-01-01 00:01:00
2022-01-01 00:02:00
或者,使用系列运算:
# ...
df = df.dropna(subset=['Date'])
min_recent_date, max_recent_date = df['Date'].agg(['min', 'max']).dt.strftime("%Y-%m-%d %H:%M:%S")