ValueError: NaTType does not support strftime

Question

I tried using the dropna method to drop rows with a missing "Date" value before resetting the index, and then I got a KeyError:

df.set_index('Date', inplace=True)
df = df.between_time(TIME_BINANCE_OPEN, TIME_BINANCE_CLOSE)
df = df.dropna(subset=['Date'])
df.reset_index(inplace=True)
Traceback (most recent call last):
  File "/Users/anon/stocks-prediction-Machine-learning-RealTime-TensorFlow/0_API_alpaca_historical.py", line 53, in <module>
    df.dropna(subset=['Date'], inplace=True)
  File "/Users/anon/stocks-prediction-Machine-learning-RealTime-TensorFlow/MLT/lib/python3.11/site-packages/pandas/core/frame.py", line 6421, in dropna
    raise KeyError(np.array(subset)[check].tolist())
KeyError: ['Date']

Current code:

for symbol in stocks_list:
    print("Starting data fetching process Stock: ", symbol)
    df = get_binance_bars(symbol, interval, START_DATE, END_DATE)
    print("Data fetching process completed df.shape: ", df.shape)

    print(df)
    print(df.columns)

    if df is not None:
        df['Date'] = pd.to_datetime(df['Date'])
        TIME_BINANCE_OPEN = "00:01:00"
        TIME_BINANCE_CLOSE = "08:19:00"
        
        # Perform time-based filtering using the "Date" column
        df = df[(df['Date'].dt.time >= pd.to_datetime(TIME_BINANCE_OPEN).time()) &
                (df['Date'].dt.time <= pd.to_datetime(TIME_BINANCE_CLOSE).time())]
        df = df.dropna(subset=['Date'])
        if not df.empty:
            max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
            min_recent_date = df['Date'].min().strftime("%Y-%m-%d %H:%M:%S")
        else:
            min_recent_date = max_recent_date = None
        directory = "d_price/RAW_binance/"
        if not os.path.exists(directory):
            os.makedirs(directory)

        file_path = directory + "binance_" + symbol + '_' + interval + ".csv"
        df.to_csv(file_path, sep="\t", index=None)

        print("\tSTART: ", str(df['Date'].min()),  "  END: ", str(df['Date'].max()), " shape: ", df.shape, "\n")
       

Traceback:

Starting data fetching process Stock:  DASHUSDT
Data fetching process completed df.shape:  (500, 6)
                 Date    Open    High     Low   Close   Volume
0 2022-01-01 00:00:00  133.62  133.98  133.59  133.93  743.246
1 2022-01-01 00:01:00  133.93  134.79  133.85  134.65  835.018
2 2022-01-01 00:02:00  134.61  134.75  134.42  134.51  421.264
3 2022-01-01 00:03:00  134.56  134.63  134.37  134.49  209.346
4 2022-01-01 00:04:00  134.52  134.68  134.42  134.48  204.820
Index(['Date', 'Open', 'High', 'Low', 'Close', 'Volume'], dtype='object')
Traceback (most recent call last):
  File "/Users/anon/stocks-prediction-Machine-learning-RealTime-TensorFlow/0_API_alpaca_historical.py", line 52, in <module>
    max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "nattype.pyx", line 58, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support strftime

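I found that when you reset the index with df.reset_index(inplace=True), any missing or NaN values in the "Date" column can result in an empty DataFrame. I tried to make sure the "Date" column has no missing values by dropping the NaN values with dropna before calling reset_index, but this workaround does not work.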

Tags: pandas, dataframe

1 Answer

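The issue is that all rows have been dropped by the time df = df.dropna(subset=['Date']) has run. There is therefore no min/max date left to work with: on an empty column, max() and min() return NaT, which does not support strftime.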
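To see the failure in isolation, here is a minimal sketch that does not use the asker's data. The empty Series `empty_dates` is a hypothetical stand-in for df['Date'] after a filter that keeps no rows.

```python
import pandas as pd

# Hypothetical stand-in for df['Date'] once every row has been filtered out.
empty_dates = pd.Series([], dtype="datetime64[ns]")

# max() on an empty datetime Series does not raise; it returns NaT.
latest = empty_dates.max()
print(pd.isna(latest))

# The error only appears when NaT is asked to format itself.
message = None
try:
    latest.strftime("%Y-%m-%d %H:%M:%S")
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)
```

So the exception comes from the strftime call, not from max() or min(), which is why guarding the formatting step is enough.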

You may want to use a conditional here:

# ...
        df = df.dropna(subset=['Date'])
        if not df.empty:
            max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
            min_recent_date = df['Date'].min().strftime("%Y-%m-%d %H:%M:%S")
        else:
            min_recent_date = max_recent_date = None

Note that this works as expected with a valid (non-empty) slice:

if df is not None:
    df['Date'] = pd.to_datetime(df['Date'])
    TIME_BINANCE_OPEN = "00:01:00"    # this will keep 2 rows
    TIME_BINANCE_CLOSE = "00:02:00"

    # Perform time-based filtering using the "Date" column
    df = df[(df['Date'].dt.time >= pd.to_datetime(TIME_BINANCE_OPEN).time()) &
            (df['Date'].dt.time <= pd.to_datetime(TIME_BINANCE_CLOSE).time())]
    df = df.dropna(subset=['Date'])
    if not df.empty:
        max_recent_date = df['Date'].max().strftime("%Y-%m-%d %H:%M:%S")
        min_recent_date = df['Date'].min().strftime("%Y-%m-%d %H:%M:%S")
    else:
        min_recent_date = max_recent_date = None

print(min_recent_date)
print(max_recent_date)

Output:

2022-01-01 00:01:00
2022-01-01 00:02:00

Alternatively, using Series operations:

# ...
        df = df.dropna(subset=['Date'])
        min_recent_date, max_recent_date = df['Date'].agg(['min', 'max']).dt.strftime("%Y-%m-%d %H:%M:%S")
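
As for the KeyError in the question's first attempt: once set_index('Date') has run, "Date" is the index and no longer a column, so dropna(subset=['Date']) has nothing to look up. A possible reordering is sketched below, assuming the between_time approach is kept. The sample frame is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Made-up sample data; the second row has a missing date.
df = pd.DataFrame({
    "Date": ["2022-01-01 00:00:00", None,
             "2022-01-01 00:02:00", "2022-01-01 09:00:00"],
    "Close": [133.93, 134.65, 134.51, 135.10],
})
df["Date"] = pd.to_datetime(df["Date"])

# Reproduce the KeyError: 'Date' is not a column after set_index.
key_error = None
try:
    df.set_index("Date").dropna(subset=["Date"])
except KeyError as exc:
    key_error = exc

# Drop missing dates while 'Date' is still a column ...
df = df.dropna(subset=["Date"])

# ... then move it into the index, which between_time requires.
df = df.set_index("Date").between_time("00:01:00", "08:19:00")

# reset_index turns 'Date' back into a regular column.
df = df.reset_index()
print(df)
```

The result can still be empty if no timestamp falls inside the window, so the emptiness check before calling strftime remains necessary.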