我正在尝试通过修改此 Github 存储库中给出的示例将自定义数据集加载到 PyTorch Forecasting。然而,我坚持实例化
TimeSeriesDataSet
。相关部分代码如下:
import numpy as np
import pandas as pd
df = pd.read_csv("data.csv")
print(df.shape) # (300, 8)
# Divide the timestamps so that they are incremented by one each row.
df["unix"] = df["unix"].apply(lambda n: int(n / 86400))
# Set "unix" as the index
#df = df.set_index("unix")
# Add *integer* indices.
df["index"] = np.arange(300)
df = df.set_index("index")
# Add group column.
df["group"] = np.repeat(np.arange(30), 10)
from pytorch_forecasting import TimeSeriesDataSet
target = ["foo", "bar", "baz"]
# Create the dataset from the pandas dataframe
dataset = TimeSeriesDataSet(
df,
group_ids = ["group"],
target = target,
time_idx = "unix",
min_encoder_length = 50,
max_encoder_length = 50,
min_prediction_length = 20,
max_prediction_length = 20,
time_varying_unknown_reals = target,
allow_missing_timesteps = True
)
以及错误消息和回溯:
/home/user/.virtualenvs/torch/lib/python3.9/site-packages/pytorch_forecasting/data/timeseries.py:1241: UserWarning: Min encoder length and/or min_prediction_idx and/or min prediction length and/or lags are too large for 30 series/groups which therefore are not present in the dataset index. This means no predictions can be made for those series. First 10 removed groups: [{'__group_id__group': 0}, {'__group_id__group': 1}, {'__group_id__group': 2}, {'__group_id__group': 3}, {'__group_id__group': 4}, {'__group_id__group': 5}, {'__group_id__group': 6}, {'__group_id__group': 7}, {'__group_id__group': 8}, {'__group_id__group': 9}]
warnings.warn(
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
/tmp/ipykernel_822/3402560775.py in <module>
4
5 # create the dataset from the pandas dataframe
----> 6 dataset = TimeSeriesDataSet(
7 df,
8 group_ids = ["group"],
~/.virtualenvs/torch/lib/python3.9/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, constant_fill_strategy, allow_missing_timesteps, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
437
438 # create index
--> 439 self.index = self._construct_index(data, predict_mode=predict_mode)
440
441 # convert to torch tensor for high performance data loading later
~/.virtualenvs/torch/lib/python3.9/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
1247 UserWarning,
1248 )
-> 1249 assert (
1250 len(df_index) > 0
1251 ), "filters should not remove entries all entries - check encoder/decoder lengths and lags"
AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags
我尝试调整初始化参数但没有成功。文件
timeseries.py
可以在同一个Github存储库中找到,这里。
据我所知,我猜这可能会发生,因为并非所有时间序列都有最小长度(
min_prediction_length
+ min_encoder_length
)。
就您而言,每个时间序列的长度至少应为 70。
我也遇到同样的问题,请问你解决了吗?