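I have a lot of datetime columns that I need to convert into a format suitable for machine learning, for example from:
2003-01-09
2022-10-12 23:03:34
to a vector or something similar:
0.0145132 0.548542
Any suggestions? I need to use this pandas DataFrame column in an SVM, k-nearest neighbors, or logistic regression model.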
This looks like a linear scaling between a reference start date (= 0) and a reference end date (= 1).
Assuming:
import pandas as pd
df = pd.DataFrame({'date': ['2003-01-09', '2022-10-12 23:03:34']})
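You can use:
# format='mixed' (pandas 2.0+) parses strings that do not all share one format
s = pd.to_datetime(df['date'], format='mixed')
# reference dates that map to 0 and 1 respectively
start = pd.Timestamp('2002-06-26 20:47:03.860764027')
end = pd.Timestamp('2039-06-27 01:04:52.021882057')
# Timedelta / Timedelta yields a plain float
df['encoded'] = (s-start)/(end-start)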
Output:
                  date   encoded
0           2003-01-09  0.014513
1  2022-10-12 23:03:34  0.548542
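Since the question mentions many datetime columns, the same scaling can be wrapped in a small helper. This is only a sketch under stated assumptions: `scale_datetime_columns`, the column names `created` and `updated`, and the sample values are hypothetical, and when no reference dates are passed it falls back to each column's own minimum and maximum rather than the reference dates used above. It also assumes pandas 2.0 or later for `format='mixed'`.

```python
import pandas as pd

def scale_datetime_columns(df, columns, start=None, end=None):
    """Linearly map each listed datetime column so that start -> 0 and end -> 1.

    Hypothetical helper: if start/end are omitted, the column's own
    min/max are used as the reference dates.
    """
    out = df.copy()
    for col in columns:
        s = pd.to_datetime(out[col], format='mixed')  # pandas 2.0+
        lo = s.min() if start is None else pd.Timestamp(start)
        hi = s.max() if end is None else pd.Timestamp(end)
        out[col] = (s - lo) / (hi - lo)  # Timedelta ratio -> float
    return out

# Made-up sample data with two datetime columns
df = pd.DataFrame({
    'created': ['2003-01-09', '2022-10-12 23:03:34', '2010-06-01'],
    'updated': ['2004-01-01', '2023-01-01', '2015-07-01 12:00:00'],
})
scaled = scale_datetime_columns(df, ['created', 'updated'])
print(scaled)
```

For SVM, k-nearest neighbors, or logistic regression, it is usually safer to pick `start` and `end` from the training data only and pass the same values when transforming validation or test data, so all splits share one scale.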
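To compute the reference dates from the example:
# two known (encoded value, date) pairs from the question
x1 = 0.0145132
x2 = 0.548542
y1 = pd.Timestamp('2003-01-09')
y2 = pd.Timestamp('2022-10-12 23:03:34')
# fit the line seconds_since_epoch = a*x + b through both points
a = (y2-y1).total_seconds()/(x2-x1)
b = (y2-pd.Timestamp(0)).total_seconds() - a*x2
# x = 0 gives the start date, x = 1 gives the end date
start = pd.Timestamp(0)+pd.Timedelta(b, unit='s')
# 2002-06-26 20:47:03.860764027
end = pd.Timestamp(0)+pd.Timedelta(a + b, unit='s')
# 2039-06-27 01:04:52.021882057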
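As a sanity check, the recovered reference dates can be plugged back into the scaling formula to confirm they reproduce the example values. This is a minimal sketch; the `encode` helper and the `epoch` name are hypothetical and only used for illustration.

```python
import pandas as pd

# Known (encoded value, date) pairs from the question
x1, x2 = 0.0145132, 0.548542
y1 = pd.Timestamp('2003-01-09')
y2 = pd.Timestamp('2022-10-12 23:03:34')
epoch = pd.Timestamp(0)

# Line through both points: seconds_since_epoch = a*x + b
a = (y2 - y1).total_seconds() / (x2 - x1)
b = (y2 - epoch).total_seconds() - a * x2
start = epoch + pd.Timedelta(b, unit='s')
end = epoch + pd.Timedelta(a + b, unit='s')

def encode(ts):
    """Map a timestamp to its position between start (0) and end (1)."""
    return (ts - start) / (end - start)

# Both example dates should map back to the values in the question
print(round(encode(y1), 7), round(encode(y2), 6))
```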