import pandas as pd
order_details_id order_id order_date order_time item_id
0 1 1 1/1/23 11:38:36 AM 109.0
1 2 2 1/1/23 11:57:40 AM 108.0
2 3 2 1/1/23 11:57:40 AM 124.0
3 4 2 1/1/23 11:57:40 AM 117.0
4 5 2 1/1/23 11:57:40 AM 129.0
df['order_date'] = pd.to_datetime(df['order_date'])
print(df)
order_details_id order_id order_date order_time item_id
0 1 1 2023-01-01 11:38:36 AM 109
1 2 2 2023-01-01 11:57:40 AM 108
2 3 2 2023-01-01 11:57:40 AM 124
3 4 2 2023-01-01 11:57:40 AM 117
4 5 2 2023-01-01 11:57:40 AM 129
df['order_time'] = pd.to_datetime(df['order_time'])
print(df)
order_details_id order_id order_date order_time item_id
0 1 1 2023-01-01 2023-12-29 11:38:36 109
1 2 2 2023-01-01 2023-12-29 11:57:40 108
2 3 2 2023-01-01 2023-12-29 11:57:40 124
3 4 2 2023-01-01 2023-12-29 11:57:40 117
4 5 2 2023-01-01 2023-12-29 11:57:40 129
I know that format=%y/%m/%d is mandatory in datetime. The problem is in the order_time column, where you will notice the date change from 2023-01-01 to 2023-12-29.
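If only the time of day is wanted from order_time, one option is to parse it with an explicit format and keep the time part. This is a minimal sketch, not the asker's code: the format string '%I:%M:%S %p' (12-hour clock with AM/PM) is an assumption based on the sample rows, and the names times, parsed and time_only are illustrative.

```python
import datetime
import pandas as pd

# Sample values copied from the order_time column shown above.
times = pd.Series(['11:38:36 AM', '11:57:40 AM'])

# With an explicit format and no date in the string, the date part
# falls back to 1900-01-01 rather than the day the code is run.
parsed = pd.to_datetime(times, format='%I:%M:%S %p')

# Keep only the time of day as datetime.time objects (object dtype).
time_only = parsed.dt.time
print(time_only.tolist())
```

The resulting column holds plain datetime.time values, so no date is attached to it at all.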
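You seem to expect the time column to carry date information as well, but both columns are needed to compute the exact date and time. When order_time is parsed on its own, the missing date is filled in with the current date, which is presumably where 2023-12-29 comes from.
Combine the two columns before the datetime conversion: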
df['order_datetime'] = pd.to_datetime(df['order_date'].str.cat(df['order_time'], sep=' '))
Output:
order_details_id order_id order_date order_time item_id order_datetime
0 1 1 1/1/23 11:38:36 AM 109.0 2023-01-01 11:38:36
1 2 2 1/1/23 11:57:40 AM 108.0 2023-01-01 11:57:40
2 3 2 1/1/23 11:57:40 AM 124.0 2023-01-01 11:57:40
3 4 2 1/1/23 11:57:40 AM 117.0 2023-01-01 11:57:40
4 5 2 1/1/23 11:57:40 AM 129.0 2023-01-01 11:57:40
Data types:
df.dtypes
order_details_id int64
order_id int64
order_date object
order_time object
item_id float64
order_datetime datetime64[ns]
dtype: object
Reference: https://stackoverflow.com/a/19378497/12846804
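The combine-then-convert step can also be written with an explicit format, which avoids relying on format inference. This is a sketch under an assumption: the format string '%m/%d/%y %I:%M:%S %p' (month/day/two-digit year, 12-hour clock) is read off the sample rows and is not confirmed by the question. If the data is actually day-first, swap %m and %d.

```python
import pandas as pd

# Two sample rows in the same shape as the question's data.
df = pd.DataFrame({
    'order_date': ['1/1/23', '1/1/23'],
    'order_time': ['11:38:36 AM', '11:57:40 AM'],
})

# Assumed layout: month/day/two-digit year, then a 12-hour clock with AM/PM.
combined = df['order_date'] + ' ' + df['order_time']
df['order_datetime'] = pd.to_datetime(combined, format='%m/%d/%y %I:%M:%S %p')

print(df.dtypes)
```

With an explicit format, a row that does not match raises an error instead of being parsed silently in some other way.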
Alternatively, you may have read the initial dataframe incorrectly. Read order_date and order_time as a single column, like this:
order_details_id order_id order_date_time item_id
0 1 1 1/1/23 11:38:36 AM 109.0
1 2 2 1/1/23 11:57:40 AM 108.0
2 3 2 1/1/23 11:57:40 AM 124.0
3 4 2 1/1/23 11:57:40 AM 117.0
4 5 2 1/1/23 11:57:40 AM 129.0
You can build it with this constructor:
df = pd.DataFrame({'order_details_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
'order_id': {0: 1, 1: 2, 2: 2, 3: 2, 4: 2},
'order_date_time': {0: '1/1/23 11:38:36 AM',
1: '1/1/23 11:57:40 AM',
2: '1/1/23 11:57:40 AM',
3: '1/1/23 11:57:40 AM',
4: '1/1/23 11:57:40 AM'},
'item_id': {0: 109.0, 1: 108.0, 2: 124.0, 3: 117.0, 4: 129.0}})
Then your line would work:
df['order_dt'] = pd.to_datetime(df['order_date_time'])
order_details_id order_id order_date_time item_id order_dt
0 1 1 1/1/23 11:38:36 AM 109.0 2023-01-01 11:38:36
1 2 2 1/1/23 11:57:40 AM 108.0 2023-01-01 11:57:40
2 3 2 1/1/23 11:57:40 AM 124.0 2023-01-01 11:57:40
3 4 2 1/1/23 11:57:40 AM 117.0 2023-01-01 11:57:40
4 5 2 1/1/23 11:57:40 AM 129.0 2023-01-01 11:57:40
So, are you sure that order_date and order_time are two separate columns?
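One way to answer that question is to inspect df.columns and branch on what is actually there. The helper below is a hypothetical sketch (build_order_datetime is not part of pandas), and the format string is an assumption taken from the sample rows.

```python
import pandas as pd

# Assumed layout of the combined text: month/day/two-digit year, 12-hour clock.
ORDER_FORMAT = '%m/%d/%y %I:%M:%S %p'

def build_order_datetime(df):
    """Return a datetime Series for either column layout (hypothetical helper)."""
    if {'order_date', 'order_time'}.issubset(df.columns):
        # Separate columns: join them into one string per row first.
        text = df['order_date'].str.cat(df['order_time'], sep=' ')
    elif 'order_date_time' in df.columns:
        # Already a single combined column.
        text = df['order_date_time']
    else:
        raise KeyError('no recognised date/time columns')
    return pd.to_datetime(text, format=ORDER_FORMAT)

split = pd.DataFrame({'order_date': ['1/1/23'], 'order_time': ['11:38:36 AM']})
merged = pd.DataFrame({'order_date_time': ['1/1/23 11:38:36 AM']})

print(build_order_datetime(split))
print(build_order_datetime(merged))
```

Both layouts end up as the same datetime64 values, so the rest of the analysis does not need to care how the file was read.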