I need to use the first level of a MultiIndex. But after slicing the DataFrame, index.levels[0] returns the same values as before the slice.
[In] df['2019-04-26 01:00:00':'2020-01-01 22:00:00'].index.levels[0]
[Out]
DatetimeIndex(['2019-01-01 01:00:00', '2019-01-01 04:00:00',
'2019-01-01 07:00:00', '2019-01-01 10:00:00',
'2019-01-01 13:00:00', '2019-01-01 16:00:00',
'2019-01-01 19:00:00', '2019-01-01 22:00:00',
'2019-01-02 01:00:00', '2019-01-02 04:00:00',
...
'2020-02-16 22:00:00', '2020-02-17 01:00:00',
'2020-02-17 04:00:00', '2020-02-17 07:00:00',
'2020-02-17 10:00:00', '2020-02-17 13:00:00',
'2020-02-17 16:00:00', '2020-02-17 19:00:00',
'2020-02-17 22:00:00', '2020-02-18 01:00:00'],
dtype='datetime64[ns]', name='Date', length=3307, freq=None)
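The full index is sliced correctly, but I only want the first level. Is this a bug or a feature of pandas 1.0.3? How can I get what I want? The MultiIndex is already sorted. I also tried assigning the sliced DataFrame to another variable, e.g. df1 = df['2019-04-26 01:00:00':'2020-01-01 22:00:00']
and I got the same result from df1.index.levels[0]
Added: the sorted MultiIndex is: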
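The behavior described above can be reproduced on a small frame. This is only a minimal sketch: the toy data and the names `toy` and `sliced` are hypothetical stand-ins for the real frame, and the slice uses `.loc` with `Timestamp` bounds rather than the string slice from the post.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame (illustrative only).
dates = pd.to_datetime([
    "2019-01-01 01:00:00", "2019-01-01 04:00:00",
    "2019-01-01 07:00:00", "2019-01-01 10:00:00",
])
index = pd.MultiIndex.from_product(
    [dates, ["A", "B"]], names=["Date", "Location"]
)
toy = pd.DataFrame({"value": range(len(index))}, index=index)

# Slice on the first level only.
sliced = toy.loc[
    pd.Timestamp("2019-01-01 04:00:00"):pd.Timestamp("2019-01-01 07:00:00")
]

print(len(sliced))                  # rows are sliced: 4 rows remain
print(len(sliced.index.levels[0]))  # levels are not trimmed: still 4 dates
```

Here the rows are cut as expected, while `levels[0]` still lists every date of the original index, which matches what the output above shows.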
('2019-01-01 01:00:00', 'Зеленга'),
('2019-01-01 04:00:00', 'Астрахань'),
('2019-01-01 04:00:00', 'Зеленга'),
('2019-01-01 07:00:00', 'Астрахань'),
('2019-01-01 07:00:00', 'Зеленга'),
('2019-01-01 10:00:00', 'Астрахань'),
('2019-01-01 10:00:00', 'Зеленга'),
('2019-01-01 13:00:00', 'Астрахань'),
('2019-01-01 13:00:00', 'Зеленга'),
...
('2020-02-16 22:00:00', 'Досанг'),
('2020-02-17 01:00:00', 'Досанг'),
('2020-02-17 04:00:00', 'Досанг'),
('2020-02-17 07:00:00', 'Досанг'),
('2020-02-17 10:00:00', 'Досанг'),
('2020-02-17 13:00:00', 'Досанг'),
('2020-02-17 16:00:00', 'Досанг'),
('2020-02-17 19:00:00', 'Досанг'),
('2020-02-17 22:00:00', 'Досанг'),
('2020-02-18 01:00:00', 'Досанг')],
names=['Дата', 'Location'], length=13185)
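Then I want to drop the locations that do not have the full list of dates. I computed the boundaries of the range and sliced the data frame. date_range=(Timestamp('2019-04-26 01:00:00'), Timestamp('2020-01-01 22:00:00'))
So, for df[date_range[0]:date_range[1]].index
I get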
('2019-04-26 01:00:00', 'Досанг'),
('2019-04-26 01:00:00', 'Зеленга'),
('2019-04-26 01:00:00', 'Харабали'),
('2019-04-26 01:00:00', 'Черный Яр'),
('2019-04-26 04:00:00', 'Астрахань'),
('2019-04-26 04:00:00', 'Досанг'),
('2019-04-26 04:00:00', 'Зеленга'),
('2019-04-26 04:00:00', 'Харабали'),
('2019-04-26 04:00:00', 'Черный Яр'),
...
('2020-01-01 19:00:00', 'Астрахань'),
('2020-01-01 19:00:00', 'Досанг'),
('2020-01-01 19:00:00', 'Зеленга'),
('2020-01-01 19:00:00', 'Харабали'),
('2020-01-01 19:00:00', 'Черный Яр'),
('2020-01-01 22:00:00', 'Астрахань'),
('2020-01-01 22:00:00', 'Досанг'),
('2020-01-01 22:00:00', 'Зеленга'),
('2020-01-01 22:00:00', 'Харабали'),
('2020-01-01 22:00:00', 'Черный Яр')],
names=['Дата', 'Location'], length=10040)
But for df[date_range[0]:date_range[1]].index.levels[0]
I always get
'2019-01-01 07:00:00', '2019-01-01 10:00:00',
'2019-01-01 13:00:00', '2019-01-01 16:00:00',
'2019-01-01 19:00:00', '2019-01-01 22:00:00',
'2019-01-02 01:00:00', '2019-01-02 04:00:00',
...
'2020-02-16 22:00:00', '2020-02-17 01:00:00',
'2020-02-17 04:00:00', '2020-02-17 07:00:00',
'2020-02-17 10:00:00', '2020-02-17 13:00:00',
'2020-02-17 16:00:00', '2020-02-17 19:00:00',
'2020-02-17 22:00:00', '2020-02-18 01:00:00'],
dtype='datetime64[ns]', name='Дата', length=3307, freq=None)
I found a workaround of sorts:
new_idx = df['2019-04-26 01:00:00':'2020-01-01 22:00:00'].reset_index(1).index
But I don't think it is very elegant.
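For comparison, two other ways to get only the dates that actually occur in the sliced frame are `get_level_values(0).unique()` and `remove_unused_levels()`, both existing pandas `MultiIndex` methods. The sketch below uses hypothetical toy data again (the names `toy`, `sliced`, `used_dates`, and `trimmed_level` are illustrative), so treat it as a possible approach rather than a definitive answer for the real frame.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame (illustrative only).
dates = pd.to_datetime([
    "2019-01-01 01:00:00", "2019-01-01 04:00:00",
    "2019-01-01 07:00:00", "2019-01-01 10:00:00",
])
index = pd.MultiIndex.from_product(
    [dates, ["A", "B"]], names=["Date", "Location"]
)
toy = pd.DataFrame({"value": range(len(index))}, index=index)
sliced = toy.loc[
    pd.Timestamp("2019-01-01 04:00:00"):pd.Timestamp("2019-01-01 07:00:00")
]

# Option 1: take the values of the first level row by row, then deduplicate.
used_dates = sliced.index.get_level_values(0).unique()

# Option 2: rebuild the index without unused level entries, then read levels[0].
trimmed_level = sliced.index.remove_unused_levels().levels[0]

print(list(used_dates))
print(list(trimmed_level))
```

Both options leave `sliced` itself untouched; to keep the trimmed levels on the frame, the result of `remove_unused_levels()` would have to be assigned back to `sliced.index`.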