Best way to remove specific dates from a pandas DataFrame


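df.loc['2016-04-22'] pulls all rows from that day. But what if I want to eliminate all rows from '2016-04-22' instead? I want something like this:

df.loc[~'2016-04-22']

(but this doesn't work)

And what if I want to eliminate a list of dates?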

Right now, I have the following solution:

    import numpy as np
    import pandas as pd
    from numpy import random

    ### Create a sample data frame
    dates = [pd.Timestamp('2016-04-25 06:48:33'), pd.Timestamp('2016-04-27 15:33:23'),
             pd.Timestamp('2016-04-23 11:23:41'), pd.Timestamp('2016-04-28 12:08:20'),
             pd.Timestamp('2016-04-21 15:03:49'), pd.Timestamp('2016-04-23 08:13:42'),
             pd.Timestamp('2016-04-27 21:18:22'), pd.Timestamp('2016-04-27 18:08:23'),
             pd.Timestamp('2016-04-27 20:48:22'), pd.Timestamp('2016-04-23 14:08:41'),
             pd.Timestamp('2016-04-27 02:53:26'), pd.Timestamp('2016-04-25 21:48:31'),
             pd.Timestamp('2016-04-22 12:13:47'), pd.Timestamp('2016-04-27 01:58:26'),
             pd.Timestamp('2016-04-24 11:48:37'), pd.Timestamp('2016-04-22 08:38:46'),
             pd.Timestamp('2016-04-26 13:58:28'), pd.Timestamp('2016-04-24 15:23:36'),
             pd.Timestamp('2016-04-22 07:53:46'), pd.Timestamp('2016-04-27 23:13:22')]
    values = random.normal(20, 20, 20)
    df = pd.DataFrame(index=dates, data=values, columns=['values']).sort_index()

    ### This is the list of dates I want to remove
    removelist = ['2016-04-22', '2016-04-24']

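This loop grabs the index entries for each date I want to remove, eliminates them from the main DataFrame's index, and then positively selects the remaining dates (i.e. the good dates) from the DataFrame.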

    for r in removelist:
        # all timestamps that fall on day r
        elimlist = df.loc[r].index.tolist()
        ind = df.index.tolist()
        # keep every timestamp that is not on day r
        culind = [i for i in ind if i not in elimlist]
        df = df.loc[culind]
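If the loop structure is kept, a smaller variant (a sketch, assuming the same kind of frame) is to drop the matching labels directly with DataFrame.drop instead of rebuilding the list of remaining labels. It assumes every date in removelist actually occurs in the index, because df.loc[r] raises KeyError for a day with no rows, and that the index has sub-day resolution so df.loc[r] returns a frame. The five-row sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample frame with intraday timestamps
df = pd.DataFrame(
    {"values": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime([
        "2016-04-21 15:03:49",
        "2016-04-22 07:53:46",
        "2016-04-22 12:13:47",
        "2016-04-23 08:13:42",
        "2016-04-24 11:48:37",
    ]),
)
removelist = ["2016-04-22", "2016-04-24"]

for r in removelist:
    # Partial-string indexing: df.loc[r] returns every row on that calendar day
    day_rows = df.loc[r].index
    # Drop those labels in place of rebuilding the list of labels to keep
    df = df.drop(day_rows)

print(df)
```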

Is there anything better out there?

I also tried indexing between the rounded date and the rounded date + 1 day, something like this:

    r = pd.Timestamp(r)
    df[~((df.index >= r) & (df.index < r + pd.Timedelta("1 day")))]

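But this is really cumbersome, and (at the end of the day) I would still need a for loop when I have to eliminate n specific dates.

There must be a better way! Right? Maybe?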
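For comparison, a minimal pure-pandas sketch, assuming the index is a timezone-naive DatetimeIndex as in the sample above: DatetimeIndex.normalize() sets every timestamp to midnight, so a single isin call against the midnight Timestamps from pd.to_datetime(removelist) removes all listed days without a loop and without importing NumPy. The helper name drop_dates and the five-row frame are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical sample frame with intraday timestamps
df = pd.DataFrame(
    {"values": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime([
        "2016-04-21 15:03:49",
        "2016-04-22 07:53:46",
        "2016-04-22 12:13:47",
        "2016-04-23 08:13:42",
        "2016-04-24 11:48:37",
    ]),
)


def drop_dates(frame, dates):
    """Hypothetical helper: return frame without rows on any of `dates`."""
    # normalize() moves every timestamp to midnight of its own day
    days = frame.index.normalize()
    # to_datetime turns the date strings into midnight Timestamps as well
    mask = days.isin(pd.to_datetime(dates))
    return frame[~mask]


# A single date is just a one-element list
print(drop_dates(df, ["2016-04-22"]))
# Several dates at once, no loop over removelist
print(drop_dates(df, ["2016-04-22", "2016-04-24"]))
```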

	
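Same idea as @Alexander, but using the date attribute of the DatetimeIndex together with numpy.isin (numpy.in1d in older code; in1d is deprecated as of NumPy 2.0):

    mask = ~np.isin(df.index.date, pd.to_datetime(removelist).date)
    df = df.loc[mask, :]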
1 Answer

    %timeit df.loc[~np.in1d(df.index.date, pd.to_datetime(removelist).date), :]
    1000 loops, best of 3: 1.42 ms per loop

    %timeit df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
    100 loops, best of 3: 3.25 ms per loop
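To rerun a comparison like this outside IPython, a rough harness with the standard-library timeit module could look like the following. It assumes a hypothetical 10,000-row random frame, uses np.isin in place of np.in1d, and converts removelist to a set of datetime.date objects for the list-comprehension version, because recent pandas no longer treats a datetime.date as equal to a midnight Timestamp. The function names with_isin and with_listcomp are hypothetical. It first checks that both approaches keep identical rows; the timings depend on the machine and are not reproduced here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger sample: random intraday timestamps spread over 10 days
rng = np.random.default_rng(0)
n = 10_000
offsets = pd.to_timedelta(rng.integers(0, 10 * 86_400, n), unit="s")
stamps = pd.Timestamp("2016-04-21") + offsets
df = pd.DataFrame({"values": rng.normal(20, 20, n)}, index=stamps).sort_index()
removelist = ["2016-04-22", "2016-04-24"]


def with_isin():
    # Vectorized membership test on arrays of datetime.date objects
    return df.loc[~np.isin(df.index.date, pd.to_datetime(removelist).date), :]


def with_listcomp():
    # Row-by-row test against a set of datetime.date objects
    remove_days = set(pd.to_datetime(removelist).date)
    return df[[d.date() not in remove_days for d in df.index]]


# Both approaches must keep exactly the same rows before timing means anything
assert with_isin().equals(with_listcomp())

for fn in (with_isin, with_listcomp):
    secs = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {secs * 1e3:.2f} ms per call")
```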


You can use a list comprehension to create a boolean mask.

>>> df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
                        values
2016-04-21 15:03:49  28.059520
2016-04-23 08:13:42 -22.376577
2016-04-23 11:23:41  40.350252
2016-04-23 14:08:41  14.557856
2016-04-25 06:48:33  -0.271976
2016-04-25 21:48:31  20.156240
2016-04-26 13:58:28  -3.225795
2016-04-27 01:58:26  51.991293
2016-04-27 02:53:26  -0.867753
2016-04-27 15:33:23  31.585201
2016-04-27 18:08:23  11.639641
2016-04-27 20:48:22  42.968156
2016-04-27 21:18:22  27.335995
2016-04-27 23:13:22  13.120088
2016-04-28 12:08:20  53.730511

    

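In 2025 this did not work for me, because pd.to_datetime(removelist) produces Timestamps that include HH:MM:SS, which is too specific to match the d.date() values. I had to modify it to: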

df[[d.date() not in [i.date() for i in pd.to_datetime(removelist)] for d in df.index]]
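A variant of the same fix, as a sketch: converting removelist to a set of datetime.date objects once, outside the comprehension, avoids calling pd.to_datetime for every row and still avoids importing NumPy directly. The sample frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample frame with intraday timestamps
df = pd.DataFrame(
    {"values": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime([
        "2016-04-21 15:03:49",
        "2016-04-22 07:53:46",
        "2016-04-22 12:13:47",
        "2016-04-23 08:13:42",
        "2016-04-24 11:48:37",
    ]),
)
removelist = ["2016-04-22", "2016-04-24"]

# Convert the strings once into a set of datetime.date objects
remove_days = set(pd.to_datetime(removelist).date)
# d is a Timestamp; d.date() drops the time of day so it can match the set
mask = [d.date() not in remove_days for d in df.index]
result = df[mask]
print(result)
```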

@root's solution is probably fine, but I wanted to avoid importing another package.

	
© www.soinside.com 2019 - 2025. All rights reserved.