How to find the first non-NaN value before each NaN in a pandas column

Question

For example, I have some data like this:

import numpy as np
import pandas as pd

column = pd.Series([1, 2, 3, np.nan, 4, np.nan, 7])
print(column)

Running this prints:

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    NaN
6    7.0
dtype: float64

Now I want to know the first value before each NaN: for example, 3.0 before the first NaN and 4.0 before the second NaN. Is there a built-in pandas function for this, or should I write a for loop?

Answer 1

This solution works for non-consecutive NaNs. You can use boolean indexing with a mask created by isnull, shift and fillna:

print (column[column.isnull().shift(-1).fillna(False)])
2    3.0
4    4.0
dtype: float64

print (column.isnull())
0    False
1    False
2    False
3     True
4    False
5     True
6    False
dtype: bool

print (column.isnull().shift(-1))
0    False
1    False
2     True
3    False
4     True
5    False
6      NaN
dtype: object

print (column.isnull().shift(-1).fillna(False))
0    False
1    False
2     True
3    False
4     True
5    False
6    False
dtype: bool
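
The NaN in the last row of the shifted mask is what forces the fillna(False) step and the temporary object dtype. As a minimal sketch, assuming a pandas version in which Series.shift accepts fill_value (0.24 or later), the same mask can be built in one step. The names mask and before_nan are only illustrative.

```python
import numpy as np
import pandas as pd

column = pd.Series([1, 2, 3, np.nan, 4, np.nan, 7])

# shift(-1) moves each "is NaN" flag up one row, so row i tells us whether
# row i + 1 is NaN; fill_value=False fills the vacated last row directly.
mask = column.isnull().shift(-1, fill_value=False)

# Boolean indexing keeps only the rows whose successor is NaN.
before_nan = column[mask]
print(before_nan)
```

On the example data this selects rows 2 and 4, the same rows as the fillna version, and the mask stays boolean throughout.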

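For consecutive NaNs, the shifted mask also needs to be multiplied by the inverted null mask (~c) using mul, so that rows which are themselves NaN are excluded: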

column = pd.Series([np.nan,2,3,np.nan,np.nan,np.nan,7,np.nan, np.nan, 5,np.nan])

c = column.isnull()
mask = c.shift(-1).fillna(False).mul(~c)
print (mask)
0     False
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8     False
9      True
10    False
dtype: bool

print (column[mask])
2    3.0
6    7.0
9    5.0
dtype: float64
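
The same rule can be stated as "the row is not NaN and the next row is NaN". The sketch below wraps it in a helper, using & in place of mul and assuming shift's fill_value argument is available. The function name last_before_nan_run is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def last_before_nan_run(s: pd.Series) -> pd.Series:
    """Return the non-NaN values that are immediately followed by a NaN."""
    is_nan = s.isnull()
    next_is_nan = is_nan.shift(-1, fill_value=False)
    # Keep a row only if it holds a value and the following row is NaN.
    return s[next_is_nan & ~is_nan]


column = pd.Series(
    [np.nan, 2, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 5, np.nan]
)
result = last_before_nan_run(column)
print(result)
```

Because the helper also covers the non-consecutive case, it can be applied to the first example series unchanged.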

Answer 2

Same idea as @jezrael... numpy-fied.

column[np.append(np.isnan(column.values)[1:], False)]

2    3.0
4    4.0
dtype: float64

With a full pd.Series reconstruction:

m = np.append(np.isnan(column.values)[1:], False)
pd.Series(column.values[m], column.index[m])

2    3.0
4    4.0
dtype: float64

Not as fast, but intuitive: group by the cumsum of isnull and take the last value of each group. From this result, drop the last row.

column.groupby(column.isnull().cumsum()).last().iloc[:-1]

0    3.0
1    4.0
dtype: float64
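
With consecutive NaNs, some groups contain nothing but a NaN, so last() returns NaN for them. The following is a sketch of one way to extend the groupby approach to that case by also dropping those empty results. The variable names are illustrative, and the result is indexed by group label rather than by original row position.

```python
import numpy as np
import pandas as pd

column = pd.Series(
    [np.nan, 2, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 5, np.nan]
)

# Every NaN starts a new group, so each group ends just before the next NaN.
groups = column.isnull().cumsum()
last_per_group = column.groupby(groups).last()

# The final group is never followed by a NaN, and groups holding only a NaN
# have no value to report, so drop both.
result = last_per_group.iloc[:-1].dropna()
print(result.tolist())
```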

Answer 3

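Using duckdb, a last_value(... ignore nulls) window carries the most recent non-null value forward into col2:

(
    df1.sql.set_alias("tb1")
    .select("*,last_value(col1 ignore nulls) over(order by index) col2")
    .order("index")
)
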
    ┌───────┬────────┬────────┐
    │ index │  col1  │  col2  │
    │ int64 │ double │ double │
    ├───────┼────────┼────────┤
    │     0 │    1.0 │    1.0 │
    │     1 │    2.0 │    2.0 │
    │     2 │   NULL │    2.0 │
    │     3 │   NULL │    2.0 │
    │     4 │    4.0 │    4.0 │
    │     5 │   NULL │    4.0 │
    │     6 │    7.0 │    7.0 │
    └───────┴────────┴────────┘
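
For comparison, a roughly equivalent forward fill can be sketched in plain pandas. This assumes df1 holds the col1 values shown in the table above and that its default integer index plays the role of the index column. It is a swapped-in pandas technique, not the duckdb call itself.

```python
import numpy as np
import pandas as pd

# Same values as the col1 column in the table above (assumed input).
df1 = pd.DataFrame({"col1": [1.0, 2.0, np.nan, np.nan, 4.0, np.nan, 7.0]})

# ffill carries the last non-null value forward row by row, much like
# last_value(col1 ignore nulls) over (order by index).
df1["col2"] = df1["col1"].ffill()
print(df1)
```

Like the SQL version, this fills every row rather than selecting only the rows that precede a NaN. The value before each NaN appears in col2 on the rows where col1 is null.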
