How to replace outliers with NaN while keeping the rest of the row intact, using pandas in Python?

Question

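I'm working with a very large file and need to remove the outliers in each column, and they differ from column to column.

I've managed to find the outliers and replace them with NaN, but the whole row gets turned into NaN. I'm sure I'm missing something simple, but I can't see what it is.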

import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 100000)   
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 1000)

df = pd.read_excel('example sheet.xlsx')   

df = df.replace(df.loc[df['column 2']<=0] ,np.nan)
print(df)

How can I convert only that one value to NaN instead of the whole row?

Thanks

Answer 1

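To set individual cells to NaN, you should change the values of each Series (column) rather than calling replace on the whole DataFrame. df.loc[df['column 2']<=0] selects entire rows, which is why entire rows end up as NaN.

Wrong approach: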

df = df.replace(df.loc[df['column 2']<=0] ,np.nan)
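The difference is easiest to see on a tiny frame. This is a minimal sketch, assuming made-up data in place of 'example sheet.xlsx': a row mask alone selects every column of the matching rows, while adding the column label as the second .loc argument limits the assignment to the offending cells only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for 'example sheet.xlsx'
df = pd.DataFrame({"column 1": [1.0, 2.0, 3.0],
                   "column 2": [5.0, -1.0, 7.0]})

mask = df["column 2"] <= 0

# Row mask only: returns all columns of the matching rows
print(df.loc[mask])

# Row mask plus column label: only the 'column 2' cells in those rows become NaN
df.loc[mask, "column 2"] = np.nan
print(df)
```

Row 1 keeps its value in 'column 1'; only its 'column 2' cell is set to NaN.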

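One correct approach:

for col in df.columns:
    s = df[col]
    outlier_s = s <= 0                      # boolean Series: True where the value is an outlier
    df[col] = s.where(~outlier_s, np.nan)   # keep non-outliers, put NaN in the outlier cells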

The where function keeps values where the condition is True and replaces them where it is False.

http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html?highlight=where#pandas.DataFrame.where
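A small sketch of the where semantics on a made-up Series, alongside mask, which is its inverse (it replaces values where the condition is True). Both calls below produce the same result, so either can be used depending on whether it reads more naturally to describe the values to keep or the values to drop.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, -2.0, 0.0, 4.5])

# where(): keep values where the condition is True, replace the rest with NaN
kept = s.where(s > 0, np.nan)

# mask(): replace values where the condition is True
masked = s.mask(s <= 0, np.nan)

print(kept.tolist())
print(masked.tolist())
```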

Answer 2

Use np.where to replace values based on a condition.

# if you have to perform only for single column
df['column 2'] = np.where(df['column 2']<=0, np.nan, df['column 2'])


# if you want to apply on all/multiple columns.
for col in df.columns:
    df[col] = np.where(df[col]<=0, np.nan, df[col])
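If the sheet also contains text columns, comparing them with 0 raises a TypeError, so the loop over all columns can fail. A minimal sketch, assuming a hypothetical frame with one text column, restricts the loop to numeric columns with select_dtypes:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a text column with numeric ones
df = pd.DataFrame({"name": ["a", "b", "c"],
                   "column 2": [5, -1, 7],
                   "column 3": [0.5, 2.0, -3.0]})

# Only numeric columns are compared with 0; the text column is left untouched
for col in df.select_dtypes(include="number").columns:
    df[col] = np.where(df[col] <= 0, np.nan, df[col])

print(df)
```

Note that np.where returns a float array here, so an integer column such as 'column 2' becomes float64 once it holds NaN.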

Answer 3

You can do the following:

df = df.mask(df <= 0, np.nan, axis=1)

There is no need to iterate over the columns.
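This works because df <= 0 produces a boolean DataFrame of the same shape, so each cell is judged individually. A short sketch on made-up data; mask returns a new DataFrame and leaves the original unchanged unless the result is assigned back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column 1": [1.0, -4.0, 2.0],
                   "column 2": [5.0, -1.0, 0.0]})

# One boolean cell per value: True marks an outlier
cleaned = df.mask(df <= 0)   # the replacement value defaults to NaN

print(cleaned)
```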

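However, I recommend defining outliers with an appropriate statistic rather than <= 0.

For example, you can use quantiles (combine the two conditions with |, because Python's or raises a ValueError on DataFrames):

df = df.mask((df < df.quantile(0.05)) | (df > df.quantile(0.95)), np.nan, axis=1)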
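A sketch of the quantile approach packaged as a reusable function. The name mask_outside_quantiles and its parameters are hypothetical; it assumes only numeric columns should be checked, since quantiles are not defined for text columns. Comparing a DataFrame with the Series returned by quantile aligns on column names, so each column is tested against its own bounds.

```python
import numpy as np
import pandas as pd

def mask_outside_quantiles(df, lower=0.05, upper=0.95):
    """Hypothetical helper: set cells outside each numeric column's
    [lower, upper] quantile range to NaN, leaving other columns as-is."""
    num = df.select_dtypes(include="number")
    lo = num.quantile(lower)   # Series indexed by column name
    hi = num.quantile(upper)
    # Series index aligns with DataFrame columns, so bounds are per column
    outliers = (num < lo) | (num > hi)
    masked = num.mask(outliers)
    out = df.copy()
    for col in num.columns:
        out[col] = masked[col]
    return out

# Usage on made-up data: values 1..100 plus a text column
df = pd.DataFrame({"name": [f"r{i}" for i in range(100)],
                   "x": [float(v) for v in range(1, 101)]})
out = mask_outside_quantiles(df)
print(out["x"].isna().sum())
```

With the default linear interpolation, the 5% and 95% quantiles of 1..100 are 5.95 and 95.05, so the five smallest and five largest values are masked.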
