Python Polars: how to filter columns based on a row condition

Question

What is the correct way to filter (exclude) columns of a Polars DataFrame based on the values in them? For example:

polars_df.std()

Output:

col_1 (f64)	col_2 (f64)
20242.888632	0.0

# now, the columns with a standard deviation of 0 are to be excluded from the dataframe:
polars_df = polars_df.select(cs.numeric() != 0)

This produces the following result:

col_1 (bool)	col_2 (bool)
true	false

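This identifies the correct columns, but I want the result to be a dataframe containing all columns whose standard deviation is greater than 0. What am I missing to get there?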

Answer 1

"I want the result to be a dataframe containing all columns whose standard deviation is greater than 0"

Do:

df.select(col for col in df.iter_columns() if col.std() > 0)

Answer 2

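If you find yourself needing this kind of operation repeatedly, you can build a function that accepts a DataFrame and an expression (the expression should aggregate and return a boolean dtype, but no check is performed to enforce this).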

from numpy.random import default_rng
import polars as pl
from polars import selectors as cs

def column_filter(df, expr):
    tmp = df.select(expr)
    if isinstance(df, pl.LazyFrame):
        tmp = tmp.collect()
    status = tmp.rows()[0]
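    # Pair the flags with tmp's own columns: expr may select only a subset of
    # df's columns (e.g. cs.numeric()), so df.columns could be misaligned.
    return pl.col([col for col, st in zip(tmp.columns, status) if st])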

rng = default_rng(0)
df = (
    pl.DataFrame(
        rng.normal([[0, 1, 2, 3]], [[0, 1, 0, 2]], size=(100, 4)),
        schema=[*'abcd']
    )
    .with_columns(e=rng.choice([*'WXYZ'], size=100))
)


print(
    df.select(
        column_filter(df, cs.numeric().std() > 0),
        pl.col('e'),
    ),
    sep='\n',
)
# shape: (100, 3)
# ┌───────────┬──────────┬─────┐
# │ b         ┆ d        ┆ e   │
# │ ---       ┆ ---      ┆ --- │
# │ f64       ┆ f64      ┆ str │
# ╞═══════════╪══════════╪═════╡
# │ 0.867895  ┆ 3.2098   ┆ X   │
# │ 1.361595  ┆ 4.894162 ┆ W   │
# │ -0.265421 ┆ 3.082652 ┆ Z   │
# │ 0.781208  ┆ 1.535465 ┆ Y   │
# │ …         ┆ …        ┆ …   │
# │ 0.512782  ┆ 1.759142 ┆ W   │
# │ 1.367466  ┆ 2.039108 ┆ Z   │
# │ 0.418967  ┆ 3.178112 ┆ Z   │
# │ -0.095472 ┆ 3.887983 ┆ X   │
# └───────────┴──────────┴─────┘

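The advantage of this approach is that the filtered columns are returned as an expression, so you can operate on the subset and recombine the data as needed.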
