Row-wise aggregation of a pandas DataFrame

Question

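What is the most Pythonic way to write a function that performs a row-wise aggregation (sum, min, max, mean, etc.) over a specified set of columns (column names in a list) of a pandas DataFrame while skipping NaN values?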

import pandas as pd
import numpy as np

df = pd.DataFrame({"col1": [1, np.nan, 1],
                   "col2": [2, 2, np.nan]})

def aggregate_rows(df, column_list, func):
    # Check if the specified columns exist in the DataFrame
    missing_columns = [col for col in column_list if col not in df.columns]
    if missing_columns:
        raise ValueError(f"Columns not found in DataFrame: {missing_columns}")

    # Check if func is callable
    if not callable(func):
        raise ValueError("The provided function is not callable.")

    # Apply func to each row of the selected columns, after dropping NaN values
    agg_series = df[column_list].apply(lambda row: func(row.dropna()), axis=1)

    return agg_series

df["sum"] = aggregate_rows(df, ["col1", "col2"], sum)
df["max"] = aggregate_rows(df, ["col1", "col2"], max)
df["mean"] = aggregate_rows(df, ["col1", "col2"], lambda x: x.mean())
print(df)

Result (as expected):

   col1  col2  sum  max  mean
0   1.0   2.0  3.0  2.0   1.5
1   NaN   2.0  2.0  2.0   2.0
2   1.0   NaN  1.0  1.0   1.0

But with a DataFrame that contains a row of only NaN values,

df = pd.DataFrame({"col1": [1, np.nan, 1, np.nan],
                   "col2": [2, 2, np.nan, np.nan]})

the result is:

ValueError: max() arg is an empty sequence

What is the best way to fix this?

Answer 1

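You can try using numpy.sum / numpy.max / numpy.mean instead of Python's built-in functions. Unlike the built-in max, they do not raise an error on the empty Series that is left after dropna() on an all-NaN row: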

df["sum"] = aggregate_rows(df, ["col1", "col2"], np.sum)
df["max"] = aggregate_rows(df, ["col1", "col2"], np.max)
df["mean"] = aggregate_rows(df, ["col1", "col2"], np.mean)

print(df)

Prints:

   col1  col2  sum  max  mean
0   1.0   2.0  3.0  2.0   1.5
1   NaN   2.0  2.0  2.0   2.0
2   1.0   NaN  1.0  1.0   1.0
3   NaN   NaN  0.0  NaN   NaN
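
As an alternative to the row-wise apply, the built-in DataFrame reductions can be called with axis=1; they skip NaN by default (skipna=True). The following is only a minimal sketch, not part of the original answer: the helper name aggregate_rows_vectorized and its how parameter are hypothetical, and it assumes the aggregation can be given as a pandas method name such as "sum", "max" or "mean".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.nan, 1, np.nan],
                   "col2": [2, 2, np.nan, np.nan]})


def aggregate_rows_vectorized(df, column_list, how):
    """Hypothetical helper: row-wise reduction by pandas method name.

    `how` is a string such as "sum", "min", "max" or "mean".
    NaN values are skipped because the pandas reductions default to skipna=True.
    """
    missing_columns = [col for col in column_list if col not in df.columns]
    if missing_columns:
        raise ValueError(f"Columns not found in DataFrame: {missing_columns}")
    # DataFrame.agg with a string name dispatches to the matching DataFrame method
    return df[column_list].agg(how, axis=1)


cols = ["col1", "col2"]
row_sum = aggregate_rows_vectorized(df, cols, "sum")
row_max = aggregate_rows_vectorized(df, cols, "max")
row_mean = aggregate_rows_vectorized(df, cols, "mean")

# Optional: make the sum of an all-NaN row NaN instead of 0.0
row_sum_nan = df[cols].sum(axis=1, min_count=1)

print(pd.DataFrame({"sum": row_sum, "max": row_max, "mean": row_mean,
                    "sum_min_count": row_sum_nan}))
```

As in the output above, the all-NaN row gives 0.0 for the sum and NaN for max and mean. If a NaN sum is preferred for that row, min_count=1 on DataFrame.sum is one way to get it.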
