Replace multiple rows with Polars based on a filter condition / equivalent of df.loc in pandas

Question

Consider this dummy dataset,

import numpy as np
import pandas as pd
import polars as pl


np.random.seed(25)
num_rows = 6
data = {
    'item_id': np.random.choice(['A', 'B'], num_rows),
    'store_id': np.random.choice([1, 2], num_rows),
    'sold_quantity': np.random.randint(0, 5, num_rows),  
    'total_sku_count': np.random.choice([0, 1], num_rows),
    'netsales': np.random.choice([50,100], num_rows)
    
}
df = pd.DataFrame(data)

The main DataFrame,

# pl.from_pandas(df.reset_index())
shape: (6, 6)
┌───────┬─────────┬──────────┬───────────────┬─────────────────┬──────────┐
│ index ┆ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ netsales │
│ ---   ┆ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---      │
│ i64   ┆ str     ┆ i64      ┆ i64           ┆ i64             ┆ i64      │
╞═══════╪═════════╪══════════╪═══════════════╪═════════════════╪══════════╡
│ 0     ┆ A       ┆ 2        ┆ 4             ┆ 0               ┆ 100      │
│ 1     ┆ A       ┆ 2        ┆ 4             ┆ 1               ┆ 50       │
│ 2     ┆ A       ┆ 1        ┆ 1             ┆ 1               ┆ 100      │
│ 3     ┆ B       ┆ 1        ┆ 4             ┆ 0               ┆ 50       │
│ 4     ┆ B       ┆ 2        ┆ 1             ┆ 1               ┆ 100      │
│ 5     ┆ A       ┆ 1        ┆ 3             ┆ 1               ┆ 100      │
└───────┴─────────┴──────────┴───────────────┴─────────────────┴──────────┘

Now I create a filtered df with ".loc" that contains only the rows where "item_id" = "A". Then I change the "netsales" value to 120 for all of its rows,

sub_df = df.loc[df["item_id"]=="A"].copy()  # copy so the next assignment does not raise SettingWithCopyWarning
sub_df['netsales'] = 120

This is the filtered df with the changed values,

# pl.from_pandas(sub_df.reset_index())
shape: (4, 6)
┌───────┬─────────┬──────────┬───────────────┬─────────────────┬──────────┐
│ index ┆ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ netsales │
│ ---   ┆ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---      │
│ i64   ┆ str     ┆ i64      ┆ i64           ┆ i64             ┆ i64      │
╞═══════╪═════════╪══════════╪═══════════════╪═════════════════╪══════════╡
│ 0     ┆ A       ┆ 2        ┆ 4             ┆ 0               ┆ 120      │
│ 1     ┆ A       ┆ 2        ┆ 4             ┆ 1               ┆ 120      │
│ 2     ┆ A       ┆ 1        ┆ 1             ┆ 1               ┆ 120      │
│ 5     ┆ A       ┆ 1        ┆ 3             ┆ 1               ┆ 120      │
└───────┴─────────┴──────────┴───────────────┴─────────────────┴──────────┘

Now I can replace the rows of the main df with the filtered df, at the same positions, using the line below.

df.loc[df["item_id"]=="A"] = sub_df

This is the final df,

# pl.from_pandas(df)
shape: (6, 5)
┌─────────┬──────────┬───────────────┬─────────────────┬──────────┐
│ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ netsales │
│ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---      │
│ str     ┆ i64      ┆ i64           ┆ i64             ┆ i64      │
╞═════════╪══════════╪═══════════════╪═════════════════╪══════════╡
│ A       ┆ 2        ┆ 4             ┆ 0               ┆ 120      │
│ A       ┆ 2        ┆ 4             ┆ 1               ┆ 120      │
│ A       ┆ 1        ┆ 1             ┆ 1               ┆ 120      │
│ B       ┆ 1        ┆ 4             ┆ 0               ┆ 50       │
│ B       ┆ 2        ┆ 1             ┆ 1               ┆ 100      │
│ A       ┆ 1        ┆ 3             ┆ 1               ┆ 120      │
└─────────┴──────────┴───────────────┴─────────────────┴──────────┘

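I actually want to perform the same operation in Polars. I tried to do this with Polars' "filter" method, but because of my inexperience with Polars I did not succeed. If there is a way to achieve this, please help me. Thank you very much for your support.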

Answer 1

You can accomplish this in several ways. IIUC there may be two questions here:

How do you perform the specific operation you want?
How do we update a DataFrame?

1. Replace values in a single column based on a condition

import numpy as np
import polars as pl

np.random.seed(25)
num_rows = 6
data = {
    'item_id': np.random.choice(['A', 'B'], num_rows),
    'store_id': np.random.choice([1, 2], num_rows),
    'sold_quantity': np.random.randint(0, 5, num_rows),
    'total_sku_count': np.random.choice([0, 1], num_rows),
    'netsales': np.random.choice([50,100], num_rows)

}
pl_df = pl.DataFrame(data).lazy() # We'll use a LazyFrame

pl_df = pl_df.with_columns(
    netsales=pl.when(pl.col('item_id') == 'A').then(120).otherwise(pl.col('netsales'))
)

print(pl_df.collect())
# shape: (6, 5)
# ┌─────────┬──────────┬───────────────┬─────────────────┬──────────┐
# │ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ netsales │
# │ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---      │
# │ str     ┆ i64      ┆ i64           ┆ i64             ┆ i64      │
# ╞═════════╪══════════╪═══════════════╪═════════════════╪══════════╡
# │ A       ┆ 2        ┆ 4             ┆ 0               ┆ 120      │
# │ A       ┆ 2        ┆ 4             ┆ 1               ┆ 120      │
# │ A       ┆ 1        ┆ 1             ┆ 1               ┆ 120      │
# │ B       ┆ 1        ┆ 4             ┆ 0               ┆ 50       │
# │ B       ┆ 2        ┆ 1             ┆ 1               ┆ 100      │
# │ A       ┆ 1        ┆ 3             ┆ 1               ┆ 120      │
# └─────────┴──────────┴───────────────┴─────────────────┴──────────┘

The above accomplishes your operation, but you may also be interested in changing many columns at once by working on a sub-frame.

2. "Replace" a sub-frame: filter, update, concatenate

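You can partition the DataFrame with a computed mask, where the in-group is df.filter(mask) and the remaining group is df.filter(~mask). That way you can make changes to the in-group and concatenate the two parts back together. We also need to track row numbers so that we can restore the original order of the rows.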

import numpy as np
import polars as pl

np.random.seed(25)
num_rows = 6
data = {
    'item_id': np.random.choice(['A', 'B'], num_rows),
    'store_id': np.random.choice([1, 2], num_rows),
    'sold_quantity': np.random.randint(0, 5, num_rows),
    'total_sku_count': np.random.choice([0, 1], num_rows),
    'netsales': np.random.choice([50,100], num_rows)

}
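# with_row_count() adds a "row_nr" column; recent Polars releases deprecate it in favor of with_row_index(name="row_nr")
pl_df = pl.DataFrame(data).lazy().with_row_count()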

mask = pl.col('item_id') == 'A'
sub_df = (
    pl_df.filter(mask)
    .with_columns(netsales=pl.lit(120))
    .cast(pl_df.schema)
)
remaining_df = pl_df.filter(~mask)

# Reconstitute original DataFrame with new chunk
pl_df = pl.concat([remaining_df, sub_df]).sort(pl.col('row_nr'))

print(
    pl_df.collect(),
)
# shape: (6, 6)
# ┌────────┬─────────┬──────────┬───────────────┬─────────────────┬──────────┐
# │ row_nr ┆ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ netsales │
# │ ---    ┆ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---      │
# │ u32    ┆ str     ┆ i64      ┆ i64           ┆ i64             ┆ i64      │
# ╞════════╪═════════╪══════════╪═══════════════╪═════════════════╪══════════╡
# │ 0      ┆ A       ┆ 2        ┆ 4             ┆ 0               ┆ 120      │
# │ 1      ┆ A       ┆ 2        ┆ 4             ┆ 1               ┆ 120      │
# │ 2      ┆ A       ┆ 1        ┆ 1             ┆ 1               ┆ 120      │
# │ 3      ┆ B       ┆ 1        ┆ 4             ┆ 0               ┆ 50       │
# │ 4      ┆ B       ┆ 2        ┆ 1             ┆ 1               ┆ 100      │
# │ 5      ┆ A       ┆ 1        ┆ 3             ┆ 1               ┆ 120      │
# └────────┴─────────┴──────────┴───────────────┴─────────────────┴──────────┘

3. "Replace" a sub-frame: DataFrame.update

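Polars also has a DataFrame.update method for simulating in-place updates. It takes an approach similar to the one above, except that it uses .join(…)….coalesce(…) under the hood rather than concatenation.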

import numpy as np
import polars as pl

np.random.seed(25)
num_rows = 6
data = {
    'item_id': np.random.choice(['A', 'B'], num_rows),
    'store_id': np.random.choice([1, 2], num_rows),
    'sold_quantity': np.random.randint(0, 5, num_rows),
    'total_sku_count': np.random.choice([0, 1], num_rows),
    'netsales': np.random.choice([50,100], num_rows)

}
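# with_row_count() adds a "row_nr" column; recent Polars releases deprecate it in favor of with_row_index(name="row_nr")
pl_df = pl.DataFrame(data).lazy().with_row_count()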

sub_df = (
    pl_df.filter(pl.col('item_id') == 'A')
    .with_columns(netsales=pl.lit(120))
)

print(
    pl_df.update(sub_df, on='row_nr').collect()
)
# shape: (6, 6)
# ┌────────┬─────────┬──────────┬───────────────┬─────────────────┬──────────┐
# │ row_nr ┆ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ netsales │
# │ ---    ┆ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---      │
# │ u32    ┆ str     ┆ i64      ┆ i64           ┆ i64             ┆ i64      │
# ╞════════╪═════════╪══════════╪═══════════════╪═════════════════╪══════════╡
# │ 0      ┆ A       ┆ 2        ┆ 4             ┆ 0               ┆ 120      │
# │ 1      ┆ A       ┆ 2        ┆ 4             ┆ 1               ┆ 120      │
# │ 2      ┆ A       ┆ 1        ┆ 1             ┆ 1               ┆ 120      │
# │ 3      ┆ B       ┆ 1        ┆ 4             ┆ 0               ┆ 50       │
# │ 4      ┆ B       ┆ 2        ┆ 1             ┆ 1               ┆ 100      │
# │ 5      ┆ A       ┆ 1        ┆ 3             ┆ 1               ┆ 120      │
# └────────┴─────────┴──────────┴───────────────┴─────────────────┴──────────┘

Answer 2

You want to use when/then/otherwise:

# The column in the question is named "netsales" (no underscore).
df.with_columns(
    netsales=pl.when(pl.col("item_id") == "A")
        .then(120)
        .otherwise(pl.col("netsales"))
)
