How do I get the distance to the previous occurrence of a value in a Polars dataframe?


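I want to efficiently find the distance from the current row to the previous occurrence of the same value. I know Polars has no index, but the formula is roughly: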

if prior_occurrence {
  (current_row_index - prior_occurrence_index - 1)
} else {
  -1
}

Here is the input dataframe:

let df_a = df![
    "a" => [1, 2, 2, 1, 4, 1],
    "b" => ["c","a", "b", "c", "c","a"]
].unwrap();

println!("{}", df_a);
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 1   ┆ c   │
│ 2   ┆ a   │
│ 2   ┆ b   │
│ 1   ┆ c   │
│ 4   ┆ c   │
│ 1   ┆ a   │
└─────┴─────┘

Desired output:

┌─────┬─────┬────────┐
│ a   ┆ b   ┆ b_dist │
│ --- ┆ --- ┆ ---    │
│ i32 ┆ str ┆ i32    │
╞═════╪═════╪════════╡
│ 1   ┆ c   ┆ -1     │
│ 2   ┆ a   ┆ -1     │
│ 2   ┆ b   ┆ -1     │
│ 1   ┆ c   ┆ 2      │
│ 4   ┆ c   ┆ 0      │
│ 1   ┆ a   ┆ 3      │
└─────┴─────┴────────┘
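As a reference for what the formula computes, here is a minimal sketch in plain Rust using only the standard library, not Polars. The function name `distance_to_previous` is hypothetical and chosen for illustration; it walks the column once and remembers the last row index at which each value was seen.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// For each element, returns the number of rows strictly between it and the
/// previous occurrence of the same value, or -1 if the value was not seen before.
/// (Hypothetical helper for illustration; not part of Polars.)
fn distance_to_previous<T: Eq + Hash + Clone>(values: &[T]) -> Vec<i32> {
    // Maps each value to the row index where it was last seen.
    let mut last_seen: HashMap<T, usize> = HashMap::new();
    let mut out = Vec::with_capacity(values.len());
    for (i, v) in values.iter().enumerate() {
        // `insert` returns the index previously stored for this value, if any.
        match last_seen.insert(v.clone(), i) {
            Some(prev) => out.push((i - prev - 1) as i32),
            None => out.push(-1),
        }
    }
    out
}
```

Called on the `b` column, `distance_to_previous(&["c", "a", "b", "c", "c", "a"])` yields `[-1, -1, -1, 2, 0, 3]`, which matches the `b_dist` column above.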

What is the most efficient way to do this?

dataframe rust rust-polars
1 Answer
1 vote

Python

(df
 .with_row_count("idx")
 .with_columns([
      ((pl.col("idx") - pl.col("idx").shift()).cast(pl.Int32).fill_null(0) - 1)
      .over("a").alias("a_distance_to_a")
 ])
)

Rust


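use polars::prelude::*;

fn func1() -> PolarsResult<()> {
    let df_a = df![
        "a" => [1, 2, 2, 1, 4, 1],
        "b" => ["c", "a", "b", "c", "c", "a"]
    ]?;

    let out = df_a
        .lazy()
        // Add a row-number column to stand in for an index.
        .with_row_count("idx", None)
        // Difference between each row number and the previous one in the same group, minus 1.
        .with_columns([((col("idx") - col("idx").shift(1))
            .cast(DataType::Int32)
            .fill_null(0)
            - lit(1))
        // Evaluate the expression per group of equal values in "a".
        .over("a")
        .alias("a_distance_to_a")])
        .collect()?;

    println!("{}", out);

    Ok(())
}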
Output

shape: (6, 4)
┌─────┬─────┬─────┬─────────────────┐
│ idx ┆ a   ┆ b   ┆ a_distance_to_a │
│ --- ┆ --- ┆ --- ┆ ---             │
│ u32 ┆ i64 ┆ str ┆ i32             │
╞═════╪═════╪═════╪═════════════════╡
│ 0   ┆ 1   ┆ c   ┆ -1              │
│ 1   ┆ 2   ┆ a   ┆ -1              │
│ 2   ┆ 2   ┆ b   ┆ 0               │
│ 3   ┆ 1   ┆ c   ┆ 2               │
│ 4   ┆ 4   ┆ c   ┆ -1              │
│ 5   ┆ 1   ┆ a   ┆ 1               │
└─────┴─────┴─────┴─────────────────┘
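To see why a first occurrence ends up as -1 here, the expression can be emulated step by step in plain Rust. This is a rough sketch using only the standard library, with `Option` standing in for null; the function name `windowed_distance` is hypothetical, and the mapping to the Polars steps is an interpretation rather than how Polars evaluates it internally. Note that the answer groups by column `a`; grouping by `b` instead corresponds to the column asked about in the question.

```rust
use std::collections::HashMap;

/// Emulates `(idx - idx.shift(1)).cast(Int32).fill_null(0) - 1`
/// evaluated within each group of equal `keys`.
/// (Hypothetical helper for illustration; not part of Polars.)
fn windowed_distance(keys: &[i32]) -> Vec<i32> {
    // Last row index seen for each key, i.e. the shifted value within the group.
    let mut prev_idx: HashMap<i32, i32> = HashMap::new();
    keys.iter()
        .enumerate()
        .map(|(idx, key)| {
            let idx = idx as i32;
            // shift(1) within the group: previous row index with this key, or None (null).
            let shifted: Option<i32> = prev_idx.insert(*key, idx);
            // idx - shifted stays null when there is no previous row in the group.
            let diff: Option<i32> = shifted.map(|p| idx - p);
            // fill_null(0) followed by "- 1" turns a null into -1.
            diff.unwrap_or(0) - 1
        })
        .collect()
}
```

For the `a` column, `windowed_distance(&[1, 2, 2, 1, 4, 1])` gives `[-1, -1, 0, 2, -1, 1]`, the same values as `a_distance_to_a` in the output above.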