I am trying to achieve the following:
aggregate_commitment_df = commitment_df.group_by("sprint_start_date", "squad").agg(
    stories_committed=pl.count(pl.col("issue_type") == "Story"),
    spikes_committed=pl.count(pl.col("issue_type") == "Spike"),
    bugs_committed=pl.count(pl.col("issue_type") == "Bug"),
    story_points_committed=pl.sum("story_points"),
)
Here, for example, stories_committed should be the count of all rows with issue_type == "Story" for each combination of sprint_start_date and squad.
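You need to apply filter inside the aggregation context before calling count. A comparison such as pl.col("issue_type") == "Story" only produces one boolean per row, and counting it does not drop the rows where it is False.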
import polars as pl
from datetime import date

# Sample data: 24 rows spread over 3 sprint dates and 2 squads
commitment_df = pl.DataFrame({
    "sprint_start_date": [date(2023, 8, 1), date(2023, 8, 2), date(2023, 8, 3)] * 8,
    "squad": ["team1", "team2"] * 12,
    "issue_type": ["Story"] * 10 + ["Spike"] * 8 + ["Bug"] * 6,
    "story_points": [*range(4)] * 6,
})

# Within each group, keep only the matching rows, then count them
aggregate_commitment_df = commitment_df.group_by("sprint_start_date", "squad").agg(
    stories_committed=pl.col("issue_type").filter(pl.col("issue_type") == "Story").count(),
    spikes_committed=pl.col("issue_type").filter(pl.col("issue_type") == "Spike").count(),
    bugs_committed=pl.col("issue_type").filter(pl.col("issue_type") == "Bug").count(),
    story_points_committed=pl.sum("story_points"),
)
print(aggregate_commitment_df)
Result:
shape: (6, 6)
┌───────────────────┬───────┬───────────────────┬──────────────────┬────────────────┬────────────────────────┐
│ sprint_start_date ┆ squad ┆ stories_committed ┆ spikes_committed ┆ bugs_committed ┆ story_points_committed │
│ ---               ┆ ---   ┆ ---               ┆ ---              ┆ ---            ┆ ---                    │
│ date              ┆ str   ┆ u32               ┆ u32              ┆ u32            ┆ i64                    │
╞═══════════════════╪═══════╪═══════════════════╪══════════════════╪════════════════╪════════════════════════╡
│ 2023-08-03        ┆ team2 ┆ 1                 ┆ 2                ┆ 1              ┆ 8                      │
│ 2023-08-02        ┆ team1 ┆ 1                 ┆ 2                ┆ 1              ┆ 4                      │
│ 2023-08-01        ┆ team1 ┆ 2                 ┆ 1                ┆ 1              ┆ 4                      │
│ 2023-08-03        ┆ team1 ┆ 2                 ┆ 1                ┆ 1              ┆ 4                      │
│ 2023-08-02        ┆ team2 ┆ 2                 ┆ 1                ┆ 1              ┆ 8                      │
│ 2023-08-01        ┆ team2 ┆ 2                 ┆ 1                ┆ 1              ┆ 8                      │
└───────────────────┴───────┴───────────────────┴──────────────────┴────────────────┴────────────────────────┘