group_by_dynamic in polars, but only on the time of day of the timestamp


I have some dummy data that looks like this:

datetime,duration_in_traffic_s
2023-12-20T10:50:43.063641000,221.0
2023-12-20T10:59:09.884939000,219.0
2023-12-20T11:09:56.003331000,206.0
...
more rows with different dates
...

Assume this data is stored in a file mwe.csv. Using polars, I now want to compute the mean of the second column, grouped into one-hour chunks. I would like to use group_by_dynamic (doc) so that I get a data point every 10 minutes. I run

import polars as pl

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(pl.Datetime))
    .sort("datetime")
    # one-hour windows, with a new window starting every 10 minutes
    .group_by_dynamic(
        index_column="datetime",
        every="10m",
        period="1h",
    )
    .agg(pl.col("duration_in_traffic_s").mean())
)

The result looks like this:

However, I do not want the mean to take the date into account, only the time of day. For example, 2023-12-20 10:40 and 2023-12-21 10:40 should fall into the same bin.

I had hoped that adding .with_columns(pl.col("datetime").dt.time()) to the pipeline would help, but group_by_dynamic does not work with Time data.
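A minimal sketch of one possible workaround, assuming it is acceptable to map every timestamp onto an arbitrary anchor date (here 2000-01-01): keep the Datetime dtype but move all rows onto the same dummy day, so that group_by_dynamic effectively only sees the time of day while the every="10m" / period="1h" windows still apply.

import polars as pl

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(pl.Datetime))
    .with_columns(
        # time of day as a duration since midnight, re-attached to the anchor day
        time_of_day=pl.datetime(2000, 1, 1)
        + (pl.col("datetime") - pl.col("datetime").dt.truncate("1d"))
    )
    .sort("time_of_day")
    .group_by_dynamic(index_column="time_of_day", every="10m", period="1h")
    .agg(pl.col("duration_in_traffic_s").mean())
)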

I can manually compute the time column as a float:

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(dtype=pl.Datetime))
    .with_columns(
        # time of day expressed in fractional hours
        t=pl.col("datetime").dt.hour().cast(pl.Float64)
        + pl.col("datetime").dt.minute().cast(pl.Float64) / 60
        + pl.col("datetime").dt.second().cast(pl.Float64) / 60 / 60
    )
).sort("t")

But then I don't know how to do the grouping. Also, I do like the Time format, so I would prefer to keep it.
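For reference, a minimal sketch of what the grouping on that float column might look like, assuming simple arithmetic binning into 10-minute buckets (10 minutes = 1/6 of an hour) is acceptable:

import polars as pl

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(pl.Datetime))
    .with_columns(
        t=pl.col("datetime").dt.hour().cast(pl.Float64)
        + pl.col("datetime").dt.minute().cast(pl.Float64) / 60
        + pl.col("datetime").dt.second().cast(pl.Float64) / 3600
    )
    # floor each value of t to the nearest 1/6 hour, i.e. a 10-minute bucket
    .group_by((pl.col("t") * 6).floor() / 6)
    .agg(pl.col("duration_in_traffic_s").mean())
    .sort("t")
)

The drawback is exactly the one mentioned above: the nice Time formatting is lost.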

Is there a way to do a dynamic group-by on just the time data, ignoring the date?

Here is the full mwe.csv file:

datetime,duration_in_traffic_s
2023-12-20T10:50:43.063641000,221.0
2023-12-20T10:59:09.884939000,219.0
2023-12-20T11:09:56.003331000,206.0
2023-12-20T11:12:42.347660000,206.0
2023-12-20T11:17:40.084821000,200.0
2023-12-20T11:31:14.957092000,222.0
2023-12-20T11:46:08.886872000,209.0
2023-12-20T12:00:02.024328000,198.0
2023-12-20T12:15:01.910446000,251.0
2023-12-20T12:30:01.447496000,229.0
2023-12-20T12:45:02.761839000,206.0
2023-12-20T14:00:01.456811000,262.0
2023-12-20T14:15:01.718898000,226.0
2023-12-20T14:30:02.452185000,194.0
2023-12-20T14:45:01.717522000,191.0
2023-12-20T14:49:10.150735000,196.0
2023-12-20T14:50:55.800417000,194.0
2023-12-20T14:57:05.230577000,202.0
2023-12-20T14:59:23.005408000,192.0
2023-12-20T15:00:01.316240000,193.0
2023-12-20T15:00:14.842233000,193.33333333333334
2023-12-20T15:00:49.370172000,193.66666666666666
2023-12-20T15:01:06.300133000,193.66666666666666
2023-12-20T15:15:01.943587000,183.0
2023-12-20T15:20:01.567126000,184.0
2023-12-20T15:30:01.784686000,197.0
2023-12-20T15:40:02.468132000,188.0
2023-12-20T15:50:01.968746000,226.0
2023-12-20T16:00:01.864652000,233.0
2023-12-20T16:10:01.185016000,213.0
2023-12-20T16:20:01.544796000,252.0
2023-12-20T16:30:01.621331000,224.0
2023-12-20T16:40:03.567996000,228.0
2023-12-20T16:50:01.014911000,220.0
2023-12-20T17:00:01.723306000,232.0
2023-12-20T17:10:02.490695000,215.0
2023-12-20T17:20:01.844304000,214.0
2023-12-20T17:30:02.147457000,204.0
2023-12-20T17:40:02.217333000,198.0
2023-12-20T17:50:01.741479000,193.0
2023-12-20T18:00:01.665714000,193.0
2023-12-20T18:10:02.334926000,182.0
2023-12-20T18:26:43.135849000,185.0
2023-12-20T18:30:02.434296000,184.0
2023-12-20T18:32:41.033250000,175.0
2023-12-20T18:40:02.941171000,176.0
2023-12-20T19:36:47.313925000,175.0
2023-12-20T19:40:01.895983000,171.0
2023-12-20T19:50:02.049567000,167.0
2023-12-20T20:00:08.284378000,166.0
2023-12-20T20:10:02.727202000,166.0
2023-12-20T20:40:02.407489000,161.0
2023-12-20T21:10:02.100392000,158.0
2023-12-20T21:21:56.063346000,157.0
2023-12-20T21:30:02.005594000,159.0
2023-12-20T21:40:01.915306000,153.0
2023-12-20T21:50:02.318419000,152.0
2023-12-20T22:00:02.369086000,154.0
2023-12-20T22:10:02.704019000,154.0
2023-12-20T22:20:01.968418000,160.0
2023-12-20T22:30:01.965742000,159.0
2023-12-20T22:40:02.718295000,164.0
2023-12-20T22:50:02.347303000,160.0
2023-12-21T05:00:02.595535000,164.0
2023-12-21T05:10:02.642932000,163.0
2023-12-21T05:20:02.390676000,164.0
2023-12-21T05:30:01.971166000,165.0
2023-12-21T05:40:01.874958000,169.0
2023-12-21T05:50:01.806441000,167.0
2023-12-21T06:00:02.396094000,169.0
2023-12-21T06:10:02.350196000,169.0
2023-12-21T06:20:02.041357000,169.0
2023-12-21T06:33:43.895397000,177.0
2023-12-21T07:30:02.240918000,210.0
2023-12-21T07:47:16.654805000,200.0
2023-12-21T07:50:02.960362000,199.0
2023-12-21T08:10:16.746286000,194.0
2023-12-21T08:20:02.218056000,198.0
2023-12-21T08:30:01.729418000,198.0
2023-12-21T08:40:02.345477000,194.0
2023-12-21T08:50:01.464156000,190.0
2023-12-21T09:00:02.476057000,188.0
2023-12-21T09:10:02.130653000,213.0
2023-12-21T09:20:02.364758000,188.0
2023-12-21T09:30:02.499917000,188.0
2023-12-21T09:40:01.911754000,188.0
2023-12-21T09:50:01.885705000,197.0
2023-12-21T10:00:01.633757000,198.0
2023-12-21T10:10:02.531765000,200.0
2023-12-21T10:20:01.685657000,221.0
2023-12-21T10:30:01.567600000,207.0
2023-12-21T10:40:02.279429000,203.0
2023-12-21T10:50:02.548892000,191.0
2023-12-21T11:00:01.622794000,219.0
2023-12-21T11:10:01.435424000,200.0
2023-12-21T11:20:01.849114000,234.0
2023-12-21T11:30:02.391425000,222.0
2023-12-21T11:40:01.796607000,191.0
2023-12-21T11:50:01.776906000,205.0
2023-12-21T12:00:02.485984000,239.0
Tags: python, python-polars
1 Answer

I would suggest truncating to 10 minutes and then using .dt.time:

# df is the CSV from the question, with "datetime" already cast to pl.Datetime.
df.group_by(pl.col("datetime").dt.truncate("10m").dt.time().alias("time")).agg(
    pl.col("duration_in_traffic_s").mean()
).sort("time")
Out[12]:
shape: (85, 2)
┌──────────┬───────────────────────┐
│ time     ┆ duration_in_traffic_s │
│ ---      ┆ ---                   │
│ time     ┆ f64                   │
╞══════════╪═══════════════════════╡
│ 05:00:00 ┆ 164.0                 │
│ 05:10:00 ┆ 163.0                 │
│ 05:20:00 ┆ 164.0                 │
│ 05:30:00 ┆ 165.0                 │
│ …        ┆ …                     │
│ 22:20:00 ┆ 160.0                 │
│ 22:30:00 ┆ 159.0                 │
│ 22:40:00 ┆ 164.0                 │
│ 22:50:00 ┆ 160.0                 │
└──────────┴───────────────────────┘
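For the one-hour chunks mentioned in the question, the same truncate-then-time trick should work with a coarser window; a minimal sketch, reusing df from above:

(
    df.group_by(pl.col("datetime").dt.truncate("1h").dt.time().alias("time"))
    .agg(pl.col("duration_in_traffic_s").mean())
    .sort("time")
)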