I have a dataset with the following structure (with many companies and a long time range):
date month_id company_id value
2024-01-02 1 1 2
2024-01-03 1 1 4
2024-01-04 1 1 2
2024-01-05 1 1 3
2024-01-08 1 1 7
...
2024-06-28 6 1 3
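I want to run a calculation on the past three months of data for each company_id (based on month_id). Ideally, I could simply group over the past n months and run a calculation on that group, rather than relying on a fixed rolling-mean function. Below is the rolling-mean output over the last three month_id values for my example dataset: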
month_id company_id 3m_mean
1 1 NA
2 1 NA
3 1 4.903
4 1 4.873
5 1 4.723
6 1 5.063
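I have not found a good way to do this. All the rolling functions seem to operate on the last N rows, but I want the window to cover calendar months. In addition, my dataset is large (millions of rows), and my goal is to run a regression rather than a mean.
The full example dataset is as follows: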
date month_id company_id value
2024-01-02 1 1 2
2024-01-03 1 1 4
2024-01-04 1 1 2
2024-01-05 1 1 3
2024-01-08 1 1 7
2024-01-09 1 1 10
2024-01-10 1 1 0
2024-01-11 1 1 10
2024-01-12 1 1 7
2024-01-16 1 1 7
2024-01-17 1 1 3
2024-01-18 1 1 7
2024-01-19 1 1 0
2024-01-22 1 1 9
2024-01-23 1 1 6
2024-01-24 1 1 4
2024-01-25 1 1 6
2024-01-26 1 1 9
2024-01-29 1 1 2
2024-01-30 1 1 7
2024-01-31 1 1 7
2024-02-01 2 1 10
2024-02-02 2 1 2
2024-02-05 2 1 4
2024-02-06 2 1 7
2024-02-07 2 1 0
2024-02-08 2 1 3
2024-02-09 2 1 8
2024-02-12 2 1 7
2024-02-13 2 1 7
2024-02-14 2 1 6
2024-02-15 2 1 7
2024-02-16 2 1 0
2024-02-20 2 1 0
2024-02-21 2 1 4
2024-02-22 2 1 1
2024-02-23 2 1 4
2024-02-26 2 1 7
2024-02-27 2 1 5
2024-02-28 2 1 0
2024-02-29 2 1 6
2024-03-01 3 1 5
2024-03-04 3 1 7
2024-03-05 3 1 4
2024-03-06 3 1 1
2024-03-07 3 1 2
2024-03-08 3 1 2
2024-03-11 3 1 2
2024-03-12 3 1 3
2024-03-13 3 1 4
2024-03-14 3 1 7
2024-03-15 3 1 8
2024-03-18 3 1 7
2024-03-19 3 1 5
2024-03-20 3 1 0
2024-03-21 3 1 9
2024-03-22 3 1 4
2024-03-25 3 1 1
2024-03-26 3 1 10
2024-03-27 3 1 10
2024-03-28 3 1 8
2024-03-29 3 1 5
2024-04-01 4 1 6
2024-04-02 4 1 2
2024-04-03 4 1 6
2024-04-04 4 1 7
2024-04-05 4 1 2
2024-04-08 4 1 3
2024-04-09 4 1 0
2024-04-10 4 1 7
2024-04-11 4 1 3
2024-04-12 4 1 3
2024-04-15 4 1 7
2024-04-16 4 1 9
2024-04-17 4 1 5
2024-04-18 4 1 7
2024-04-19 4 1 1
2024-04-22 4 1 8
2024-04-23 4 1 8
2024-04-24 4 1 1
2024-04-25 4 1 3
2024-04-26 4 1 9
2024-04-29 4 1 10
2024-04-30 4 1 8
2024-05-01 5 1 6
2024-05-02 5 1 0
2024-05-03 5 1 0
2024-05-06 5 1 8
2024-05-07 5 1 0
2024-05-08 5 1 3
2024-05-09 5 1 5
2024-05-10 5 1 9
2024-05-13 5 1 9
2024-05-14 5 1 1
2024-05-15 5 1 9
2024-05-16 5 1 2
2024-05-17 5 1 6
2024-05-20 5 1 6
2024-05-21 5 1 5
2024-05-22 5 1 2
2024-05-23 5 1 9
2024-05-24 5 1 5
2024-05-28 5 1 1
2024-05-29 5 1 0
2024-05-30 5 1 0
2024-05-31 5 1 2
2024-06-03 6 1 8
2024-06-04 6 1 9
2024-06-05 6 1 0
2024-06-06 6 1 3
2024-06-07 6 1 0
2024-06-10 6 1 7
2024-06-11 6 1 10
2024-06-12 6 1 4
2024-06-13 6 1 10
2024-06-14 6 1 5
2024-06-17 6 1 9
2024-06-18 6 1 6
2024-06-20 6 1 9
2024-06-21 6 1 8
2024-06-24 6 1 6
2024-06-25 6 1 9
2024-06-26 6 1 3
2024-06-27 6 1 7
2024-06-28 6 1 3
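From https://slider.r-lib.org/index.html:
"slide_index() computes a rolling calculation relative to an index. If you have ever wanted to compute something like a "3 month rolling average" where the number of days in each month is irregular, you might like this function."
Example:
library(slider)
library(dplyr)
# Window = current month_id plus the 2 before it; complete = TRUE gives NA until 3 months exist.
# .by = company_id keeps each company's window separate (the index must be ascending within a group).
dat |>
  mutate(`3m_mean` = slide_index_mean(value, month_id, before = 2, complete = TRUE), .by = company_id) |>
  glimpse() |>
  # Every row in a month has the same window, so keep one value per month and company.
  summarise(`3m_mean` = `3m_mean`[1], .by = c(month_id, company_id))
#> Rows: 125
#> Columns: 5
#> $ date       <chr> "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "20…
#> $ month_id   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ company_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ value      <int> 2, 4, 2, 3, 7, 10, 0, 10, 7, 7, 3, 7, 0, 9, 6, 4, 6, 9, 2, …
#> $ `3m_mean`  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#>   month_id company_id  3m_mean
#> 1        1          1       NA
#> 2        2          1       NA
#> 3        3          1 4.903226
#> 4        4          1 4.873016
#> 5        5          1 4.723077
#> 6        6          1 5.063492
Example data: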
dat <- read.table(header = TRUE, text =
"date month_id company_id value
2024-01-02 1 1 2
2024-01-03 1 1 4
2024-01-04 1 1 2
2024-01-05 1 1 3
2024-01-08 1 1 7
2024-01-09 1 1 10
2024-01-10 1 1 0
2024-01-11 1 1 10
2024-01-12 1 1 7
2024-01-16 1 1 7
2024-01-17 1 1 3
2024-01-18 1 1 7
2024-01-19 1 1 0
2024-01-22 1 1 9
2024-01-23 1 1 6
2024-01-24 1 1 4
2024-01-25 1 1 6
2024-01-26 1 1 9
2024-01-29 1 1 2
2024-01-30 1 1 7
2024-01-31 1 1 7
2024-02-01 2 1 10
2024-02-02 2 1 2
2024-02-05 2 1 4
2024-02-06 2 1 7
2024-02-07 2 1 0
2024-02-08 2 1 3
2024-02-09 2 1 8
2024-02-12 2 1 7
2024-02-13 2 1 7
2024-02-14 2 1 6
2024-02-15 2 1 7
2024-02-16 2 1 0
2024-02-20 2 1 0
2024-02-21 2 1 4
2024-02-22 2 1 1
2024-02-23 2 1 4
2024-02-26 2 1 7
2024-02-27 2 1 5
2024-02-28 2 1 0
2024-02-29 2 1 6
2024-03-01 3 1 5
2024-03-04 3 1 7
2024-03-05 3 1 4
2024-03-06 3 1 1
2024-03-07 3 1 2
2024-03-08 3 1 2
2024-03-11 3 1 2
2024-03-12 3 1 3
2024-03-13 3 1 4
2024-03-14 3 1 7
2024-03-15 3 1 8
2024-03-18 3 1 7
2024-03-19 3 1 5
2024-03-20 3 1 0
2024-03-21 3 1 9
2024-03-22 3 1 4
2024-03-25 3 1 1
2024-03-26 3 1 10
2024-03-27 3 1 10
2024-03-28 3 1 8
2024-03-29 3 1 5
2024-04-01 4 1 6
2024-04-02 4 1 2
2024-04-03 4 1 6
2024-04-04 4 1 7
2024-04-05 4 1 2
2024-04-08 4 1 3
2024-04-09 4 1 0
2024-04-10 4 1 7
2024-04-11 4 1 3
2024-04-12 4 1 3
2024-04-15 4 1 7
2024-04-16 4 1 9
2024-04-17 4 1 5
2024-04-18 4 1 7
2024-04-19 4 1 1
2024-04-22 4 1 8
2024-04-23 4 1 8
2024-04-24 4 1 1
2024-04-25 4 1 3
2024-04-26 4 1 9
2024-04-29 4 1 10
2024-04-30 4 1 8
2024-05-01 5 1 6
2024-05-02 5 1 0
2024-05-03 5 1 0
2024-05-06 5 1 8
2024-05-07 5 1 0
2024-05-08 5 1 3
2024-05-09 5 1 5
2024-05-10 5 1 9
2024-05-13 5 1 9
2024-05-14 5 1 1
2024-05-15 5 1 9
2024-05-16 5 1 2
2024-05-17 5 1 6
2024-05-20 5 1 6
2024-05-21 5 1 5
2024-05-22 5 1 2
2024-05-23 5 1 9
2024-05-24 5 1 5
2024-05-28 5 1 1
2024-05-29 5 1 0
2024-05-30 5 1 0
2024-05-31 5 1 2
2024-06-03 6 1 8
2024-06-04 6 1 9
2024-06-05 6 1 0
2024-06-06 6 1 3
2024-06-07 6 1 0
2024-06-10 6 1 7
2024-06-11 6 1 10
2024-06-12 6 1 4
2024-06-13 6 1 10
2024-06-14 6 1 5
2024-06-17 6 1 9
2024-06-18 6 1 6
2024-06-20 6 1 9
2024-06-21 6 1 8
2024-06-24 6 1 6
2024-06-25 6 1 9
2024-06-26 6 1 3
2024-06-27 6 1 7
2024-06-28 6 1 3")
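Since the question says the real goal is a regression rather than a mean, here is a minimal sketch of how the same index-based window might be extended to that case. It is an illustration under assumptions, not a tested solution for the full dataset: `window_slope()` and `rolling_slope()` are hypothetical helper names, the regression (value against time in days) is only a placeholder for whatever model is actually needed, and month_id is assumed to keep increasing across years. To avoid refitting the model for every daily row, the rows are first collapsed to one list-column entry per company and month, so there is one fit per company-month. The toy data is a stand-in with a known slope, not the dataset above.

```r
library(slider)
library(dplyr)

# Hypothetical helper: slope of value ~ time (in days) for one window of rows.
window_slope <- function(d) {
  if (nrow(d) < 2) return(NA_real_)
  d$t <- as.numeric(as.Date(d$date))
  coef(lm(value ~ t, data = d))[["t"]]
}

# Hypothetical helper: one regression per company and month over the last n_months.
rolling_slope <- function(df, n_months = 3) {
  df |>
    arrange(company_id, month_id, date) |>
    # Collapse to one row per company-month; that month's rows go in a list column.
    summarise(rows = list(pick(date, value)), .by = c(company_id, month_id)) |>
    mutate(
      slope = slide_index_dbl(
        rows, month_id,
        # .x is the list of monthly data frames inside the window.
        ~ window_slope(bind_rows(.x)),
        .before = n_months - 1,
        .complete = TRUE  # NA until n_months of history exist
      ),
      .by = company_id
    ) |>
    select(company_id, month_id, slope)
}

# Toy stand-in data (assumption, not the dataset above): value grows by exactly 2 per day.
toy_dates <- seq(as.Date("2024-01-01"), as.Date("2024-04-30"), by = "day")
toy <- data.frame(
  date = as.character(toy_dates),
  month_id = as.integer(format(toy_dates, "%m")),
  company_id = 1L,
  value = 2 * as.numeric(toy_dates - toy_dates[1])
)

toy_res <- rolling_slope(toy)
print(toy_res)
```

On the toy data the slope is NA for months 1 and 2 and 2 for months 3 and 4. With the example data, the call would be `rolling_slope(dat)`. It needs dplyr 1.1.0 or later for `pick()` and `.by`.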