How can I run a rolling calculation over the last n calendar months in R?

Problem description  Votes: 0  Answers: 1

I have a dataset in R with the following structure (with many companies and a long time range):

date          month_id  company_id  value
2024-01-02    1        1           2
2024-01-03    1        1           4
2024-01-04    1        1           2
2024-01-05    1        1           3
2024-01-08    1        1           7
...
2024-06-28    6        1           3

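I want to run a calculation on the last three months of data for each company_id (based on month_id). Ideally, I could group the data by the last n months and run an arbitrary calculation on each window, rather than rely on a function that is fixed to a rolling mean. Below is the rolling mean over the last three month_id values for my example dataset: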

month_id  company_id  3m_mean
1         1          NA
2         1          NA
3         1          4.903
4         1          4.873
5         1          4.723
6         1          5.063
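I have not found a good way to do this. All the rolling functions seem to operate on the last N rows, but I want to work in calendar months. In addition, my dataset is large (millions of rows), and my goal is to run a regression rather than compute a mean.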

The full example dataset is as follows:

date          month_id  company_id  value
2024-01-02    1         1           2
2024-01-03    1         1           4
2024-01-04    1         1           2
2024-01-05    1         1           3
2024-01-08    1         1           7
2024-01-09    1         1           10
2024-01-10    1         1           0
2024-01-11    1         1           10
2024-01-12    1         1           7
2024-01-16    1         1           7
2024-01-17    1         1           3
2024-01-18    1         1           7
2024-01-19    1         1           0
2024-01-22    1         1           9
2024-01-23    1         1           6
2024-01-24    1         1           4
2024-01-25    1         1           6
2024-01-26    1         1           9
2024-01-29    1         1           2
2024-01-30    1         1           7
2024-01-31    1         1           7
2024-02-01    2         1           10
2024-02-02    2         1           2
2024-02-05    2         1           4
2024-02-06    2         1           7
2024-02-07    2         1           0
2024-02-08    2         1           3
2024-02-09    2         1           8
2024-02-12    2         1           7
2024-02-13    2         1           7
2024-02-14    2         1           6
2024-02-15    2         1           7
2024-02-16    2         1           0
2024-02-20    2         1           0
2024-02-21    2         1           4
2024-02-22    2         1           1
2024-02-23    2         1           4
2024-02-26    2         1           7
2024-02-27    2         1           5
2024-02-28    2         1           0
2024-02-29    2         1           6
2024-03-01    3         1           5
2024-03-04    3         1           7
2024-03-05    3         1           4
2024-03-06    3         1           1
2024-03-07    3         1           2
2024-03-08    3         1           2
2024-03-11    3         1           2
2024-03-12    3         1           3
2024-03-13    3         1           4
2024-03-14    3         1           7
2024-03-15    3         1           8
2024-03-18    3         1           7
2024-03-19    3         1           5
2024-03-20    3         1           0
2024-03-21    3         1           9
2024-03-22    3         1           4
2024-03-25    3         1           1
2024-03-26    3         1           10
2024-03-27    3         1           10
2024-03-28    3         1           8
2024-03-29    3         1           5
2024-04-01    4         1           6
2024-04-02    4         1           2
2024-04-03    4         1           6
2024-04-04    4         1           7
2024-04-05    4         1           2
2024-04-08    4         1           3
2024-04-09    4         1           0
2024-04-10    4         1           7
2024-04-11    4         1           3
2024-04-12    4         1           3
2024-04-15    4         1           7
2024-04-16    4         1           9
2024-04-17    4         1           5
2024-04-18    4         1           7
2024-04-19    4         1           1
2024-04-22    4         1           8
2024-04-23    4         1           8
2024-04-24    4         1           1
2024-04-25    4         1           3
2024-04-26    4         1           9
2024-04-29    4         1           10
2024-04-30    4         1           8
2024-05-01    5         1           6
2024-05-02    5         1           0
2024-05-03    5         1           0
2024-05-06    5         1           8
2024-05-07    5         1           0
2024-05-08    5         1           3
2024-05-09    5         1           5
2024-05-10    5         1           9
2024-05-13    5         1           9
2024-05-14    5         1           1
2024-05-15    5         1           9
2024-05-16    5         1           2
2024-05-17    5         1           6
2024-05-20    5         1           6
2024-05-21    5         1           5
2024-05-22    5         1           2
2024-05-23    5         1           9
2024-05-24    5         1           5
2024-05-28    5         1           1
2024-05-29    5         1           0
2024-05-30    5         1           0
2024-05-31    5         1           2
2024-06-03    6         1           8
2024-06-04    6         1           9
2024-06-05    6         1           0
2024-06-06    6         1           3
2024-06-07    6         1           0
2024-06-10    6         1           7
2024-06-11    6         1           10
2024-06-12    6         1           4
2024-06-13    6         1           10
2024-06-14    6         1           5
2024-06-17    6         1           9
2024-06-18    6         1           6
2024-06-20    6         1           9
2024-06-21    6         1           8
2024-06-24    6         1           6
2024-06-25    6         1           9
2024-06-26    6         1           3
2024-06-27    6         1           7
2024-06-28    6         1           3

	
r rolling-computation
1 Answer
0 votes

slider::slide_index_mean()

https://slider.r-lib.org/index.html

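slide_index() performs a rolling calculation relative to an index. If you have ever wanted to compute something like a "3 month rolling average" where the number of days in each month is irregular, you might like this function. (Source: the slider documentation.)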
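To make the "relative to an index" idea concrete, here is a minimal sketch on made-up toy vectors (the names `month_id`, `value` and `windows` are only illustrative). It assumes, as the slider documentation describes, that rows sharing the same index value always fall into the same window, so the window size follows the index range rather than a row count:

```r
library(slider)

# Toy data: an integer month index with an uneven number of rows per month.
month_id <- c(1, 1, 2, 3, 3, 3, 4)
value    <- c(10, 20, 30, 40, 50, 60, 70)

# For each row, collect every value whose month_id lies in [current - 2, current].
# The identity function ~ .x just returns the window so it can be inspected.
windows <- slide_index(value, month_id, ~ .x, .before = 2)

windows[[1]]  # both month-1 rows, even though this is the first row
windows[[7]]  # all rows from months 2, 3 and 4
```

Replacing `~ .x` with any other function of the window is what makes this more general than a fixed rolling mean.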

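library(slider)
library(dplyr)

dat |> 
  # Rolling mean over month_id in [current - 2, current], computed within each company;
  # complete = TRUE returns NA until a full three-month window is available.
  mutate(`3m_mean` = slide_index_mean(value, month_id, before = 2, complete = TRUE),
         .by = company_id) |> 
  glimpse() |> 
  # Every row of a month carries the same value, so keep the first one per company-month.
  summarise(`3m_mean` = `3m_mean`[1], .by = c(month_id, company_id))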

#> Rows: 125
#> Columns: 5
#> $ date       <chr> "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "20…
#> $ month_id   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ company_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ value      <int> 2, 4, 2, 3, 7, 10, 0, 10, 7, 7, 3, 7, 0, 9, 6, 4, 6, 9, 2, …
#> $ `3m_mean`  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

#>   month_id company_id  3m_mean
#> 1        1          1       NA
#> 2        2          1       NA
#> 3        3          1 4.903226
#> 4        4          1 4.873016
#> 5        5          1 4.723077
#> 6        6          1 5.063492
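Since the stated goal is a regression rather than a mean, the same windowing idea can be sketched with `slide_index_dbl()` and a custom function. This is only an illustration under assumptions: the question does not say which model is wanted, so the regressor `day`, the toy data `toy`, and the output column `slope_3m` are all hypothetical. The sketch first collapses the data to one row per company and month, so that one model is fitted per company-month rather than per daily row, which may matter with millions of rows.

```r
library(dplyr)
library(slider)

# Hypothetical toy data: two companies, four months, three observations per month.
toy <- expand.grid(obs = 1:3, month_id = 1:4, company_id = 1:2)
toy$day   <- (toy$month_id - 1) * 30 + toy$obs * 7   # made-up day counter
toy$value <- 2 * toy$day + 10 * toy$company_id       # exact linear trend, slope 2

rolling <- toy |>
  # One row per company-month, carrying that month's observations as a data frame.
  summarise(month_data = list(data.frame(day = day, value = value)),
            .by = c(company_id, month_id)) |>
  # Slide over months within each company; each window is a list of up to
  # three monthly data frames, which are stacked before fitting the model.
  mutate(
    slope_3m = slide_index_dbl(
      month_data, month_id,
      ~ coef(lm(value ~ day, data = bind_rows(.x)))[["day"]],
      .before = 2, .complete = TRUE
    ),
    .by = company_id
  )

rolling[, c("company_id", "month_id", "slope_3m")]
```

With `.complete = TRUE`, the first two months of each company come back as NA, mirroring the rolling-mean output above. Any other statistic of the fitted model could be returned in place of the slope.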

Sample data:
dat <- read.table(header = TRUE, text = 
"date    month_id    company_id  value
2024-01-02  1   1   2
2024-01-03  1   1   4
2024-01-04  1   1   2
2024-01-05  1   1   3
2024-01-08  1   1   7
2024-01-09  1   1   10
2024-01-10  1   1   0
2024-01-11  1   1   10
2024-01-12  1   1   7
2024-01-16  1   1   7
2024-01-17  1   1   3
2024-01-18  1   1   7
2024-01-19  1   1   0
2024-01-22  1   1   9
2024-01-23  1   1   6
2024-01-24  1   1   4
2024-01-25  1   1   6
2024-01-26  1   1   9
2024-01-29  1   1   2
2024-01-30  1   1   7
2024-01-31  1   1   7
2024-02-01  2   1   10
2024-02-02  2   1   2
2024-02-05  2   1   4
2024-02-06  2   1   7
2024-02-07  2   1   0
2024-02-08  2   1   3
2024-02-09  2   1   8
2024-02-12  2   1   7
2024-02-13  2   1   7
2024-02-14  2   1   6
2024-02-15  2   1   7
2024-02-16  2   1   0
2024-02-20  2   1   0
2024-02-21  2   1   4
2024-02-22  2   1   1
2024-02-23  2   1   4
2024-02-26  2   1   7
2024-02-27  2   1   5
2024-02-28  2   1   0
2024-02-29  2   1   6
2024-03-01  3   1   5
2024-03-04  3   1   7
2024-03-05  3   1   4
2024-03-06  3   1   1
2024-03-07  3   1   2
2024-03-08  3   1   2
2024-03-11  3   1   2
2024-03-12  3   1   3
2024-03-13  3   1   4
2024-03-14  3   1   7
2024-03-15  3   1   8
2024-03-18  3   1   7
2024-03-19  3   1   5
2024-03-20  3   1   0
2024-03-21  3   1   9
2024-03-22  3   1   4
2024-03-25  3   1   1
2024-03-26  3   1   10
2024-03-27  3   1   10
2024-03-28  3   1   8
2024-03-29  3   1   5
2024-04-01  4   1   6
2024-04-02  4   1   2
2024-04-03  4   1   6
2024-04-04  4   1   7
2024-04-05  4   1   2
2024-04-08  4   1   3
2024-04-09  4   1   0
2024-04-10  4   1   7
2024-04-11  4   1   3
2024-04-12  4   1   3
2024-04-15  4   1   7
2024-04-16  4   1   9
2024-04-17  4   1   5
2024-04-18  4   1   7
2024-04-19  4   1   1
2024-04-22  4   1   8
2024-04-23  4   1   8
2024-04-24  4   1   1
2024-04-25  4   1   3
2024-04-26  4   1   9
2024-04-29  4   1   10
2024-04-30  4   1   8
2024-05-01  5   1   6
2024-05-02  5   1   0
2024-05-03  5   1   0
2024-05-06  5   1   8
2024-05-07  5   1   0
2024-05-08  5   1   3
2024-05-09  5   1   5
2024-05-10  5   1   9
2024-05-13  5   1   9
2024-05-14  5   1   1
2024-05-15  5   1   9
2024-05-16  5   1   2
2024-05-17  5   1   6
2024-05-20  5   1   6
2024-05-21  5   1   5
2024-05-22  5   1   2
2024-05-23  5   1   9
2024-05-24  5   1   5
2024-05-28  5   1   1
2024-05-29  5   1   0
2024-05-30  5   1   0
2024-05-31  5   1   2
2024-06-03  6   1   8
2024-06-04  6   1   9
2024-06-05  6   1   0
2024-06-06  6   1   3
2024-06-07  6   1   0
2024-06-10  6   1   7
2024-06-11  6   1   10
2024-06-12  6   1   4
2024-06-13  6   1   10
2024-06-14  6   1   5
2024-06-17  6   1   9
2024-06-18  6   1   6
2024-06-20  6   1   9
2024-06-21  6   1   8
2024-06-24  6   1   6
2024-06-25  6   1   9
2024-06-26  6   1   3
2024-06-27  6   1   7
2024-06-28  6   1   3")

Created on 2025-03-16 with reprex v2.1.1
