Rolling average over grouped dates

Question

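I have a time series with an observation every half hour. I want a running average of the daily range of the measured values. I group by date and get the correct daily range, but every way I can think of to get the running average works only within the daily groups rather than across them, and because each day has only one range, the value I get is always that range. Here is an example:

    library(tidyverse)
    library(zoo)

    set.seed(3)
    dts <- sort(sample(seq(as_datetime("2024-08-15 09:00:00 EDT"),
                           as_datetime("2024-08-22 09:00:00 EDT"),
                           by = "hour"), 24))
    df <- tibble(dts = dts, temp = sample(10:20, 24, replace = TRUE))
    df <- df %>%
      mutate(date = as.Date(dts)) %>%
      group_by(date) %>%
      mutate(tMax = max(temp, na.rm = TRUE),
             tMin = min(temp, na.rm = TRUE)) %>%
      mutate(range = tMax - tMin) %>%
      # still grouped by date here, so the window never spans two days
      mutate(rollRange = rollapply(range, 3, mean, fill = NA))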

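In the real data there are always far more observations per day than days in the rolling window, so I get only NA at the top of each day and a value identical to `range` after that. An added complication is that, for other reasons, the number of rows per day is random, so I can't simply set my window to `obs/day * days`. Do I have to `summarize()` it out to a separate data frame and then merge it back in?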
Sorry, here is the output of the reprex, plus a `desired` column showing what I am looking for:

   dts                  temp date        tMax  tMin range rollRange desired
   <dttm>              <int> <date>     <int> <int> <int>     <dbl>   <dbl>
 1 2024-08-15 13:00:00    17 2024-08-15    19    17     2        NA   NA   
 2 2024-08-15 20:00:00    19 2024-08-15    19    17     2        NA   NA   
 3 2024-08-16 02:00:00    20 2024-08-16    20    12     8        NA    4   
 4 2024-08-16 04:00:00    16 2024-08-16    20    12     8         8    4   
 5 2024-08-16 06:00:00    12 2024-08-16    20    12     8         8    4   
 6 2024-08-16 20:00:00    14 2024-08-16    20    12     8         8    4   
 7 2024-08-16 21:00:00    16 2024-08-16    20    12     8        NA    4   
 8 2024-08-17 00:00:00    15 2024-08-17    17    15     2        NA    5.33
 9 2024-08-17 08:00:00    17 2024-08-17    17    15     2        NA    5.33
10 2024-08-18 06:00:00    19 2024-08-18    19    13     6        NA    4.33
11 2024-08-18 10:00:00    13 2024-08-18    19    13     6        NA    4.33
12 2024-08-19 16:00:00    10 2024-08-19    15    10     5        NA    6.66
13 2024-08-19 19:00:00    12 2024-08-19    15    10     5         5    6.66
14 2024-08-19 20:00:00    15 2024-08-19    15    10     5        NA    6.66
15 2024-08-20 06:00:00    13 2024-08-20    18    13     5        NA    6.66
16 2024-08-20 08:00:00    18 2024-08-20    18    13     5        NA    6.66
17 2024-08-21 00:00:00    19 2024-08-21    19    10     9        NA    6   
18 2024-08-21 01:00:00    16 2024-08-21    19    10     9         9    6   
19 2024-08-21 04:00:00    19 2024-08-21    19    10     9         9    6   
20 2024-08-21 08:00:00    10 2024-08-21    19    10     9         9    6   
21 2024-08-21 19:00:00    18 2024-08-21    19    10     9         9    6   
22 2024-08-21 21:00:00    10 2024-08-21    19    10     9        NA    6   
23 2024-08-22 05:00:00    18 2024-08-22    18    14     4        NA   NA   
24 2024-08-22 09:00:00    14 2024-08-22    18    14     4        NA   NA  

	
Tags: r, grouping, rolling-computation

1 Answer

    library(dplyr)

    df |>
      mutate(date = as.Date(dts)) |>
      mutate(tMax = max(temp, na.rm = TRUE),
             tMin = min(temp, na.rm = TRUE),
             range = tMax - tMin,
             rollRange = zoo::rollapplyr(range, 3, mean, fill = NA),
             .by = date)
    #>                    dts temp       date tMax tMin range rollRange
    #> 1  2024-08-15 13:00:00   17 2024-08-15   19   17     2        NA
    #> 2  2024-08-15 20:00:00   19 2024-08-15   19   17     2        NA
    #> 3  2024-08-16 02:00:00   20 2024-08-16   20   12     8        NA
    #> 4  2024-08-16 04:00:00   16 2024-08-16   20   12     8        NA
    #> 5  2024-08-16 06:00:00   12 2024-08-16   20   12     8         8
    #> 6  2024-08-16 20:00:00   14 2024-08-16   20   12     8         8
    #> 7  2024-08-16 21:00:00   16 2024-08-16   20   12     8         8
    #> 8  2024-08-17 00:00:00   15 2024-08-17   17   15     2        NA
    #> 9  2024-08-17 08:00:00   17 2024-08-17   17   15     2        NA
    #> 10 2024-08-18 06:00:00   19 2024-08-18   19   13     6        NA
    #> 11 2024-08-18 10:00:00   13 2024-08-18   19   13     6        NA
    #> 12 2024-08-19 16:00:00   10 2024-08-19   15   10     5        NA
    #> 13 2024-08-19 19:00:00   12 2024-08-19   15   10     5        NA
    #> 14 2024-08-19 20:00:00   15 2024-08-19   15   10     5         5
    #> 15 2024-08-20 06:00:00   13 2024-08-20   18   13     5        NA
    #> 16 2024-08-20 08:00:00   18 2024-08-20   18   13     5        NA
    #> 17 2024-08-21 00:00:00   19 2024-08-21   19   10     9        NA
    #> 18 2024-08-21 01:00:00   16 2024-08-21   19   10     9        NA
    #> 19 2024-08-21 04:00:00   19 2024-08-21   19   10     9         9
    #> 20 2024-08-21 08:00:00   10 2024-08-21   19   10     9         9
    #> 21 2024-08-21 19:00:00   18 2024-08-21   19   10     9         9
    #> 22 2024-08-21 21:00:00   10 2024-08-21   19   10     9         9
    #> 23 2024-08-22 05:00:00   18 2024-08-22   18   14     4        NA
    #> 24 2024-08-22 09:00:00   14 2024-08-22   18   14     4        NA
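In the output above the window still runs inside each date group, so every non-NA `rollRange` simply repeats that day's `range`. One way to average across days instead, sketched here as an assumption about what the question is after rather than as part of the posted answer, is the route the question itself suggests: reduce to one row per day, roll over those rows, then join the result back. The names `daily` and `out` are illustrative, and the window is centered to match the question's `rollapply()` call.

```r
library(dplyr)
library(zoo)

# The 24 observations shown in the question's reprex output
df <- data.frame(
  dts = as.POSIXct(c(
    "2024-08-15 13:00:00", "2024-08-15 20:00:00", "2024-08-16 02:00:00",
    "2024-08-16 04:00:00", "2024-08-16 06:00:00", "2024-08-16 20:00:00",
    "2024-08-16 21:00:00", "2024-08-17 00:00:00", "2024-08-17 08:00:00",
    "2024-08-18 06:00:00", "2024-08-18 10:00:00", "2024-08-19 16:00:00",
    "2024-08-19 19:00:00", "2024-08-19 20:00:00", "2024-08-20 06:00:00",
    "2024-08-20 08:00:00", "2024-08-21 00:00:00", "2024-08-21 01:00:00",
    "2024-08-21 04:00:00", "2024-08-21 08:00:00", "2024-08-21 19:00:00",
    "2024-08-21 21:00:00", "2024-08-22 05:00:00", "2024-08-22 09:00:00"
  ), tz = "UTC"),
  temp = c(17L, 19L, 20L, 16L, 12L, 14L, 16L, 15L, 17L, 19L, 13L, 10L,
           12L, 15L, 13L, 18L, 19L, 16L, 19L, 10L, 18L, 10L, 18L, 14L)
)

# Step 1: collapse to one row per day holding that day's range
# Step 2: centered 3-day rolling mean over those daily rows
daily <- df |>
  mutate(date = as.Date(dts)) |>
  summarise(range = max(temp, na.rm = TRUE) - min(temp, na.rm = TRUE),
            .by = date) |>
  arrange(date) |>
  mutate(rollRange = rollapply(range, 3, mean, fill = NA))

# Step 3: attach the per-day values to every observation of that day
out <- df |>
  mutate(date = as.Date(dts)) |>
  left_join(daily, by = "date")
```

Because the window is counted in daily rows, the number of observations per day does not matter. For this data the per-day result is NA, 4, 5.33, 4.33, 5.33, 6.33, 6, NA; the hand-entered `desired` column in the question lists 6.66 for 2024-08-19 and 2024-08-20, which a centered 3-day mean of the daily ranges does not reproduce. Swapping in `rollapplyr()` would give a trailing window instead of a centered one.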

Note

Data used:

    df = structure(list(
      dts = c("2024-08-15 13:00:00", "2024-08-15 20:00:00", "2024-08-16 02:00:00",
              "2024-08-16 04:00:00", "2024-08-16 06:00:00", "2024-08-16 20:00:00",
              "2024-08-16 21:00:00", "2024-08-17 00:00:00", "2024-08-17 08:00:00",
              "2024-08-18 06:00:00", "2024-08-18 10:00:00", "2024-08-19 16:00:00",
              "2024-08-19 19:00:00", "2024-08-19 20:00:00", "2024-08-20 06:00:00",
              "2024-08-20 08:00:00", "2024-08-21 00:00:00", "2024-08-21 01:00:00",
              "2024-08-21 04:00:00", "2024-08-21 08:00:00", "2024-08-21 19:00:00",
              "2024-08-21 21:00:00", "2024-08-22 05:00:00", "2024-08-22 09:00:00"),
      temp = c(17L, 19L, 20L, 16L, 12L, 14L, 16L, 15L, 17L, 19L, 13L, 10L,
               12L, 15L, 13L, 18L, 19L, 16L, 19L, 10L, 18L, 10L, 18L, 14L)),
      row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
                    "13", "14", "15", "16", "17", "18", "19", "20", "21", "22",
                    "23", "24"),
      class = "data.frame")
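If a separate summary table is unwanted, the roll across days can also be done inside a single pipeline. The sketch below uses a small made-up data set (five days with 2, 1, 3, 2 and 4 rows, not the question's data) to show that an uneven number of rows per day is not a problem: it takes the first row of each day, rolls over just those values, and then spreads the result back over all rows. The helper column `first_of_day` is a name chosen for this illustration.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: five consecutive days with 2, 1, 3, 2 and 4 rows
df <- data.frame(
  date = as.Date("2024-01-01") + rep(0:4, times = c(2, 1, 3, 2, 4)),
  temp = c(10, 14,  12,  11, 20, 15,  13, 16,  10, 11, 12, 18)
)

out <- df |>
  arrange(date) |>                       # rows of one day must be contiguous
  mutate(range = max(temp, na.rm = TRUE) - min(temp, na.rm = TRUE),
         .by = date) |>                  # per-day range, repeated on each row
  mutate(
    first_of_day = !duplicated(date),    # TRUE on the first row of each day
    # roll over one value per day, then index back out to every row:
    # cumsum(first_of_day) is the running day number 1, 1, 2, 3, 3, 3, ...
    rollRange = rollapply(range[first_of_day], 3, mean,
                          fill = NA)[cumsum(first_of_day)]
  ) |>
  select(-first_of_day)
```

The daily ranges here are 4, 0, 9, 3 and 8, so the centered 3-day means are NA, 4.33, 4, 6.67 and NA, each repeated on every row of its day. Both this and the join-based approach count the window in rows of distinct dates, so a calendar day with no observations at all would be skipped rather than treated as a gap.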

	