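I have a time series with an observation every half hour. I want a running mean of the daily range of the measured values. I group by date and get the correct daily range, but every way I can think of to get the running mean only works within the daily groups rather than across them, and because each day has only one range, the value I get is always that range. Here is an example: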
library(tidyverse)
library(zoo)
set.seed(3)
dts <- sort(sample(seq(as_datetime("2024-08-15 09:00:00 EDT"), as_datetime("2024-08-22 09:00:00 EDT"), by="hour"), 24))
df <- tibble(dts = dts, temp = sample(10:20, 24, replace=TRUE))
df <- df %>%
  mutate(date = as.Date(dts)) %>%
  group_by(date) %>%
  mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE)) %>%
  mutate(range = tMax - tMin) %>%
  mutate(rollRange = rollapply(range, 3, mean, fill = NA))
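In the real data there are always many more observations per day than there are days in the rolling window, so I only get NA at the top of each day, and after that the value is identical to `range`. An additional complication is that, for other reasons, the number of rows per day is random, so I can't just set my window to `obs/day * days`. Do I have to `summarize()` this out to a separate data frame and then merge it back? Sorry, here is the result of the reprex plus a column `desired` showing what I am looking for: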
dts temp date tMax tMin range rollRange desired
<dttm> <int> <date> <int> <int> <int> <dbl> <dbl>
1 2024-08-15 13:00:00 17 2024-08-15 19 17 2 NA NA
2 2024-08-15 20:00:00 19 2024-08-15 19 17 2 NA NA
3 2024-08-16 02:00:00 20 2024-08-16 20 12 8 NA 4
4 2024-08-16 04:00:00 16 2024-08-16 20 12 8 8 4
5 2024-08-16 06:00:00 12 2024-08-16 20 12 8 8 4
6 2024-08-16 20:00:00 14 2024-08-16 20 12 8 8 4
7 2024-08-16 21:00:00 16 2024-08-16 20 12 8 NA 4
8 2024-08-17 00:00:00 15 2024-08-17 17 15 2 NA 5.33
9 2024-08-17 08:00:00 17 2024-08-17 17 15 2 NA 5.33
10 2024-08-18 06:00:00 19 2024-08-18 19 13 6 NA 4.33
11 2024-08-18 10:00:00 13 2024-08-18 19 13 6 NA 4.33
12 2024-08-19 16:00:00 10 2024-08-19 15 10 5 NA 6.66
13 2024-08-19 19:00:00 12 2024-08-19 15 10 5 5 6.66
14 2024-08-19 20:00:00 15 2024-08-19 15 10 5 NA 6.66
15 2024-08-20 06:00:00 13 2024-08-20 18 13 5 NA 6.66
16 2024-08-20 08:00:00 18 2024-08-20 18 13 5 NA 6.66
17 2024-08-21 00:00:00 19 2024-08-21 19 10 9 NA 6
18 2024-08-21 01:00:00 16 2024-08-21 19 10 9 9 6
19 2024-08-21 04:00:00 19 2024-08-21 19 10 9 9 6
20 2024-08-21 08:00:00 10 2024-08-21 19 10 9 9 6
21 2024-08-21 19:00:00 18 2024-08-21 19 10 9 9 6
22 2024-08-21 21:00:00 10 2024-08-21 19 10 9 NA 6
23 2024-08-22 05:00:00 18 2024-08-22 18 14 4 NA NA
24 2024-08-22 09:00:00 14 2024-08-22 18 14 4 NA NA
library(dplyr)
df |>
  mutate(date = as.Date(dts)) |>
  mutate(tMax = max(temp, na.rm = TRUE),
         tMin = min(temp, na.rm = TRUE),
         range = tMax - tMin,
         rollRange = zoo::rollapplyr(range, 3, mean, fill = NA),
         .by = date)
#> dts temp date tMax tMin range rollRange
#> 1 2024-08-15 13:00:00 17 2024-08-15 19 17 2 NA
#> 2 2024-08-15 20:00:00 19 2024-08-15 19 17 2 NA
#> 3 2024-08-16 02:00:00 20 2024-08-16 20 12 8 NA
#> 4 2024-08-16 04:00:00 16 2024-08-16 20 12 8 NA
#> 5 2024-08-16 06:00:00 12 2024-08-16 20 12 8 8
#> 6 2024-08-16 20:00:00 14 2024-08-16 20 12 8 8
#> 7 2024-08-16 21:00:00 16 2024-08-16 20 12 8 8
#> 8 2024-08-17 00:00:00 15 2024-08-17 17 15 2 NA
#> 9 2024-08-17 08:00:00 17 2024-08-17 17 15 2 NA
#> 10 2024-08-18 06:00:00 19 2024-08-18 19 13 6 NA
#> 11 2024-08-18 10:00:00 13 2024-08-18 19 13 6 NA
#> 12 2024-08-19 16:00:00 10 2024-08-19 15 10 5 NA
#> 13 2024-08-19 19:00:00 12 2024-08-19 15 10 5 NA
#> 14 2024-08-19 20:00:00 15 2024-08-19 15 10 5 5
#> 15 2024-08-20 06:00:00 13 2024-08-20 18 13 5 NA
#> 16 2024-08-20 08:00:00 18 2024-08-20 18 13 5 NA
#> 17 2024-08-21 00:00:00 19 2024-08-21 19 10 9 NA
#> 18 2024-08-21 01:00:00 16 2024-08-21 19 10 9 NA
#> 19 2024-08-21 04:00:00 19 2024-08-21 19 10 9 9
#> 20 2024-08-21 08:00:00 10 2024-08-21 19 10 9 9
#> 21 2024-08-21 19:00:00 18 2024-08-21 19 10 9 9
#> 22 2024-08-21 21:00:00 10 2024-08-21 19 10 9 9
#> 23 2024-08-22 05:00:00 18 2024-08-22 18 14 4 NA
#> 24 2024-08-22 09:00:00 14 2024-08-22 18 14 4 NA
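In the output above, rollRange is still computed inside each date group, so it is either NA or equal to range. A minimal sketch of the summarize-then-join route the question asks about is given here. It assumes the rows are sorted by time, that a centered 3-day window is wanted (as in the desired column), and that every calendar day has at least one row; a missing day would silently be treated as adjacent to its neighbours. The names `daily` and `result` are illustrative only.

```r
library(dplyr)
library(zoo)

# The same 24 observations as the sample data (dts kept as character)
df <- data.frame(
  dts = c("2024-08-15 13:00:00", "2024-08-15 20:00:00", "2024-08-16 02:00:00",
          "2024-08-16 04:00:00", "2024-08-16 06:00:00", "2024-08-16 20:00:00",
          "2024-08-16 21:00:00", "2024-08-17 00:00:00", "2024-08-17 08:00:00",
          "2024-08-18 06:00:00", "2024-08-18 10:00:00", "2024-08-19 16:00:00",
          "2024-08-19 19:00:00", "2024-08-19 20:00:00", "2024-08-20 06:00:00",
          "2024-08-20 08:00:00", "2024-08-21 00:00:00", "2024-08-21 01:00:00",
          "2024-08-21 04:00:00", "2024-08-21 08:00:00", "2024-08-21 19:00:00",
          "2024-08-21 21:00:00", "2024-08-22 05:00:00", "2024-08-22 09:00:00"),
  temp = c(17L, 19L, 20L, 16L, 12L, 14L, 16L, 15L, 17L, 19L, 13L, 10L,
           12L, 15L, 13L, 18L, 19L, 16L, 19L, 10L, 18L, 10L, 18L, 14L)
)

# Step 1: collapse to one row per day, then roll over those daily rows.
# Because each day is now a single row, the window width is in days.
daily <- df |>
  mutate(date = as.Date(dts)) |>
  summarise(tMax = max(temp, na.rm = TRUE),
            tMin = min(temp, na.rm = TRUE),
            .by = date) |>
  mutate(range = tMax - tMin,
         rollRange = rollapply(range, 3, mean, fill = NA))

# Step 2: join the daily values back onto every observation of that day.
result <- df |>
  mutate(date = as.Date(dts)) |>
  left_join(daily, by = "date")
```

Because the rolling step runs on one row per day, the varying number of observations per day no longer matters. For this sample the centered 3-day means for 16 to 21 August come out as 4, 5.33, 4.33, 5.33, 6.33 and 6, which differs from the 6.66 shown for 19 and 20 August in the desired column.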
Given the data:
df = structure(list(dts = c("2024-08-15 13:00:00", "2024-08-15 20:00:00",
"2024-08-16 02:00:00", "2024-08-16 04:00:00", "2024-08-16 06:00:00",
"2024-08-16 20:00:00", "2024-08-16 21:00:00", "2024-08-17 00:00:00",
"2024-08-17 08:00:00", "2024-08-18 06:00:00", "2024-08-18 10:00:00",
"2024-08-19 16:00:00", "2024-08-19 19:00:00", "2024-08-19 20:00:00",
"2024-08-20 06:00:00", "2024-08-20 08:00:00", "2024-08-21 00:00:00",
"2024-08-21 01:00:00", "2024-08-21 04:00:00", "2024-08-21 08:00:00",
"2024-08-21 19:00:00", "2024-08-21 21:00:00", "2024-08-22 05:00:00",
"2024-08-22 09:00:00"), temp = c(17L, 19L, 20L, 16L, 12L, 14L,
16L, 15L, 17L, 19L, 13L, 10L, 12L, 15L, 13L, 18L, 19L, 16L, 19L,
10L, 18L, 10L, 18L, 14L)), row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24"), class = "data.frame")