Similar to the post "Calculate cummean() and cumsd() while ignoring NA values and filling NA", but I would rather stick with the tidyverse.
My data:
structure(list(season = c("Winter", "Winter", "Winter", "Winter",
"Winter", "Winter", "Winter", "Winter", "Winter", "Spring", "Spring",
"Spring", "Spring", "Spring", "Spring", "Spring", "Spring", "Spring"
), tmean = c(NA, 2, 3, 4, NA, NA, NA, 8, NA, 7, 8, 9, NA, NA,
5, 3, 2, NA)), class = "data.frame", row.names = c(NA, -18L))
season tmean
Winter NA
Winter 2
Winter 3
Winter 4
Winter NA
Winter NA
Winter NA
Winter 8
Winter NA
Spring 7
Spring 8
Spring 9
Spring NA
Spring NA
Spring 5
Spring 3
Spring 2
Spring NA
What I want:
season tmean cumtmean
Winter NA NA
Winter 2 2
Winter 3 2.5
Winter 4 3
Winter NA 3
Winter NA 3
Winter NA 3
Winter 8 4.25
Winter NA 4.25
Spring 7 7
Spring 8 7.5
Spring 9 8
Spring NA 8
Spring NA 8
Spring 5 7.25
Spring 3 6.4
Spring 2 5.66
Spring NA 5.66
You can define your own helper function that accounts for NA values.
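cummean_na <- function(x) {
  num <- cumsum(tidyr::replace_na(x, 0))              # running sum, counting NA as 0
  denom <- cumsum(!is.na(x))                           # running count of non-NA values
  dplyr::if_else(denom != 0, num / denom, NA_real_)    # NA until the first non-NA value
}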
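Written as a formula, the helper divides the running sum of the non-NA values by the running count of non-NA values, so NA rows leave the result unchanged. Checking it against row 8 of the Winter group in the desired output:

```latex
\[
\text{cmtean}_i = \frac{\sum_{j=1}^{i} x_j \,\mathbf{1}[x_j \neq \mathrm{NA}]}{\sum_{j=1}^{i} \mathbf{1}[x_j \neq \mathrm{NA}]},
\qquad \text{NA when the denominator is } 0
\]
\[
\text{Winter, row 8: } \frac{2 + 3 + 4 + 8}{4} = \frac{17}{4} = 4.25
\]
```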
Then, if your data.frame is called dd, you can do it with dplyr:
library(dplyr)
dd %>%
  mutate(cmtean = cummean_na(tmean), .by = season)  # .by requires dplyr >= 1.1.0
which returns the expected values:
season tmean cmtean
1 Winter NA NA
2 Winter 2 2.000000
3 Winter 3 2.500000
4 Winter 4 3.000000
5 Winter NA 3.000000
6 Winter NA 3.000000
7 Winter NA 3.000000
8 Winter 8 4.250000
9 Winter NA 4.250000
10 Spring 7 7.000000
11 Spring 8 7.500000
12 Spring 9 8.000000
13 Spring NA 8.000000
14 Spring NA 8.000000
15 Spring 5 7.250000
16 Spring 3 6.400000
17 Spring 2 5.666667
18 Spring NA 5.666667
You could do:
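library(dplyr)
library(tidyr)
df |>
  group_by(season) |>
  # cumulative mean of the non-NA values, written back into the non-NA positions
  mutate(cumtmean = replace(tmean, complete.cases(tmean), cummean(na.omit(tmean)))) |>
  # carry the last cumulative mean down over the NA rows within each season
  fill(cumtmean) |>
  ungroup()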
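To see what replace() and fill() each contribute, here is the Spring group worked by hand. cummean() runs only over the six non-NA values, replace() writes those results back into the non-NA positions, and fill() copies the last value down into the NA rows:

```latex
\[
\text{non-NA Spring values: } (7,\ 8,\ 9,\ 5,\ 3,\ 2)
\]
\[
\text{cummean: } \left(7,\ \tfrac{15}{2},\ \tfrac{24}{3},\ \tfrac{29}{4},\ \tfrac{32}{5},\ \tfrac{34}{6}\right) \approx (7,\ 7.5,\ 8,\ 7.25,\ 6.4,\ 5.667)
\]
\[
\text{after replace(): } (7,\ 7.5,\ 8,\ \mathrm{NA},\ \mathrm{NA},\ 7.25,\ 6.4,\ 5.667,\ \mathrm{NA})
\]
\[
\text{after fill(): } (7,\ 7.5,\ 8,\ 8,\ 8,\ 7.25,\ 6.4,\ 5.667,\ 5.667)
\]
```

In the Winter group, the leading NA stays NA because fill() has no earlier value within that group to copy down.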