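I need to aggregate a data frame by a grouping column, using the mean. For each column, I only want a group's mean when fewer than (say) 20% of that group's observations are missing; otherwise the result should be NA. Any idea how I could do this? (I am also happy to use packages such as data.table or dplyr.)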
# Sample data
set.seed(123)
dat <- data.frame(group = sample(letters[1:4], 100, replace = TRUE),
                  x = sample(c(rnorm(4, 12, 0.3), NA), 100, replace = TRUE),
                  y = sample(c(rnorm(4, 12, 0.3), NA), 100, replace = TRUE),
                  z = sample(c(rnorm(4, 12, 0.3), NA), 100, replace = TRUE))
head(dat)
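# Columns to summarise
cols2check <- c("x", "y", "z")
# Column means over the whole data frame (groups are ignored here), skipping NAs
out <- colMeans(dat[cols2check], na.rm = TRUE)
# Set the mean to NA for any column where 20% or more of the values are missing
out[sapply(dat[cols2check], function(x) mean(is.na(x)) >= 0.2)] <- NA_real_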
out
# x y z
# 12.11241 11.59669 NA
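The snippet above works on whole columns and never uses `group`. One possible way to apply the same rule within each group is sketched below in base R. The helper name `mean_if_complete`, its `max_na` argument, and the `toy` data frame are made up for illustration (they are not from the question); the toy data is kept small so the result can be checked by hand.

```r
# Hypothetical helper: return the mean of x, or NA when the share of
# missing values is not strictly below max_na.
mean_if_complete <- function(x, max_na = 0.2) {
  if (mean(is.na(x)) < max_na) mean(x, na.rm = TRUE) else NA_real_
}

# Small illustrative data set (an assumption, not the data from the question)
toy <- data.frame(
  group = rep(c("a", "b"), each = 6),
  x = c(1, 2, 3, 4, 5, NA,     10, NA, NA, 40, 50, 60),
  y = c(1, NA, NA, 4, 5, 6,    10, 20, 30, 40, 50, 60)
)

# Apply the helper to every value column within each group.
# The data-frame method of aggregate() passes NAs through to FUN.
res <- aggregate(toy[c("x", "y")],
                 by = list(group = toy$group),
                 FUN = mean_if_complete)
res
# group a: x = 3 (1 of 6 missing), y = NA (2 of 6 missing)
# group b: x = NA (2 of 6 missing), y = 35 (none missing)
```

With the data from the question, the same call should work by passing `dat[cols2check]` and `dat$group` instead. If dplyr is preferred, the helper could presumably be reused inside `group_by()` followed by `summarise(across(...))`.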