我有一个包含多个站点的调查数据列表,我需要对其进行循环方差分析,以比较每个站点的不同测量值的调查。数据框如下所示:
> Site Survey Over Under Total
> A 21 4 8 6
> A 22 5 6 8
> A 23 6 7 7
> B 21 4 6 8
> B 22 7 3 3
> B 23 4 8 9
> C 21 7 9 4
> C 22 2 1 5
> C 23 5 2 6
> D 21 7 4 3
> D 22 2 6 4
> D 23 1 5 2
还有很多网站、调查和重复,这些都是假数据。理想情况下,我想运行以便查看每个站点的结果、每个调查之间的每个结果(超过、低于、总计)。能够按 p 过滤<0.05 and see the results for each result in one table/dataframe.
我已经成功地使用以下方法在一个数据框中获取了 tibbles 列表:
aov<-df%>%
dplyr::group_by(SITE)%>%
tidyr::nest()%>%
dplyr::mutate(.data=.,
aov_results=data %>% purrr::map(.x = ., .f = ~ summary(aov(TOTAL ~ SURVEY, data = .))))%>%
但是我必须使用以下方法单独询问每个人:
aov$aov_results[1]
任何帮助将不胜感激,我对 R 和统计总体来说很陌生。 预先感谢您。
我想我知道你在找什么。本质上是运行一个嵌套循环,对因变量和站点的每个组合进行回归。然而,一个密切相关的解决方案是估计完整的多元模型。这将为您提供完全相同的响应,但由于方差估计不同,因此标准误差不同,因此 p 值也不同;假设每个站点具有相同的错误分布。这两种方法都可以有效,具体取决于您想要什么。 第三种方法可能是混合模型。如果您期望站点之间存在相关性,那么这值得研究。
su <- structure(list(Site=structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
levels=c("A", "B", "C", "D"), class="factor"), Survey=c(21, 21,
21, 22, 22, 22, 23, 23, 23, 21, 21, 21, 22, 22, 22, 23, 23, 23,
21, 21, 21, 22, 22, 22, 23, 23, 23, 21, 21, 21, 22, 22, 22, 23,
23, 23), Over=c(4, 3, 3, 5, 5, 5, 6, 6, 7, 4, 3, 3, 7, 7, 7, 8,
8, 9, 2, 1, 1, 3, 3, 3, 4, 4, 5, 1, 2, 3, 2, 2, 2, 1, 1, 1),
Under=c(8, 8, 8, 7, 6, 5, 6, 6, 6, 6, 6, 6, 3, 2, 4, 2, 2, 2,
9, 9, 9, 2, 4, 3, 1, 1, 1, 6, 6, 6, 5, 4, 5, 4, 4, 2),
Total=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 4, 2, 3, 5, 3, 4, 6,
3, 4, 5, 5, 6, 7, 7, 8, 9, 6, 4, 2, 5, 4, 3, 6, 4, 2)),
row.names=c(NA, -36L), class="data.frame")
lev <- levels(su$Site)
dv <- c("Over", "Under", "Total")
eg <- expand.grid(dv=dv, lev=lev)
mods <- lapply(1:nrow(eg),
function(i) {
.lev <- eg[i, "lev"]
.dv <- eg[i, "dv"]
f <- paste(.dv, "~ Survey")
.mod <- lm(f, data=su[su$Site == .lev,])
.s <- as.data.frame(summary(.mod)$coefficients[, c(1, 4)])
cbind(coef=rownames(.s), eg[i,], .s)
}
)
mods <- do.call(rbind, mods)
mods[c("Estimate", "Pr(>|t|)")] <- round(mods[c("Estimate", "Pr(>|t|)")], 6)
mods
# coef dv lev Estimate Pr(>|t|)
# 1 (Intercept) Over A -28.111111 0.000208
# 2 Survey Over A 1.500000 0.000075
# 3 (Intercept) Under A 28.666667 0.003940
# 4 Survey Under A -1.000000 0.014246
# 5 (Intercept) Total A -61.000000 0.000159
# 6 Survey Total A 3.000000 0.000096
# 7 (Intercept) Over B -48.777778 0.000189
# 8 Survey Over B 2.500000 0.000088
# 9 (Intercept) Under B 47.666667 0.000209
# 10 Survey Under B -2.000000 0.000340
# 11 (Intercept) Total B -18.666667 0.185389
# 12 Survey Total B 1.000000 0.126870
# 13 (Intercept) Over C -30.111111 0.000135
# 14 Survey Over C 1.500000 0.000075
# 15 (Intercept) Under C 92.333333 0.000057
# 16 Survey Under C -4.000000 0.000078
# 17 (Intercept) Total C -38.000000 0.002584
# 18 Survey Total C 2.000000 0.001134
# 19 (Intercept) Over D 12.666667 0.050469
# 20 Survey Over D -0.500000 0.079602
# 21 (Intercept) Under D 34.000000 0.000924
# 22 Survey Under D -1.333333 0.002125
# 23 (Intercept) Total D 4.000000 0.789383
# 24 Survey Total D 0.000000 1.000000
mlm <- lm(cbind(Over, Under, Total) ~ Survey*Site, data=su)
(cof <- round(coef(mlm), 6))
# Over Under Total
# (Intercept) -28.111111 28.666667 -61.000000
# Survey 1.500000 -1.000000 3.000000
# SiteB -20.666667 19.000000 42.333333
# SiteC -2.000000 63.666667 23.000000
# SiteD 40.777778 5.333333 65.000000
# Survey:SiteB 1.000000 -1.000000 -2.000000
# Survey:SiteC 0.000000 -3.000000 -1.000000
# Survey:SiteD -2.000000 -0.333333 -3.000000
p.val <- round(sapply(summary(mlm), function(x) x[[4]][, 4]), 6)
p.val
# Response Over Response Under Response Total
# (Intercept) 0.000009 0.001053 0.000009
# Survey 0.000001 0.009018 0.000003
# SiteB 0.008803 0.097851 0.012877
# SiteC 0.787207 0.000004 0.159973
# SiteD 0.000006 0.634454 0.000339
# Survey:SiteB 0.005617 0.057100 0.009997
# Survey:SiteC 1.000000 0.000002 0.177990
# Survey:SiteD 0.000002 0.513738 0.000284
### Responses are the same, but p-values differ, giving different
### results when filtering
mods[mods$"Pr(>|t|)" > 0.01, c("Estimate", "Pr(>|t|)")] <- NA
mods
# coef dv lev Estimate Pr(>|t|)
# 1 (Intercept) Over A -28.111111 0.000208
# 2 Survey Over A 1.500000 0.000075
# 3 (Intercept) Under A 28.666667 0.003940
# 4 Survey Under A NA NA
# 5 (Intercept) Total A -61.000000 0.000159
# 6 Survey Total A 3.000000 0.000096
# 7 (Intercept) Over B -48.777778 0.000189
# 8 Survey Over B 2.500000 0.000088
# 9 (Intercept) Under B 47.666667 0.000209
# 10 Survey Under B -2.000000 0.000340
# 11 (Intercept) Total B NA NA
# 12 Survey Total B NA NA
# 13 (Intercept) Over C -30.111111 0.000135
# 14 Survey Over C 1.500000 0.000075
# 15 (Intercept) Under C 92.333333 0.000057
# 16 Survey Under C -4.000000 0.000078
# 17 (Intercept) Total C -38.000000 0.002584
# 18 Survey Total C 2.000000 0.001134
# 19 (Intercept) Over D NA NA
# 20 Survey Over D NA NA
# 21 (Intercept) Under D 34.000000 0.000924
# 22 Survey Under D -1.333333 0.002125
# 23 (Intercept) Total D NA NA
# 24 Survey Total D NA NA
cof[p.val > 0.01] <- NA
cof
# Over Under Total
# (Intercept) -28.111111 28.666667 -61
# Survey 1.500000 -1.000000 3
# SiteB -20.666667 NA NA
# SiteC NA 63.666667 NA
# SiteD 40.777778 NA 65
# Survey:SiteB 1.000000 NA -2
# Survey:SiteC NA -3.000000 NA
# Survey:SiteD -2.000000 NA -3