aov 在一个数据框中通过多个组

问题描述 投票:0回答:1

我有一个包含多个站点的调查数据列表,我需要对其进行循环方差分析,以比较每个站点的不同测量值的调查。数据框如下所示:

>  Site Survey Over Under Total
>     A     21     4     8     6
>     A     22     5     6     8
>     A     23     6     7     7
>     B     21     4     6     8
>     B     22     7     3     3
>     B     23     4     8     9
>     C     21     7     9     4
>     C     22     2     1     5
>     C     23     5     2     6
>     D     21     7     4     3
>     D     22     2     6     4
>     D     23     1     5     2

还有很多网站、调查和重复,这些都是假数据。理想情况下,我想运行以便查看每个站点的结果、每个调查之间的每个结果(超过、低于、总计)。能够按 p 过滤<0.05 and see the results for each result in one table/dataframe.

我已经成功地使用以下方法在一个数据框中获取了 tibbles 列表:

    aov<-df%>%
  dplyr::group_by(SITE)%>%
  tidyr::nest()%>%
  dplyr::mutate(.data=.,
                aov_results=data %>% purrr::map(.x = ., .f = ~ summary(aov(TOTAL ~ SURVEY, data = .))))%>%

但是我必须使用以下方法单独询问每个人:

aov$aov_results[1]

任何帮助将不胜感激,我对 R 和统计总体来说很陌生。 预先感谢您。

r loops anova
1个回答
0
投票

我想我知道你在找什么。本质上是运行一个嵌套循环,对因变量和站点的每个组合进行回归。然而,一个密切相关的解决方案是估计完整的多元模型。这将为您提供完全相同的响应,但由于方差估计不同,因此标准误差不同,因此 p 值也不同;假设每个站点具有相同的错误分布。这两种方法都可以有效,具体取决于您想要什么。 第三种方法可能是混合模型。如果您期望站点之间存在相关性,那么这值得研究。

su <- structure(list(Site=structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
   1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
   3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
   levels=c("A", "B", "C", "D"), class="factor"), Survey=c(21, 21,
   21, 22, 22, 22, 23, 23, 23, 21, 21, 21, 22, 22, 22, 23, 23, 23,
   21, 21, 21, 22, 22, 22, 23, 23, 23, 21, 21, 21, 22, 22, 22, 23,
   23, 23), Over=c(4, 3, 3, 5, 5, 5, 6, 6, 7, 4, 3, 3, 7, 7, 7, 8,
   8, 9, 2, 1, 1, 3, 3, 3, 4, 4, 5, 1, 2, 3, 2, 2, 2, 1, 1, 1),
   Under=c(8, 8, 8, 7, 6, 5, 6, 6, 6, 6, 6, 6, 3, 2, 4, 2, 2, 2,
   9, 9, 9, 2, 4, 3, 1, 1, 1, 6, 6, 6, 5, 4, 5, 4, 4, 2),
   Total=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 4, 2, 3, 5, 3, 4, 6,
   3, 4, 5, 5, 6, 7, 7, 8, 9, 6, 4, 2, 5, 4, 3, 6, 4, 2)),
   row.names=c(NA, -36L), class="data.frame")

lev <- levels(su$Site)
dv <- c("Over", "Under", "Total")
eg <- expand.grid(dv=dv, lev=lev)

mods <- lapply(1:nrow(eg),
  function(i) {
      .lev <-  eg[i, "lev"]
      .dv <-  eg[i, "dv"]
      f <- paste(.dv, "~ Survey")
      .mod <- lm(f, data=su[su$Site == .lev,])
      .s <- as.data.frame(summary(.mod)$coefficients[, c(1, 4)])
      cbind(coef=rownames(.s), eg[i,], .s)
  }
)
mods <- do.call(rbind, mods)
mods[c("Estimate", "Pr(>|t|)")] <- round(mods[c("Estimate", "Pr(>|t|)")], 6)
mods
#           coef    dv lev   Estimate Pr(>|t|)
# 1  (Intercept)  Over   A -28.111111 0.000208
# 2       Survey  Over   A   1.500000 0.000075
# 3  (Intercept) Under   A  28.666667 0.003940
# 4       Survey Under   A  -1.000000 0.014246
# 5  (Intercept) Total   A -61.000000 0.000159
# 6       Survey Total   A   3.000000 0.000096
# 7  (Intercept)  Over   B -48.777778 0.000189
# 8       Survey  Over   B   2.500000 0.000088
# 9  (Intercept) Under   B  47.666667 0.000209
# 10      Survey Under   B  -2.000000 0.000340
# 11 (Intercept) Total   B -18.666667 0.185389
# 12      Survey Total   B   1.000000 0.126870
# 13 (Intercept)  Over   C -30.111111 0.000135
# 14      Survey  Over   C   1.500000 0.000075
# 15 (Intercept) Under   C  92.333333 0.000057
# 16      Survey Under   C  -4.000000 0.000078
# 17 (Intercept) Total   C -38.000000 0.002584
# 18      Survey Total   C   2.000000 0.001134
# 19 (Intercept)  Over   D  12.666667 0.050469
# 20      Survey  Over   D  -0.500000 0.079602
# 21 (Intercept) Under   D  34.000000 0.000924
# 22      Survey Under   D  -1.333333 0.002125
# 23 (Intercept) Total   D   4.000000 0.789383
# 24      Survey Total   D   0.000000 1.000000

mlm <- lm(cbind(Over, Under, Total) ~ Survey*Site, data=su)
(cof <- round(coef(mlm), 6))
#                    Over     Under      Total
# (Intercept)  -28.111111 28.666667 -61.000000
# Survey         1.500000 -1.000000   3.000000
# SiteB        -20.666667 19.000000  42.333333
# SiteC         -2.000000 63.666667  23.000000
# SiteD         40.777778  5.333333  65.000000
# Survey:SiteB   1.000000 -1.000000  -2.000000
# Survey:SiteC   0.000000 -3.000000  -1.000000
# Survey:SiteD  -2.000000 -0.333333  -3.000000

p.val <- round(sapply(summary(mlm), function(x) x[[4]][, 4]), 6)
p.val
#              Response Over Response Under Response Total
# (Intercept)       0.000009       0.001053       0.000009
# Survey            0.000001       0.009018       0.000003
# SiteB             0.008803       0.097851       0.012877
# SiteC             0.787207       0.000004       0.159973
# SiteD             0.000006       0.634454       0.000339
# Survey:SiteB      0.005617       0.057100       0.009997
# Survey:SiteC      1.000000       0.000002       0.177990
# Survey:SiteD      0.000002       0.513738       0.000284

### Responses are the same, but p-values differ, giving different
### results when filtering
mods[mods$"Pr(>|t|)" > 0.01, c("Estimate", "Pr(>|t|)")] <- NA
mods
#           coef    dv lev   Estimate Pr(>|t|)
# 1  (Intercept)  Over   A -28.111111 0.000208
# 2       Survey  Over   A   1.500000 0.000075
# 3  (Intercept) Under   A  28.666667 0.003940
# 4       Survey Under   A         NA       NA
# 5  (Intercept) Total   A -61.000000 0.000159
# 6       Survey Total   A   3.000000 0.000096
# 7  (Intercept)  Over   B -48.777778 0.000189
# 8       Survey  Over   B   2.500000 0.000088
# 9  (Intercept) Under   B  47.666667 0.000209
# 10      Survey Under   B  -2.000000 0.000340
# 11 (Intercept) Total   B         NA       NA
# 12      Survey Total   B         NA       NA
# 13 (Intercept)  Over   C -30.111111 0.000135
# 14      Survey  Over   C   1.500000 0.000075
# 15 (Intercept) Under   C  92.333333 0.000057
# 16      Survey Under   C  -4.000000 0.000078
# 17 (Intercept) Total   C -38.000000 0.002584
# 18      Survey Total   C   2.000000 0.001134
# 19 (Intercept)  Over   D         NA       NA
# 20      Survey  Over   D         NA       NA
# 21 (Intercept) Under   D  34.000000 0.000924
# 22      Survey Under   D  -1.333333 0.002125
# 23 (Intercept) Total   D         NA       NA
# 24      Survey Total   D         NA       NA

cof[p.val > 0.01] <- NA
cof
#                    Over     Under Total
# (Intercept)  -28.111111 28.666667   -61
# Survey         1.500000 -1.000000     3
# SiteB        -20.666667        NA    NA
# SiteC                NA 63.666667    NA
# SiteD         40.777778        NA    65
# Survey:SiteB   1.000000        NA    -2
# Survey:SiteC         NA -3.000000    NA
# Survey:SiteD  -2.000000        NA    -3
© www.soinside.com 2019 - 2024. All rights reserved.