嵌套随机效应中缺少观察结果

问题描述 投票:0回答:1

我正在尝试分析从多个位置的树苗收集的高度数据。在每个位置,在复制块中应用不同的处理。其中两个地点缺少一种幼苗(所有种植的幼苗都死亡)。数据如下:

data <- data.frame(location = rep(c("A", "B", "C", "D"), each = 8),
                   species = rep(c("S1", "S2", "S3", "S4"), times = 8),
                   trt =  rep(c("t1", "t2", "t3", "t4"), times = 4), 
                   block = c(1, 1, 1, 1, 2, 2, 2, 2),
                   height = rnorm(32) + 10)

data <- data[!(data$location == "A" & data$species == "S4"), ]
data <- data[!(data$location == "B" & data$species == "S4"), ]

data

当前使用的模型声明是:

mod <- lmer(height ~ species * trt + (1|location/block), 
            data = data, REML = FALSE)

我的问题是,在几个位置(位置 A 和 B 处的物种 S4)缺少一个物种,这是一个有效的模型,还是我通过不平衡的实验设计向模型系数引入了偏差?我对研究“位置”和“街区”之间的差异不感兴趣,但我有兴趣研究处理类型和物种之间的差异。 忽略运行此特定模型时出现的警告,因为当我在真实数据上运行它时,它们不会出现。

我已经查看了有关堆栈溢出和交换的许多响应,但尚未找到有关此问题的问题。

r experimental-design
1个回答
0
投票

简短的回答:是的,至少对于所讨论的物种而言。

长答案:“失踪”是什么意思?

为了回答这个问题,我认为进行模拟和一些场景来解释为什么数据可能会丢失以及人们可以采取什么措施是合适的。

  • 案例0:一切都按计划进行
  • 案例1:数据丢失。现场技术人员戴夫(Dave)留下的笔记无人看管,其中一些笔记随风飘扬。我们不知道某个物种在某个地点的高度是多少。
  • 案例2:植物未能发芽。无论出于何种原因,其中两个地点的其中一个物种未能发芽。可能不是侥幸,所以我们将
    height
    记录为零。
  • 案例三:树苗失踪。可能是一只鹿穿过栅栏吃掉了它们,但我们不知道。最好完全删除受影响的物种,因为我们不能将部分缺失归因于任何事情。

以下是四种场景的模拟和结果。 从摘要中可以看出,除了 S4 之外,估计值是相同的。与案例 0 相比,案例 1 的估计值更高,因为缺失的数据位于会产生较短幼苗的位置/块中。在案例 2 中,它显然要低得多,因为我们将失败的发芽编码为零。在情况 3 中,没有 S4。 其他物种和处理估计不受影响,而 S4 相互作用受到影响。 S4 的标准误差受到影响,在情况 1 中是因为数量减少,在情况 2 中是因为方差增加。 你应该做什么主要取决于你觉得你可以证明什么

data0 <- expand.grid(
  block=c("1", "2"),
  location=c("A", "B", "C", "D"),
  species=c("S1", "S2", "S3", "S4"),
  trt=c("t1", "t2", "t3", "t4"),
  rep=1:3
)

set.seed(1)
data0$height <- with(lapply(data0, as.numeric), {
      10 + 
      0.5*block +
      0.5*location + 
      0.5*species + 
      0.5*trt + 
      0.5*species*trt +
      0.5*block*location +
      rnorm(nrow(data0), 0, 2)
})

# Sanity check with a regular linear model.
# Estimates for the fixed effects should be the same as with lmer
summary(lmmod <- lm(height ~ location/block + species*trt, data=data0))
#
# Coefficients:
#                  Estimate Std. Error t value  Pr(>|t|)    
# ...snip
# speciesS2         0.89170    0.55406  1.6094 0.1084032    
# speciesS3         2.37137    0.55406  4.2800 2.397e-05 ***
# speciesS4         2.99510    0.55406  5.4058 1.175e-07 ***
# trtt2             1.63266    0.55406  2.9467 0.0034201 ** 
# trtt3             2.45394    0.55406  4.4290 1.256e-05 ***
# trtt4             2.39836    0.55406  4.3287 1.945e-05 ***
# speciesS2:trtt2   0.75106    0.78355  0.9585 0.3384389    
# speciesS3:trtt2   1.20322    0.78355  1.5356 0.1255136    
# ...snip

data3 <- data2 <- data1 <- data0

# Height data for S4 at loc A and B randomly missing 
# (a gust of wind took the notes)
data1 <- data1[!(data1$location == "A" & data1$species == "S4"), ]
data1 <- data1[!(data1$location == "B" & data1$species == "S4"), ]

# S4 at loc A and B failed to sprout, height recorded as 0
data2[(data2$location == "A" & data2$species == "S4"), "height"] <- 0
data2[(data2$location == "B" & data2$species == "S4"), "height"] <- 0

# Height data for S4 at loc A and B missing for unknown reasons
# (maybe a deer came over the fence and ate the seedlings?)
# Removing S4 altogether, just in case
data3 <- data3[data3$species != "S4", ]

mod0 <- lmer(height ~ species * trt + (1|location/block), data=data0, REML=FALSE)
mod1 <- lmer(height ~ species * trt + (1|location/block), data=data1, REML=FALSE)
mod2 <- lmer(height ~ species * trt + (1|location/block), data=data2, REML=FALSE)
mod3 <- lmer(height ~ species * trt + (1|location/block), data=data3, REML=FALSE)

summary(mod0, correlation=FALSE)
# ...snip
# Random effects:
#  Groups         Name        Variance Std.Dev.
#  block:location (Intercept) 1.7237   1.3129  
#  location       (Intercept) 1.0285   1.0142  
#  Residual                   3.5368   1.8806  
# Number of obs: 384, groups:  block:location, 8; location, 4
# 
# Fixed effects:
#                  Estimate Std. Error        df t value  Pr(>|t|)    
# (Intercept)      15.08784    0.78738   6.61706 19.1622 4.806e-07 ***
# speciesS2         0.89170    0.54289 376.00001  1.6425  0.101323    
# speciesS3         2.37137    0.54289 376.00001  4.3680 1.623e-05 ***
# speciesS4         2.99510    0.54289 376.00001  5.5169 6.432e-08 ***
# trtt2             1.63266    0.54289 376.00001  3.0073  0.002813 ** 
# trtt3             2.45394    0.54289 376.00001  4.5201 8.292e-06 ***
# trtt4             2.39836    0.54289 376.00001  4.4177 1.306e-05 ***
# speciesS2:trtt2   0.75106    0.76777 376.00001  0.9782  0.328586    
# speciesS3:trtt2   1.20322    0.76777 376.00001  1.5672  0.117915    
# ...snip

summary(mod1, correlation=FALSE)
# ...snip
# Random effects:
#  Groups         Name        Variance Std.Dev.
#  block:location (Intercept) 1.87953  1.3710  
#  location       (Intercept) 0.61183  0.7822  
#  Residual                   3.56989  1.8894  
# Number of obs: 336, groups:  block:location, 8; location, 4
# 
# Fixed effects:
#                  Estimate Std. Error        df t value  Pr(>|t|)    
# (Intercept)      15.08784    0.73256   7.20014 20.5960 1.153e-07 ***
# speciesS2         0.89170    0.54543 327.99470  1.6349 0.1030372    
# speciesS3         2.37137    0.54543 327.99470  4.3477 1.838e-05 ***
# speciesS4         3.60631    0.67694 328.52650  5.3274 1.853e-07 ***
# trtt2             1.63266    0.54543 327.99470  2.9934 0.0029688 ** 
# trtt3             2.45394    0.54543 327.99470  4.4991 9.486e-06 ***
# trtt4             2.39836    0.54543 327.99470  4.3972 1.484e-05 ***
# speciesS2:trtt2   0.75106    0.77135 327.99470  0.9737 0.3309272    
# speciesS3:trtt2   1.20322    0.77135 327.99470  1.5599 0.1197500    
# ...snip

summary(mod2, correlation=FALSE)
# ...snip
# Random effects:
#  Groups         Name        Variance Std.Dev.
#  block:location (Intercept)  1.1019  1.0497  
#  location       (Intercept) 13.9403  3.7337  
#  Residual                   26.2045  5.1190  
# Number of obs: 384, groups:  block:location, 8; location, 4
# 
# Fixed effects:
#                  Estimate Std. Error        df t value  Pr(>|t|)    
# (Intercept)      15.08784    2.17133   6.52184  6.9487 0.0003051 ***
# speciesS2         0.89170    1.47774 376.00001  0.6034 0.5465924    
# speciesS3         2.37137    1.47774 376.00001  1.6047 0.1093926    
# speciesS4        -5.21246    1.47774 376.00001 -3.5273 0.0004717 ***
# trtt2             1.63266    1.47774 376.00001  1.1048 0.2699379    
# trtt3             2.45394    1.47774 376.00001  1.6606 0.0976263 .  
# trtt4             2.39836    1.47774 376.00001  1.6230 0.1054291    
# speciesS2:trtt2   0.75106    2.08984 376.00001  0.3594 0.7195083    
# speciesS3:trtt2   1.20322    2.08984 376.00001  0.5757 0.5651289    
# ...snip

summary(mod3, correlation=FALSE)
# ...snip
# Random effects:
#  Groups         Name        Variance Std.Dev.
#  block:location (Intercept) 1.70673  1.30642 
#  location       (Intercept) 0.71358  0.84474 
#  Residual                   3.63333  1.90613 
# Number of obs: 288, groups:  block:location, 8; location, 4
# 
# Fixed effects:
#                  Estimate Std. Error        df t value  Pr(>|t|)    
# (Intercept)      15.08784    0.73697   7.20466 20.4728 1.194e-07 ***
# speciesS2         0.89170    0.55025 279.99999  1.6205 0.1062447    
# speciesS3         2.37137    0.55025 279.99999  4.3096 2.267e-05 ***
# trtt2             1.63266    0.55025 279.99999  2.9671 0.0032657 ** 
# trtt3             2.45394    0.55025 279.99999  4.4597 1.190e-05 ***
# trtt4             2.39836    0.55025 279.99999  4.3587 1.840e-05 ***
# speciesS2:trtt2   0.75106    0.77817 279.99999  0.9652 0.3353006    
# speciesS3:trtt2   1.20322    0.77817 279.99999  1.5462 0.1231826    
# ...snip
© www.soinside.com 2019 - 2024. All rights reserved.