基于 glm 的聚类 se 的 p 值

Question

我想从 glm 中检索具有聚类标准误差的 p 值。我使用一个函数来计算聚类标准误差，然后使用它们生成模型结果的摘要。当我对从模型结果汇总中获得的 p 值进行手动计算时，p 值与我手动计算的 p 值不相似。

我按照三个步骤手动计算 p 值。

估计聚类se并将它们存储为向量。
根据公式计算z值：估计系数/聚类se。
根据 z 值检索两侧 p 值。我使用正态分布，因为我有逻辑回归。

下面是我的测试代码。

data(mtcars)

cluster <- sample(1:3, 32, replace =TRUE)

mtcars <-cbind(mtcars, cluster)

model <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))

# Extract coefficients

coef_estimates <- coef(model)

# function for the clusterred se
get_CL_vcov<-function(model, cluster){
  require(sandwich, quietly = TRUE)
  require(lmtest, quietly = TRUE)
  
  #calculate degree of freedom adjustment
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- model$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  
  #calculate the uj's
  uj  <- apply(estfun(model),2, function(x) tapply(x, cluster, sum))
  
  #use sandwich to get the var-covar matrix
  vcovCL <- dfc*sandwich(model, meat=crossprod(uj)/N)
  return(vcovCL)
}

model_vcov <- get_CL_vcov(model, mtcars$cluster)

model_cse <- sqrt(diag(model_vcov))

# z-values for each coefficient
z_values <- coef_estimates / model_cse 

# two-tailed p-values using normal distribution for z-values
p_values <- 2 * pnorm(-abs(z_values))


#### Compare results for no clusterred se, clusterred se (p-values not manually corrected) and clusterred se (p-values manually calculated)
#summary(model_cse)

summary(model)

summary(model, model_cse)

p_values

我不确定问题是否出在我计算聚类标准误差或 p 值的方式上。我的配方是否需要进行一些调整或者我仍然应该考虑的事情？我使用 n=32 的样本只是为了说明问题，但我不认为这就是我有不同 p 值的原因。

Answer 1

您需要做的就是将

glm()

的输出提供给

lmtest::coeftest()

并提供 VCOV 矩阵来获取统计数据。

lmtest::coeftest(model, vcov = model_vcov)

这会使用您计算的 VCOV 矩阵生成测试。您还可以使用

sandwich

来计算

coeftest()

内的 VCOV 矩阵：

lmtest::coeftest(model, vcov = sandwich::vcovCL, cluster = ~cluster, type = "HC1")

它们产生相同的值，等于您手动计算的 p 值。

语法

summary(model, model_cse)

无效；

summary()

的第二个参数是 GLM 的离散参数，而不是标准误差估计。

基于 glm 的聚类 se 的 p 值

问题描述投票：0回答：1

1个回答

最新问题

基于 glm 的聚类 se 的 p 值

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1