我想对三个治疗组进行随机推理测试,并获得每个治疗组的 p 值。我有一个集群和一个分层变量,它们需要成为测试的一部分。我使用的包是ritest。
我不确定在这种情况下如何使用仪式,因为我必须分别对每个治疗臂进行测试。我似乎没有在仪式函数中使用正确的 resampvar。 resampvar 应该是我想要 p 值的治疗臂,但是当我在仪式之前运行回归模型时使用给定治疗臂的系数名称时,出现错误。
这是我正在尝试做的事情的说明。
### Data
set.seed(123)
ngroup = 4
nrep = 15
treat <- rep(c("control", "treat1", "treat2", "treat3"), each = nrep)
strata <- sample(1:3, 60, replace =TRUE)
cluster <- sample(1:5, 60, replace =TRUE)
Y<- rbinom(60, 1, 0.5)
data <- cbind(Y, treat, strata, cluster)
data <- as.data.frame(data)
data <- data %>%
mutate(treat = as.factor(treat),
Y = as.numeric (Y))
### Regression
#install.packages("remotes")
#remotes::install_github("grantmcdermott/ritest")
library(ritest)
reg <- glm(Y ~ factor(treat) + factor(strata), family = binomial(link = "logit"), data=data)
## I would like to have clusterred std. errors for the logit model and later on for the ritest.
### obtain clusterred se (source: https://rforpublichealth.blogspot.com/2014/10/easy-clustered-standard-errors-in-r.html)
get_CL_vcov<-function(model, cluster){
require(sandwich, quietly = TRUE)
require(lmtest, quietly = TRUE)
M <- length(unique(cluster))
N <- length(cluster)
K <- model$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(model),2, function(x) tapply(x, cluster, sum))
vcovCL <- dfc*sandwich(model, meat=crossprod(uj)/N)
return(vcovCL)
}
vcovCL <- get_CL_vcov(reg, data$cluster)
clse <- sqrt(diag(vcovCL))
reg_ri = ritest(reg, resampvar = "treattreat1", cluster='cluster', strata='strata', reps=500, seed = 123, verbose = TRUE)
# treattreat1 I find as the coeff. name stored in the reg output for treat1 (the first tretament arm).
如果有任何关于这种情况下仪式出了什么问题的提示,我将不胜感激。或者也许我可以为此使用另一个 R 包?
我认为我找到了如何使用多个治疗臂(包括分层和集群)进行仪式测试。我认为上面的数据设置的问题在于我将分层变量作为因子变量。这样做的原因是我必须在回归中包含分层虚拟对象,这似乎是实现此目的的一种可能方法。然而,当指定了strata时,strata变量的因子格式会使rites感到困惑。因此,我所做的是将分层变量格式化为数字,生成要在回归中使用的单独的分层虚拟变量,然后可以使用分层变量执行仪式。我的簇变量也为数字,但我认为这是数据集中的标准格式。对于strata,我不确定常见的做法是什么。
这是我的所有代码,包括在 stargazer 中导出结果。我导出 logit 的边际效应,因为它们很容易解释,并且 p 值形成检验以比较边际效应和仪式的显着性。
### Data
set.seed(123)
ngroup = 4
nrep = 15
treat <- rep(c("control", "treat1", "treat2", "treat3"), each = nrep)
#treat <- rbinom(60, 1, 0.5)
strata <- sample(1:3, 60, replace =TRUE)
cluster <- sample(1:5, 60, replace =TRUE)
Y<- rbinom(60, 1, 0.5)
data <- cbind(Y, treat, strata, cluster)
data <- as.data.frame(data)
data <- data %>%
mutate(treat = as.factor(treat),
strata = as.numeric(strata),
cluster = as.numeric(cluster),
Y = as.numeric (Y))
data <- fastDummies::dummy_cols(data, select_columns = "strata")
strata_vars <- c("strata_2", "strata_3")
### Logit with strata as numeric variable
reg1<- glm(as.formula(paste("Y ~ treat +", paste(strata_vars, collapse = "+"))), family = binomial(link = "logit"), data = data)
## Conduct the ritest
resamp_vars <- c("treattreat1", "treattreat2", "treattreat3") # Add more as needed
# Initialize a list to store results
p_values <- numeric(length(resamp_vars))
names(p_values) <- resamp_vars
for (resamp_var in resamp_vars) {
# Perform the ritest for each resampling variable
reg_ri_result <- ritest(reg1, resampvar = resamp_var, cluster = "cluster", strata = "strata", reps = 1000, verbose = TRUE)
# Extract the p-value for each treatment arm
p_value <- reg_ri_result$pval
# Store each p-value in a vector (num format)
p_values[resamp_var] <- p_value
}
### My reporting of the results with stargazer (marginal effects of the logit model with clustered std.errors + p.values of the ritest)
library(mfx)
regmfx <- logitmfx(Y ~ treat+ strata_2 + strata_3, clustervar1= "cluster", atmean = T, data=data)
p_mxf <- regmfx$mfxest[, 4]
coeff_mfx <- regmfx$mfxest[,1]
library(stargazer)
stargazer(regmfx$fit, coef = list(coeff_mfx), se=list(p_mxf), p= list(p_values), report=('vcs*p*'), omit = c("strata_2", "strata_3", "Constant"), omit.stat = c("ll", "adj.rsq", "f", "ser", "aic"))
# to replace the p= and set it to [..], check this post: [https://stackoverflow.com/questions/42995868/r-stargazer-package-eliminate-t-label-from-reported-test-statistics][1]
也许有一种更简单、更优雅的方法来做到这一点,但这就是我在这个阶段设法提出的。