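Good evening, R experts,
I am generating a Spearman correlation matrix in R for all the variables in my current research project. To stay consistent with my other analyses, I would like to build BCa bootstrap confidence intervals around each Spearman coefficient using 10,000 iterations. However, I have not yet found an efficient and reliable way to do this.
I created the Spearman correlation matrix, with coefficients and p-values, using the code below. Note that I generated random data for this example.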
library(psych)
library(rstatix)
library(ggcorrplot) # needed for ggcorrplot() below
df_test <- data.frame(
recall=c(45, 1, 32, 17, 79, 15, 75, 100, 43, 80, 74, 91, 60, 54, 67, 26, 97, 53, 51, 30),
recog=c(90, 29, 93, 73, 34, 68, 78, 56, 92, 85, 35, 81, 7, 58, 4, 52, 82, 31, 6, 23),
hits=c(77, 89, 44, 8, 70, 96, 76, 62, 95, 27, 49, 12, 16, 28, 83, 2, 36, 10, 61, 86),
misses=c(59, 78, 14, 44, 86, 61, 80, 72, 25, 93, 5, 42, 64, 95, 73, 54, 7, 67, 11, 53),
false_alarms=c(32, 34, 55, 41, 77, 76, 89, 36, 12, 100, 70, 62, 47, 81, 90, 63, 13, 83, 79, 38)
)
df_test.cor <- cor(df_test, y = NULL, use = "pairwise.complete.obs",
method = c("spearman"))
#Get coefficients
df_test.mat <- cor_mat(
df_test,
vars = NULL,
method = "spearman",
alternative = "two.sided",
conf.level = 0.95
)
df_test.mat
#Get sig values
df_test.pmat <- cor_pmat(
df_test,
vars = NULL,
method = "spearman",
alternative = "two.sided",
conf.level = 0.95
)
df_test.pmat
ggcorrplot(df_test.mat,
type = "lower", method = "circle", colors = c("turquoise3", "khaki1", "violetred3"), p.mat = df_test.pmat)
cor.ci(df_test, keys = NULL, n.iter = 10000, p = 0.050, poly = FALSE, method = "spearman")
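The outputs are as follows:
2. Visualisation
3. Bootstrapped 95% CIs using the psych package
There are a number of issues with these confidence intervals, listed below:
The reported values do not correspond to the values generated by the cor_pmat() command from the rstatix package.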
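How can I construct bootstrapped confidence intervals around the Spearman coefficients across the whole correlation matrix using the BCa method, accurate to 4 decimal places?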
## Using bivariate normal sampling distribution
rho <- .5
sigma <- matrix(rho, ncol=2, nrow=2)
diag(sigma) <- 1
## True Spearman rank correlation + coverage function
GROUND_TRUTH <- (6/pi)*asin(rho/2)
covers <- function(ci) (ci[1] <= GROUND_TRUTH) & (GROUND_TRUTH <= ci[2])
## Function to bootstrap a single Spearman correlation
bootfun <- function(data, i) {
r <- cor(data[i,], method = "spearman")
r[lower.tri(r)]
}
## Generate one sample & bootstrap it
## Returns whether the obtained intervals cover the true parameter
one_boot <- function(seed, n=100) {
set.seed(seed)
data <- mvtnorm::rmvnorm(n, sigma = sigma)
b <- boot::boot(data, bootfun, R=1E4)
ci <- boot::boot.ci(b, type = c("norm", "basic", "perc", "bca"))
c("norm" = covers(ci$normal[2:3]), "basic"= covers(ci$basic[4:5]),
"perc" = covers(ci$percent[4:5]), "bca" = covers(ci$bca[4:5]))
}
## Generate many samples & bootstrap each of them
## Returns coverage properties of each interval
future::plan("multisession", workers=4)
many_boots <- future.apply::future_vapply(seq_len(1E4), one_boot,
logical(4), future.seed = TRUE) |>
matrixStats::rowMeans2()
#> norm basic perc bca
#> 0.9399 0.9297 0.9535 0.9544
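As you can see, the plain percentile interval has coverage at least as good as the BCa interval (0.9535 versus 0.9544 against a nominal 0.95), and that is under a theoretically perfect standard normal (where the acceleration factor is also accurate). If your sampling distribution has less well-defined skewness or is not smooth, the BCa interval will perform worse than the alternatives.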
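The GROUND_TRUTH line in the simulation relies on the known relationship between Pearson's and Spearman's correlation under bivariate normality, rho_S = (6/pi) * asin(rho/2). A quick sanity check of that value, as a minimal sketch: the sample size, the seed and the way the correlated pair is built from two independent normals are assumptions for illustration, not part of the simulation above.

```r
set.seed(2024)
rho <- 0.5
n <- 1e5  # assumption: large enough that sampling noise is small

# Bivariate standard normal with Pearson correlation rho,
# built from two independent standard normals
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

theory <- (6 / pi) * asin(rho / 2)           # population Spearman correlation
empirical <- cor(x, y, method = "spearman")  # large-sample estimate

round(c(theory = theory, empirical = empirical), 4)
```

For rho = 0.5 the theoretical value is about 0.4826, slightly below the Pearson correlation, which is why the coverage function compares against this value and not against rho itself.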
As a bonus, here is how to get a single two-sided p-value for a Spearman correlation by resampling (under the null!):
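permute_spearman <- function(x, y, iters=1E4) {
  ## Take the absolute value of the observed statistic so the test is two-sided
  t0 <- abs(cor(x, y, method="spearman"))
  mean(vapply(seq_len(iters), \(d) {
    ## Shuffling y breaks any association with x (the null hypothesis);
    ## count permutations at least as extreme as the observed value
    abs(cor(x, y[sample(seq_along(y))], method="spearman")) >= t0
  }, logical(1)))
}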
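One way to check a permutation p-value of this kind is to set it beside the p-value reported by cor.test(). The following is a minimal, self-contained sketch: perm_p is a hypothetical helper name, and the data, seed and number of permutations are illustrative assumptions.

```r
# Hypothetical helper: two-sided permutation p-value for Spearman's rho
perm_p <- function(x, y, iters = 2000) {
  t_obs <- abs(cor(x, y, method = "spearman"))
  # Each replicate shuffles y, which simulates the null of no association
  t_perm <- replicate(iters, abs(cor(x, sample(y), method = "spearman")))
  mean(t_perm >= t_obs)
}

set.seed(123)
x <- rnorm(25)
y <- 0.6 * x + rnorm(25)  # illustrative data only

p_perm <- perm_p(x, y)
p_ref <- cor.test(x, y, method = "spearman")$p.value

c(permutation = p_perm, cor.test = p_ref)
```

The two values will not match exactly, because the permutation estimate carries Monte Carlo error that shrinks as iters grows.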
The boot package computes BCa intervals.
First, create a statistic function that returns the Spearman correlations for a subset i of the data:
spearman <- function(d, i) {
rho <- cor(d[i, ], method = "spearman")
rho[lower.tri(rho)]
}
Then, use boot() to draw the bootstrap samples:
set.seed(42)
boot_out <- boot::boot(df_test, spearman, R = 10000)
boot_out
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = df_test, statistic = spearman, R = 10000)
#>
#>
#> Bootstrap Statistics :
#> original bias std. error
#> t1* 0.09172932 -0.004161858 0.2181451
#> t2* -0.22105263 0.008094763 0.2623494
#> t3* 0.13684211 -0.009041776 0.2462778
#> t4* 0.22406015 -0.004986228 0.2420079
#> t5* -0.05413534 0.003039217 0.2235030
#> t6* -0.18345865 0.006896525 0.2341026
#> t7* -0.29022556 0.006665581 0.2469688
#> t8* 0.08872180 -0.004816326 0.1924127
#> t9* -0.20902256 0.011034138 0.2315252
#> t10* 0.48120301 -0.021141040 0.2029307
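The statistic function returns the lower triangle of the correlation matrix as an unnamed vector, so t1* to t10* carry no labels. A minimal sketch of how the positions could be mapped back to variable pairs, assuming the column order of df_test from the question (the names vars, idx and pairs are introduced here for illustration):

```r
# Column names in the same order as in df_test
vars <- c("recall", "recog", "hits", "misses", "false_alarms")

# Row/column positions of the lower triangle, in the same
# column-major order that rho[lower.tri(rho)] uses
idx <- which(lower.tri(diag(length(vars))), arr.ind = TRUE)

pairs <- paste(vars[idx[, "row"]], vars[idx[, "col"]], sep = " ~ ")
data.frame(statistic = paste0("t", seq_along(pairs), "*"), pair = pairs)
```

Under this ordering, t1* is the recog ~ recall correlation and t10* is false_alarms ~ misses.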
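Finally, compute the confidence intervals with boot.ci(). The index argument specifies which element of the output statistic you want the interval for: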
boot::boot.ci(boot_out, type = "bca", index = 10)
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 10000 bootstrap replicates
#>
#> CALL :
#> boot::boot.ci(boot.out = boot_out, type = "bca", index = 10)
#>
#> Intervals :
#> Level BCa
#> 95% (-0.0322, 0.7649 )
#> Calculations and Intervals on Original Scale
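A vector index has a different meaning (the second element is interpreted as the position of the variance estimate for the first), so to get all the CIs you need to iterate explicitly, for example with lapply():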
lapply(seq_along(boot_out$t0), function(i) {
boot::boot.ci(boot_out, type = "bca", index = i)
})
If you only need the intervals from the output, you can extract them from the $bca element:
sapply(seq_along(boot_out$t0), function(i) {
boot::boot.ci(boot_out, type = "bca", index = i)$bca[, 4:5]
})
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> -0.3439443 -0.6648910 -0.3743482 -0.3385006 -0.4651028 -0.6057627 -0.7137479
#> 0.5040802 0.3668189 0.5936317 0.6228635 0.4083453 0.3079268 0.2545514
#> [,8] [,9] [,10]
#> -0.2991324 -0.6217852 -0.0321938
#> 0.4506211 0.2709281 0.7648544
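To report the intervals to 4 decimal places alongside the variable pairs they belong to, the same steps can be collected into a labelled table. This is a sketch under stated assumptions: the small simulated data frame d, the reduced R = 2000 and the name ci_table are illustrative, and combn() is used for labelling because it happens to produce pairs in the same order as the lower triangle.

```r
library(boot)

# Assumption: small simulated data for illustration; substitute your own data frame
set.seed(1)
d <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))

# Statistic: lower triangle of the Spearman correlation matrix
spearman <- function(d, i) {
  rho <- cor(d[i, ], method = "spearman")
  rho[lower.tri(rho)]
}

boot_out <- boot(d, spearman, R = 2000)  # use R = 10000 for the real analysis

# One BCa interval per coefficient; each row of ci is (lower, upper)
ci <- t(sapply(seq_along(boot_out$t0), function(k) {
  boot.ci(boot_out, type = "bca", index = k)$bca[, 4:5]
}))

# combn() enumerates pairs in the same order as rho[lower.tri(rho)]
pairs <- t(combn(names(d), 2))

ci_table <- data.frame(
  var1  = pairs[, 1],
  var2  = pairs[, 2],
  rho   = round(boot_out$t0, 4),
  lower = round(ci[, 1], 4),
  upper = round(ci[, 2], 4)
)
ci_table
```

Rounding is applied only when building the table, so boot_out keeps the full-precision values.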