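I have a tibble, an example of which is shown below. It has seven predictors (V4 to V10) and nine outcomes (w1, w2, w3, mw, i1, i2, i3, mi, p2). I am trying to compute confidence intervals for columns 2 (w1) through 10 (p2).

vars w1 w2 w3 mw i1 i2 i3 mi p2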
V4 0.084 0.017 0.061 0.054 22.800 4.570 16.700 14.700 0.367
V5 0.032 0.085 0.039 0.052 8.840 23.100 10.700 14.200 0.367
V6 0.026 0.066 0.022 0.038 7.030 18.000 6.070 10.400 0.367
V7 0.097 0.020 0.066 0.061 26.300 5.420 18.100 16.600 0.367
V8 0.048 0.071 0.043 0.054 13.100 19.300 11.800 14.700 0.367
V9 0.018 0.111 0.020 0.050 4.800 30.300 5.440 13.500 0.367
V10 0.053 0.020 0.103 0.058 14.300 5.330 28.000 15.900 0.367
V4 0.084 0.017 0.060 0.054 22.400 4.420 16.200 14.300 0.373
V5 0.032 0.072 0.036 0.047 8.630 19.300 9.760 12.500 0.373
V6 0.030 0.076 0.023 0.043 8.080 20.500 6.070 11.500 0.373
V7 0.080 0.021 0.087 0.063 21.500 5.720 23.300 16.800 0.373
V8 0.053 0.090 0.034 0.059 14.100 24.000 9.110 15.700 0.373
V9 0.016 0.101 0.025 0.048 4.410 27.100 6.790 12.800 0.373
V10 0.060 0.022 0.100 0.061 16.000 5.950 26.800 16.300 0.373

When I use group_by from dplyr on the variable (vars) and run quantile on three of the outcomes (as a test), it does not do what I need. Instead of a confidence interval for each of the three outcomes, I get a single confidence interval, as shown below:

+ group_by(vars) %>%
+ do(data.frame(t(quantile(c(.$w1, .$w2, .$w3), probs = c(0.025, 0.975)))))
# A tibble: 7 x 3
# Groups: variables [7]
variables X2.5 X97.5
1 V10 0.0202 0.103
2 V4 0.017 0.084
3 V5 0.032 0.0834
4 V6 0.0221 0.0748
5 V7 0.0201 0.0958
6 V8 0.0351 0.0876
7 V9 0.0162 0.110

In short, what I am looking for is something like the table below, with a confidence interval for each outcome.

w1 w2 w3
vars X2.5 X97.5 vars X2.5 X97.5 vars X2.5 X97.5
V10 0.020 0.103 V10 0.020 0.103 V10 0.020 0.103
V4 0.017 0.084 V4 0.017 0.084 V4 0.017 0.084
V5 0.032 0.083 V5 0.032 0.083 V5 0.032 0.083
V6 0.022 0.075 V6 0.022 0.075 V6 0.022 0.075
V7 0.020 0.096 V7 0.020 0.096 V7 0.020 0.096
V8 0.035 0.088 V8 0.035 0.088 V8 0.035 0.088
V9 0.016 0.110 V9 0.016 0.110 V9 0.016 0.110

Any pointers in the right direction would be much appreciated. I have searched StackOverflow but cannot find an answer that solves what I am trying to do.

Here are two ways.

Base R.

aggregate(df1[-1], list(df1[[1]]), quantile, probs = c(0.025, 0.975))
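
One detail that may be worth knowing about the base R result: aggregate() stores the two quantiles of each outcome as a two-column matrix inside a single data frame column. The sketch below, which uses a small hypothetical subset of the example data (df_small, with only V4, V5, w1 and w2), shows one common way to flatten those matrix columns into ordinary columns. The names df_small, agg and flat are illustrative and not part of the original answer.

```r
# Hypothetical subset of the example data: two predictors, two outcomes
df_small <- data.frame(
  vars = c("V4", "V5", "V4", "V5"),
  w1 = c(0.084, 0.032, 0.084, 0.032),
  w2 = c(0.017, 0.085, 0.017, 0.072)
)

# Same call as in the answer; each outcome becomes a matrix column
agg <- aggregate(df_small[-1], list(vars = df_small[[1]]), quantile,
                 probs = c(0.025, 0.975))

# Flatten the matrix columns so each quantile is its own column
flat <- do.call(data.frame, agg)
print(flat)
```

After flattening there is one column for the grouping variable plus two columns (lower and upper quantile) per outcome.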
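
With tidyverse.

library(dplyr)
df1 %>%
  group_by(vars) %>%
  mutate_at(vars(w1:p2), quantile, probs = c(0.025, 0.975))

Note that in the second way the output format is different: for each group, the first quantile (0.025) is in its first row and the second quantile (0.975) is in its last row. This relies on every group having exactly two rows, matching the two quantiles.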
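
If one row per group is preferred over the two-row layout, a possible variant is summarise() with across(), which needs dplyr 1.0.0 or later. This is only a sketch on a small hypothetical subset of the example data (df_small); the suffixes lwr and upr are names chosen here for illustration.

```r
library(dplyr)

# Hypothetical subset of the example data: two predictors, two outcomes
df_small <- data.frame(
  vars = c("V4", "V5", "V4", "V5"),
  w1 = c(0.084, 0.032, 0.084, 0.032),
  w2 = c(0.017, 0.085, 0.017, 0.072)
)

# One row per group, with a lower and an upper quantile column per outcome
ci <- df_small %>%
  group_by(vars) %>%
  summarise(
    across(w1:w2,
           list(lwr = ~ quantile(.x, 0.025, names = FALSE),
                upr = ~ quantile(.x, 0.975, names = FALSE))),
    .groups = "drop"
  )
print(ci)
```

The resulting columns are named by outcome and suffix (w1_lwr, w1_upr, and so on), so the two bounds of each outcome sit next to each other.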

Data.

df1 <-
structure(list(vars = structure(c(2L, 3L, 4L,
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L),
.Label = c("V10", "V4", "V5", "V6", "V7", "V8",
"V9"), class = "factor"), w1 = c(0.084, 0.032,
0.026, 0.097, 0.048, 0.018, 0.053, 0.084,
0.032, 0.03, 0.08, 0.053, 0.016, 0.06),
w2 = c(0.017, 0.085, 0.066, 0.02, 0.071, 0.111,
0.02, 0.017, 0.072, 0.076, 0.021, 0.09, 0.101,
0.022), w3 = c(0.061, 0.039, 0.022, 0.066,
0.043, 0.02, 0.103, 0.06, 0.036, 0.023, 0.087,
0.034, 0.025, 0.1), mw = c(0.054, 0.052, 0.038,
0.061, 0.054, 0.05, 0.058, 0.054, 0.047, 0.043,
0.063, 0.059, 0.048, 0.061), i1 = c(22.8, 8.84,
7.03, 26.3, 13.1, 4.8, 14.3, 22.4, 8.63, 8.08,
21.5, 14.1, 4.41, 16), i2 = c(4.57, 23.1, 18, 5.42,
19.3, 30.3, 5.33, 4.42, 19.3, 20.5, 5.72, 24, 27.1,
5.95), i3 = c(16.7, 10.7, 6.07, 18.1, 11.8, 5.44,
28, 16.2, 9.76, 6.07, 23.3, 9.11, 6.79, 26.8),
mi = c(14.7, 14.2, 10.4, 16.6, 14.7, 13.5, 15.9,
14.3, 12.5, 11.5, 16.8, 15.7, 12.8, 16.3),
p2 = c(0.367, 0.367, 0.367, 0.367, 0.367, 0.367,
0.367, 0.373, 0.373, 0.373, 0.373, 0.373, 0.373,
0.373)), class = "data.frame",
row.names = c(NA, -14L))

Another possibility: melt/pivot to long format; compute the summaries; then cast/pivot back to wide format.

library(tidyverse)
df2 <- (df1
  %>% pivot_longer(-vars, names_to="outcome", values_to="value")
  %>% group_by(vars,outcome)
  %>% summarise(lwr=quantile(value,0.025),upr=quantile(value,0.975))
)
df2 %>% pivot_wider(names_from=outcome,values_from=c(lwr,upr))

Unfortunately, the columns do not come out in the ideal order; I can't think of a quick fix (you can use the variables in the order you want ...)
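
As one possible way to get the columns into a friendlier order, the sketch below builds the desired column names by interleaving the lwr_ and upr_ prefixes for each outcome and then selects them in that order. It runs on a small hypothetical subset of the example data (df_small); the helper names outcomes and ordered_cols are assumptions made for this illustration and are not from the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the example data: two predictors, two outcomes
df_small <- data.frame(
  vars = c("V4", "V5", "V4", "V5"),
  w1 = c(0.084, 0.032, 0.084, 0.032),
  w2 = c(0.017, 0.085, 0.017, 0.072)
)

# Long format -> per-group quantiles -> wide format, as in the answer
wide <- df_small %>%
  pivot_longer(-vars, names_to = "outcome", values_to = "value") %>%
  group_by(vars, outcome) %>%
  summarise(lwr = quantile(value, 0.025, names = FALSE),
            upr = quantile(value, 0.975, names = FALSE),
            .groups = "drop") %>%
  pivot_wider(names_from = outcome, values_from = c(lwr, upr))

# Interleave lower and upper bounds: lwr_w1, upr_w1, lwr_w2, upr_w2
outcomes <- setdiff(names(df_small), "vars")
ordered_cols <- as.vector(rbind(paste0("lwr_", outcomes),
                                paste0("upr_", outcomes)))

wide_ordered <- wide %>% select(vars, all_of(ordered_cols))
print(wide_ordered)
```

Because the order is taken from the column names of the input data, the outcomes appear in their original order rather than alphabetically.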