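For a study, I need to generate five completed datasets for each of 100 incomplete datasets, using the mice package in R.
This code works fine (when you have a single dataset, df1):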
df1_imp <- mice(df1, m = 5, method = 'logreg', printFlag = FALSE)
We can then access the five completed datasets as follows:
dataset1 <- complete(df1_imp, 1)
dataset2 <- complete(df1_imp, 2)
dataset3 <- complete(df1_imp, 3)
dataset4 <- complete(df1_imp, 4)
dataset5 <- complete(df1_imp, 5)
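So far, so good. However, I have 100 incomplete datasets. Each will produce 5 completed datasets (500 in total). How can I access these 500 datasets? I need to analyze them.
[DFS] My list of datasets (each one must produce 5 completed datasets, 3 x 5 = 15):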
list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1,
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA,
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA,
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
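Run mice on each dataset with lapply and pipe the result into complete, choosing action='all' and include=FALSE to exclude the unimputed dataset. For a simulation study, you may want to specify a seed.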
> library(mice)
> seed. <- 42
> lapply(raw_data, mice, m=5, method='pmm', seed=seed., printFlag=FALSE) |>
+ lapply(complete, action='all', include=FALSE)
[[1]]
$`1`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`2`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`3`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`4`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`5`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 0 1
6 0 0 1 0 1
attr(,"class")
[1] "mild" "list"
[[2]]
$`1`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`2`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`3`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`4`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`5`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 1 1
5 1 1 0 0 1
6 0 0 1 1 1
attr(,"class")
[1] "mild" "list"
[[3]]
$`1`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`2`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`3`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`4`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`5`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
attr(,"class")
[1] "mild" "list"
Warning messages:
1: Number of logged events: 30
2: Number of logged events: 30
3: Number of logged events: 2
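The result above is a nested list: one element per incomplete dataset, each holding its five completed datasets. A minimal sketch of how the individual datasets might be accessed, assuming the same example data and settings as above; the names imp_list, res, all_completed, m, and the "dataI_impJ" naming scheme are only illustrative and not part of mice:

```r
library(mice)

# The three example matrices from the question.
raw_data <- list(
  structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
              0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 0, 1, 1, 0, NA, NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
              1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)
)

m <- 5  # number of imputations per incomplete dataset

# Impute every dataset, then extract all m completed versions of each.
imp_list <- lapply(raw_data, mice, m = m, method = 'pmm', seed = 42,
                   printFlag = FALSE)
res <- lapply(imp_list, complete, action = 'all', include = FALSE)

# res[[i]][[j]] is completed dataset j of incomplete dataset i.
res[[2]][[4]]

# Optionally flatten to one list of length(raw_data) * m data frames
# and give each a (hypothetical) name such as "data2_imp4".
all_completed <- unlist(lapply(res, unclass), recursive = FALSE)
names(all_completed) <- paste0('data', rep(seq_along(res), each = m),
                               '_imp', rep(seq_len(m), times = length(res)))
length(all_completed)
```

With 100 incomplete datasets the same code would yield a list of 500 data frames, which can then be passed to lapply for the analysis. If the pooled analysis tools of mice are wanted instead, it may be preferable to keep imp_list and work on the mids objects directly.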
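Note that in your example, the imputation of the third dataset fails because of collinearity (its completed datasets still contain NA). You can investigate this by setting printFlag=TRUE and not piping the result into complete.
Data: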
> dput(raw_data)
list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1,
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA,
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA,
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
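As a follow-up to the collinearity note, here is a rough sketch of one way to look into the third dataset. Instead of printFlag=TRUE it reads the loggedEvents component that mice stores on the returned object; the names df3 and imp3 are illustrative, and the exact events recorded may differ between mice versions:

```r
library(mice)

# Third example matrix from the data above.
df3 <- structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
                   1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)

# Keep the mids object rather than piping straight into complete().
imp3 <- mice(df3, m = 5, method = 'pmm', seed = 42, printFlag = FALSE)

# Events that mice logged during the run, for example variables
# dropped as constant or collinear.
imp3$loggedEvents

# Check whether the first completed dataset still has missing values.
anyNA(complete(imp3, 1))
```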