How can I count the 1s in each row across a set of columns in R when each row has a different number of missing (NA) values?


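I want to create a new column, "X11", that counts the 1s in each row across a selected set of columns, depending on how many of those columns are NA. In this example I am looking at 4 variables: X1, X2, X3, and X4.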

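For example: if there is 1 NA, I want to look at the remaining 3 variables that have values and count how many of them are 1. If there are 2 NAs, I want to look at the remaining 2 variables and count how many of them are 1. If there are 3 NAs, I want to look at the 1 remaining variable and check whether it is 1. If all 4 are NA, the result should be 0.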
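Put differently, the rule is to count the 1s among whichever of the four values are present. A minimal sketch on hypothetical single rows (the `count_ones` helper and the three example vectors are illustrative only, not part of the original post):

```r
# One hypothetical row per case: values for X1..X4, with NA marking missing entries.
one_na   <- c(1, NA, 1, 2)
three_na <- c(NA, NA, 1, NA)
all_na   <- c(NA, NA, NA, NA)

# x == 1 yields NA wherever x is NA; na.rm = TRUE drops those before summing.
count_ones <- function(x) sum(x == 1, na.rm = TRUE)

count_ones(one_na)    # 2
count_ones(three_na)  # 1
count_ones(all_na)    # 0
```

Because the missing entries are simply dropped, the number of NAs in a row never has to be handled as a separate case.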

I have this data:

df <- data.frame(replicate(10,sample(0:2, 10, rep=TRUE)))
df <- replace(df, df == 0, NA)

My data frame looks like this:

   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   1  1 NA  1 NA NA NA  1  1   2
2  NA  1  1 NA  2 NA  2  2 NA   1
3   1 NA  1  1 NA NA  1  2 NA   1
4   2  2  2  1  1  2  1 NA  2   2
5  NA  2 NA  2 NA  2  1 NA  1   1
6   2  2  1  1  2 NA  1  2  1   1
7   1  2 NA NA  2  1  1 NA NA   1
8   2  2 NA NA  1 NA NA  2 NA   1
9   1 NA  1  2  2  1  2 NA NA   1
10 NA  2  1 NA NA NA NA  2  2  NA

I would like my output to look like this:

   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1   1  1 NA  1 NA NA NA  1  1   2   3
2  NA  1  1 NA  2 NA  2  2 NA   1   2
3   1 NA  1  1 NA NA  1  2 NA   1   3
4   2  2  2  1  1  2  1 NA  2   2   1
5  NA  2 NA  2 NA  2  1 NA  1   1   0
6   2  2  1  1  2 NA  1  2  1   1   2
7   1  2 NA NA  2  1  1 NA NA   1   1
8   2  2 NA NA  1 NA NA  2 NA   1   0
9   1 NA  1  2  2  1  2 NA NA   1   2
10 NA  2  1 NA NA NA NA  2  2  NA   1

Here is a sample of my current code:

library(dplyr)

vars <- c("X1", "X2", "X3", "X4")
df <- df %>%
   mutate(missing_vars = rowSums(across(all_of(vars), ~is.na(.))),
          nonmissing_vars = length(vars) - missing_vars)

df <- df %>%
  mutate(zero_na = case_when(missing_vars == 0 & (X1 == 2 & X2 == 2 & X3 == 2 & X4 == 2) ~ 1,
                                (missing_vars == 0 & (X1 == 1 & X2 == 2 & X3 == 2 & X4 == 2) |
                                   (X1 == 2 & X2 == 1 & X3 == 2 & X4 == 2) |
                                   (X1 == 2 & X2 == 2 & X3 == 1 & X4 == 2) |
                                   (X1 == 2 & X2 == 2 & X3 == 2 & X4 == 1)) ~ 2,
                                (missing_vars == 0 & (X1 == 1 & X2 == 1 & X3 == 2 & X4 == 2) |
                                   (X1 == 1 & X2 == 2 & X3 == 1 & X4 == 2) |
                                   (X1 == 1 & X2 == 2 & X3 == 2 & X4 == 1) |
                                   (X1 == 2 & X2 == 1 & X3 == 1 & X4 == 2) |
                                   (X1 == 2 & X2 == 2 & X3 == 1 & X4 == 1) |
                                   (X1 == 2 & X2 == 1 & X3 == 2 & X4 == 1)) ~ 3,
                                (missing_vars == 0 & (X1 == 1 & X2 == 1 & X3 == 1 & X4 == 2) |
                                   (X1 == 1 & X2 == 1 & X3 == 2 & X4 == 1) |
                                   (X1 == 1 & X2 == 2 & X3 == 1 & X4 == 1) |
                                   (X1 == 2 & X2 == 1 & X3 == 1 & X4 == 1)) ~ 4,
                                missing_vars == 0 & (X1 == 1 & X2 == 1 & X3 == 1 & X4 == 1) ~ 5))

df <- df %>%
  mutate(one_na = case_when(missing_vars == 1 & (is.na(X1) & X2 == 2 & X3 == 2 & X4 == 2) ~ 1,
                                       missing_vars == 1 & (X1 == 2 & is.na(X2) & X3 == 2 & X4 == 2) ~ 1,
                                       missing_vars == 1 & (X1 == 2 & X2 == 2 & is.na(X3) & X4 == 2) ~ 1,
                                       missing_vars == 1 & (X1 == 2 & X2 == 2 & X3 == 2 & is.na(X4)) ~ 1,
                                       missing_vars == 1 & (is.na(X1) & X2 == 1 & X3 == 2 & X4 == 2) ~ 2,
                                       missing_vars == 1 & (X1 == 1 & is.na(X2) & X3 == 2 & X4 == 2) ~ 2,
                                       missing_vars == 1 & (X1 == 1 & X2 == 2 & is.na(X3) & X4 == 2) ~ 2,
                                       missing_vars == 1 & (X1 == 1 & X2 == 2 & X3 == 2 & is.na(X4)) ~ 2,
                                       missing_vars == 1 & (is.na(X1) & X2 == 2 & X3 == 1 & X4 == 2) ~ 2,
                                       missing_vars == 1 & (X1 == 2 & is.na(X2) & X3 == 1 & X4 == 2) ~ 2,
                                       missing_vars == 1 & (X1 == 2 & X2 == 1 & is.na(X3) & X4 == 2) ~ 2,
                                       missing_vars == 1 & (X1 == 2 & X2 == 1 & X3 == 2 & is.na(X4)) ~ 2,
                                       missing_vars == 1 & (is.na(X1) & X2 == 2 & X3 == 2 & X4 == 1) ~ 2,
                                       missing_vars == 1 & (X1 == 2 & is.na(X2) & X3 == 2 & X4 == 1) ~ 2,
                                       missing_vars == 1 & (X1 == 2 & X2 == 2 & is.na(X3) & X4 == 1) ~ 2,
                                       missing_vars == 1 & (X1 == 2 & X2 == 2 & X3 == 1 & is.na(X4)) ~ 2,
                                       missing_vars == 1 & (is.na(X1) & X2 == 1 & X3 == 1 & X4 == 2) ~ 3,
                                       missing_vars == 1 & (X1 == 1 & is.na(X2) & X3 == 1 & X4 == 2) ~ 3,
                                       missing_vars == 1 & (X1 == 1 & X2 == 1 & is.na(X3) & X4 == 2) ~ 3,
                                       missing_vars == 1 & (X1 == 1 & X2 == 1 & X3 == 2 & is.na(X4)) ~ 3,
                                       missing_vars == 1 & (is.na(X1) & X2 == 2 & X3 == 1 & X4 == 1) ~ 3,
                                       missing_vars == 1 & (X1 == 2 & is.na(X2) & X3 == 1 & X4 == 1) ~ 3,
                                       missing_vars == 1 & (X1 == 2 & X2 == 1 & is.na(X3) & X4 == 1) ~ 3,
                                       missing_vars == 1 & (X1 == 2 & X2 == 1 & X3 == 1 & is.na(X4)) ~ 3,
                                       missing_vars == 1 & (is.na(X1) & X2 == 1 & X3 == 2 & X4 == 1) ~ 3,
                                       missing_vars == 1 & (X1 == 1 & is.na(X2) & X3 == 2 & X4 == 1) ~ 3,
                                       missing_vars == 1 & (X1 == 1 & X2 == 2 & is.na(X3) & X4 == 1) ~ 3,
                                       missing_vars == 1 & (X1 == 1 & X2 == 2 & X3 == 1 & is.na(X4)) ~ 3,
                                       missing_vars == 1 & (is.na(X1) & X2 == 1 & X3 == 1 & X4 == 1) ~ 4,
                                       missing_vars == 1 & (X1 == 1 & is.na(X2) & X3 == 1 & X4 == 1) ~ 4,
                                       missing_vars == 1 & (X1 == 1 & X2 == 1 & is.na(X3) & X4 == 1) ~ 4,
                                       missing_vars == 1 & (X1 == 1 & X2 == 1 & X3 == 1 & is.na(X4)) ~ 4))

I repeat this for every combination with 2 NAs, 3 NAs, and then 4 NAs, and then add up "zero_na", "one_na", and so on to get the final count in X11.

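However, I currently have about 300,000 observations and need to do this for 7 different variables with varying numbers of NAs, 1s, and 2s. The number of combinations I would have to write out is ridiculous, so I am wondering whether there is a more efficient way to write this code.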

Thanks so much in advance!

dplyr case rstudio combinations mutate
1 Answer

Try this:

df["X11"] <- apply(df[, 1:4], 1, \(s) sum(s == 1, na.rm = TRUE))

Output:

   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1   1  1 NA  1 NA NA NA  1  1   2   3
2  NA  1  1 NA  2 NA  2  2 NA   1   2
3   1 NA  1  1 NA NA  1  2 NA   1   3
4   2  2  2  1  1  2  1 NA  2   2   1
5  NA  2 NA  2 NA  2  1 NA  1   1   0
6   2  2  1  1  2 NA  1  2  1   1   2
7   1  2 NA NA  2  1  1 NA NA   1   1
8   2  2 NA NA  1 NA NA  2 NA   1   0
9   1 NA  1  2  2  1  2 NA NA   1   2
10 NA  2  1 NA NA NA NA  2  2  NA   1
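
With around 300,000 rows, a vectorized variant may be worth trying, since `apply()` calls the function once per row. The following is a minimal sketch rather than a benchmarked recommendation: it rebuilds only columns X1 to X4 from the example table above and swaps `apply()` for base R's `rowSums()`.

```r
# Rebuild the first four columns of the example data shown in the question.
df <- data.frame(
  X1 = c(1, NA, 1, 2, NA, 2, 1, 2, 1, NA),
  X2 = c(1, 1, NA, 2, 2, 2, 2, 2, NA, 2),
  X3 = c(NA, 1, 1, 2, NA, 1, NA, NA, 1, 1),
  X4 = c(1, NA, 1, 1, 2, 1, NA, NA, 2, NA)
)

# Columns to inspect; any number of column names can be listed here.
vars <- c("X1", "X2", "X3", "X4")

# df[vars] == 1 is a logical matrix (NA where the value is missing);
# rowSums() with na.rm = TRUE counts the TRUEs per row and skips the NAs.
df$X11 <- rowSums(df[vars] == 1, na.rm = TRUE)

df$X11
# 3 2 3 1 0 2 1 0 2 1
```

Extending this to 7 variables should only require adding their names to `vars`. Inside a dplyr pipeline, the same idea could presumably be written as `mutate(X11 = rowSums(across(all_of(vars), ~ .x == 1), na.rm = TRUE))`, mirroring the `rowSums(across(...))` pattern already used in the question.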