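I've run into a problem I can't solve. I have a person who belongs to two different categories, and I need to count them as 0.5 in each category. Here is the sample data.
Note that ID 18 has a slash, indicating both departments. I need the output to look like the following.
My first thought was to split the DEPT column, but I'm not sure where to go from there, because the two rows are identical. My code is below. Any suggestions?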
library(dplyr)
library(tidyr)

Grad_Applied_Formatted1 <- Grad_AppAccMat %>%
  separate_wider_delim(DEPT, delim = "/", names = c("DEPT1", "DEPT2"),
                       too_few = "align_start", cols_remove = FALSE)
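For reference, here is a minimal sketch of what the wider split produces, using a small hypothetical stand-in for Grad_AppAccMat (the real data is not shown) with only ID and DEPT. It adds DEPT1 and DEPT2 columns, but the two identical rows for ID 18 are still there, so nothing is yet counted per department.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Grad_AppAccMat (only the relevant columns)
grad_demo <- data.frame(
  ID = c(17, 18, 18),
  DEPT = c("VZX", "EPI/ENH", "EPI/ENH")
)

# Split into columns; "VZX" has no slash, so DEPT2 is NA for that row
wide <- grad_demo %>%
  separate_wider_delim(DEPT, delim = "/", names = c("DEPT1", "DEPT2"),
                       too_few = "align_start", cols_remove = FALSE)

# Still 3 rows: ID 18 remains duplicated, with both departments side by side
print(wide)
```

Splitting wider moves the departments into columns rather than rows, which is why the answers below split longer instead.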
You can use tidyr::separate_longer_delim():
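library(dplyr)  # attaches the %>% pipe used below

df <- data.frame(
  ID = c(17, 18, 18),
  DEPT = c("VZX", "EPI/ENH", "EPI/ENH"),
  COUNT = c(1, 0.5, 0.5)
)

df %>%
  # one row per department; ID 18's two identical rows become four
  tidyr::separate_longer_delim(DEPT, delim = '/') %>%
  # drop the exact duplicates created by splitting both copies
  dplyr::distinct()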
# ID DEPT COUNT
# 1 17 VZX 1.0
# 2 18 EPI 0.5
# 3 18 ENH 0.5
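The distinct() step matters here: both copies of ID 18 split into EPI and ENH, so without it ID 18 would appear four times and contribute 2 instead of 1. A small sanity check, assuming each person should sum to exactly 1 across their departments:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  ID = c(17, 18, 18),
  DEPT = c("VZX", "EPI/ENH", "EPI/ENH"),
  COUNT = c(1, 0.5, 0.5)
)

long <- df %>%
  separate_longer_delim(DEPT, delim = "/")

# 5 rows before de-duplication: ID 17 once, ID 18 four times
n_before <- nrow(long)

long_dedup <- distinct(long)

# After de-duplication, each person's counts should add up to 1
totals <- long_dedup %>%
  summarise(total = sum(COUNT), .by = ID)
print(totals)
```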
Use strsplit together with row_number:
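library(dplyr)

df %>%
  # within each ID, split the shared string and give row k the k-th department
  # (the .by argument requires dplyr >= 1.1.0)
  mutate(DEPT = strsplit(DEPT, "/")[[1]][row_number()], .by = ID)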
ID DEPT COUNT
1 1 BIO 1.0
2 2 EDU 1.0
3 3 PHYS 1.0
4 4 MAR 1.0
5 5 SPA 1.0
6 6 FRE 1.0
7 7 KWL 1.0
8 8 QED 1.0
9 9 XYZ 1.0
10 10 UNI 1.0
11 11 RED 1.0
12 12 KJH 1.0
13 13 LMS 1.0
14 14 OPU 1.0
15 15 RTY 1.0
16 16 GHF 1.0
17 17 VZX 1.0
18 18 EPI 0.5
19 18 ENH 0.5
Data:
df <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 18L), DEPT = c("BIO",
"EDU", "PHYS", "MAR", "SPA", "FRE", "KWL", "QED", "XYZ", "UNI",
"RED", "KJH", "LMS", "OPU", "RTY", "GHF", "VZX", "EPI/ENH", "EPI/ENH"
), COUNT = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0.5, 0.5)), class = "data.frame", row.names = c(NA, -19L))
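This trick relies on an assumption worth checking: each ID must have exactly as many rows as departments in its DEPT string. If an ID had one row but two departments, row_number() would silently keep only the first; with more rows than departments, the extra rows would get NA. A minimal sketch that checks this before applying the same mutate() (the check itself is an addition, not part of the answer above):

```r
library(dplyr)

df <- data.frame(
  ID = c(17, 18, 18),
  DEPT = c("VZX", "EPI/ENH", "EPI/ENH"),
  COUNT = c(1, 0.5, 0.5)
)

# Compare the number of rows per ID with the number of departments it lists
check <- df %>%
  summarise(rows = n(),
            parts = length(strsplit(first(DEPT), "/")[[1]]),
            .by = ID)
stopifnot(all(check$rows == check$parts))

out <- df %>%
  # strsplit(...)[[1]] splits the group's first row; row_number() then
  # picks the 1st, 2nd, ... department for the 1st, 2nd, ... row
  mutate(DEPT = strsplit(DEPT, "/")[[1]][row_number()], .by = ID)
print(out)
```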
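I would first make the rows distinct, then split them on /, and finally compute each count as 1 / n, where n is the number of departments per ID: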
library(dplyr)
library(tidyr)
(sample_df <- tibble(ID = c(1:3, 3), DEPT = c(LETTERS[1:2], rep("D/E", 2L))))
# # A tibble: 4 × 2
# ID DEPT
# <dbl> <chr>
# 1 1 A
# 2 2 B
# 3 3 D/E
# 4 3 D/E
sample_df %>%
distinct() %>%
separate_rows(DEPT, sep = "/") %>%
mutate(COUNT = 1 / n(), .by = ID)
# # A tibble: 4 × 3
# ID DEPT COUNT
# <dbl> <chr> <dbl>
# 1 1 A 1
# 2 2 B 1
# 3 3 D 0.5
# 4 3 E 0.5
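If the end goal is a headcount per department (an assumption about how the counts are used downstream), the fractional counts can then be summed by DEPT, so a person split across two departments adds 0.5 to each:

```r
library(dplyr)
library(tidyr)

sample_df <- tibble(ID = c(1:3, 3), DEPT = c(LETTERS[1:2], rep("D/E", 2L)))

split_df <- sample_df %>%
  distinct() %>%
  separate_rows(DEPT, sep = "/") %>%
  mutate(COUNT = 1 / n(), .by = ID)

# Fractional headcount per department; the grand total equals the number of people
dept_totals <- split_df %>%
  summarise(COUNT = sum(COUNT), .by = DEPT)
print(dept_totals)
```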