How to match multiple comma-separated strings and return another column


My input1 contains two different name categories and their corresponding name variants, separated by semicolons (;). NAME2 is a subcategory of NAME1.

input1 <- structure(list(NAME1 = c("A", "A", "A", "A", "J", "J", "M", "N"
), NAME1_VAR = c("AA; aa", "AA; aa", "AA; aa", "AA; aa", "JJ; jj", 
"JJ; jj", "MM; mmm; mm", ""), NAME2 = c("B", "D", "E", "I", "KL", 
"L", "", ""), NAME2_VAR = c("CB; CCB", "ED; EED", "", "ICH", 
"LK", "", "", "")), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"))

head(input1)
    NAME1 NAME1_VAR  NAME2 NAME2_VAR
   <char>    <char> <char>    <char>
1:      A    AA; aa      B   CB; CCB
2:      A    AA; aa      D   ED; EED
3:      A    AA; aa      E          
4:      A    AA; aa      I       ICH
5:      J    JJ; jj     KL        LK
6:      J    JJ; jj      L  
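Each NAME1_VAR / NAME2_VAR cell packs several variants into one string, so a natural first step is to split it into one row per variant. Below is a minimal base-R sketch on a few rows copied from input1. The names nm1, long1 and VARIANT are hypothetical and chosen only for illustration.

```r
# A few rows copied from input1, as a plain data.frame
nm1 <- data.frame(
  NAME1     = c("A", "J", "M", "N"),
  NAME1_VAR = c("AA; aa", "JJ; jj", "MM; mmm; mm", ""),
  stringsAsFactors = FALSE
)

# Split every cell on ";" together with any surrounding whitespace
pieces <- strsplit(nm1$NAME1_VAR, "\\s*;\\s*")

# Repeat each category once per variant, then stack the variants
long1 <- data.frame(
  NAME1   = rep(nm1$NAME1, lengths(pieces)),
  VARIANT = unlist(pieces),
  stringsAsFactors = FALSE
)

# Blank cells carry no variants, so drop any empty strings
long1 <- long1[long1$VARIANT != "", ]
print(long1)
```

The same reshaping applies to NAME2_VAR. Once both columns are in this long form, a plain lookup can replace the nested strings.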
        

My input2 contains an ID and various OLD_NAME values.

input2 <- structure(list(ID = 1:7, OLD_NAME = c("mmm", "M", "ED", "B", 
"A", "N", "LK")), row.names = c(NA, -7L), class = c("data.table", 
"data.frame"))

head(input2)
      ID OLD_NAME
   <int>   <char>
1:     1      mmm
2:     2        M
3:     3       ED
4:     4        B
5:     5        A
6:     6        N

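These old names need to be matched against any entry in NAME2_VAR and NAME2, returning NAME2. The same applies to NAME1: match against NAME1_VAR and NAME1, returning NAME1. Keep in mind that the names in NAME2 are subcategories of the names in NAME1, so some names have no NAME2. The output should look like this: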

 head(output)
      ID OLD_NAME  NAME1  NAME2
   <int>   <char> <char> <char>
1:     1      mmm      M       
2:     2        M      M       
3:     3       ED      A      D
4:     4        B      A      B
5:     5        A      A       
6:     6        N      N       
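The matching rule above can be sketched in base R. One lookup maps every NAME2 and each of its variants to that row's (NAME1, NAME2). A second lookup maps every NAME1 and each of its variants to NAME1. The NAME2 lookup is tried first. This is a sketch under stated assumptions, not the answerer's method: the helper explode and the objects lk1, lk2, i1 and i2 are hypothetical, and a blank string (rather than NA) marks a missing NAME2, mirroring the desired output.

```r
# input1 and input2 re-created as plain data.frames so the sketch runs on its own
input1 <- data.frame(
  NAME1     = c("A", "A", "A", "A", "J", "J", "M", "N"),
  NAME1_VAR = c("AA; aa", "AA; aa", "AA; aa", "AA; aa", "JJ; jj", "JJ; jj",
                "MM; mmm; mm", ""),
  NAME2     = c("B", "D", "E", "I", "KL", "L", "", ""),
  NAME2_VAR = c("CB; CCB", "ED; EED", "", "ICH", "LK", "", "", ""),
  stringsAsFactors = FALSE
)
input2 <- data.frame(ID = 1:7,
                     OLD_NAME = c("mmm", "M", "ED", "B", "A", "N", "LK"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: every row of df contributes the name itself plus each
# semicolon-separated variant as a lookup KEY pointing at that row's NAME1/NAME2
explode <- function(df, name_col, var_col) {
  keys <- lapply(seq_len(nrow(df)), function(i)
    c(df[[name_col]][i], strsplit(df[[var_col]][i], "\\s*;\\s*")[[1]]))
  out <- df[rep(seq_len(nrow(df)), lengths(keys)), c("NAME1", "NAME2")]
  out$KEY <- unlist(keys)
  out <- out[out$KEY != "", ]   # blank names/variants are not keys
  out[!duplicated(out$KEY), ]   # keep the first row for each key
}

lk2 <- explode(input1, "NAME2", "NAME2_VAR")  # subcategory level
lk1 <- explode(input1, "NAME1", "NAME1_VAR")  # category level

i2 <- match(input2$OLD_NAME, lk2$KEY)
i1 <- match(input2$OLD_NAME, lk1$KEY)

# Prefer a NAME2-level hit (it also fixes NAME1); otherwise fall back to NAME1
output <- data.frame(
  input2,
  NAME1 = ifelse(is.na(i2), lk1$NAME1[i1], lk2$NAME1[i2]),
  NAME2 = ifelse(is.na(i2), "", lk2$NAME2[i2]),
  stringsAsFactors = FALSE
)
print(output)
```

Trying the NAME2 lookup first matters. A subcategory hit determines both columns, whereas a category hit can only fill NAME1.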

I've tried unnesting the columns and grouping by NAME1, but I haven't managed to get it working. Any help would be greatly appreciated!

r tidyverse
1 Answer
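library(tidyverse)  # dplyr, tidyr (>= 1.3.0 for separate_longer_delim), stringr

input2 %>%
  # one row per (NAME1, NAME2, variant) taken from both *_VAR columns
  left_join(input1 %>%
              pivot_longer(ends_with('VAR'), values_to = 'OLD_NAME', names_to = NULL) %>%
              separate_longer_delim(OLD_NAME, delim = regex('\\W+')),
            by = 'OLD_NAME') %>%
  # no variant matched: OLD_NAME may itself be a NAME2 value
  mutate(NAME2 = ifelse(is.na(NAME2),
                        input1$NAME2[match(OLD_NAME, input1$NAME2)], NAME2),
         # look up the parent NAME1 of the matched NAME2
         NAME1 = ifelse(is.na(NAME1),
                        input1$NAME1[match(NAME2, input1$NAME2)], NAME1),
         # still unmatched: fall back to OLD_NAME itself as NAME1
         NAME1 = ifelse(is.na(NAME1), OLD_NAME, NAME1))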

      ID OLD_NAME  NAME1  NAME2
   <int>   <char> <char> <char>
1:     1      mmm      M       
2:     2        M      M   <NA>
3:     3       ED      A      D
4:     4        B      A      B
5:     5        A      A   <NA>
6:     6        N      N   <NA>
7:     7       LK      J     KL
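One design note on the answer: its last step copies OLD_NAME into NAME1 whenever nothing else matched. That is correct for this data, because M, A and N are all NAME1 values. However, it would also label an unknown name as its own category. A hedged base-R sketch of a stricter final step only accepts OLD_NAME when it actually occurs in input1$NAME1. The vectors below are hypothetical stand-ins, and "Z" is an invented unmatched name.

```r
known_name1 <- c("A", "A", "J", "M", "N")  # stand-in for input1$NAME1
old_name    <- c("M", "A", "Z")            # "Z" is a made-up unmatched name
name1       <- rep(NA_character_, 3)       # NAME1 still missing after the join

# Answer's fallback: always copy OLD_NAME
loose  <- ifelse(is.na(name1), old_name, name1)

# Stricter fallback: copy OLD_NAME only if it is a known NAME1 value
strict <- ifelse(is.na(name1), known_name1[match(old_name, known_name1)], name1)

print(loose)
print(strict)
```

With the stricter version, an unknown name stays NA instead of silently becoming a new category.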