In R, how do I count the number of differing values between two strings?


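I am trying to see how many new employees a manager gained between time 1 and time 2. For each manager I have a single string that aggregates all of the employee IDs reporting to them.

My code below always reports 1 new employee, but as you can see there are 2. How can I find out how many new employees there are? The IDs are not guaranteed to stay in the same order, but they will always be separated by ", ".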

library(dplyr)
library(stringr)

#First data set
mydata_q2 <- tibble(
  leader = 1,
  reports_q2 = "2222, 3333, 4444"
) 

#Second dataset
mydata_q3 <- tibble(
  leader = 1,
  reports_q3 = "2222, 3333, 4444, 55555, 66666" 
) 

#Function to count number of new employees
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  time_1_reports <- ifelse(is.na(reports_time1), character(0), str_split(reports_time1, " ,\\s*")[[1]])
  time_2_reports <- str_split(reports_time2, " ,\\s*")[[1]]
  num_new_employees <- length(setdiff(time_1_reports, time_2_reports))
  num_new_employees
}

#Join data and count number of new staff--get wrong answer
mydata_q2 %>%
  left_join(mydata_q3) %>%
  mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))

Edit:

For this example, my desired output is new_staff_count = 2.

That is because Q3 contains 2 new employees (55555 and 66666) who were not present in Q2.

r string function count stringr
1 Answer

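The separator you pass to str_split is incorrect: the pattern " ,\\s*" expects a space before the comma, so the string is never split. Just split on ", ". Then take the difference in length between the two vectors. Note that a length difference measures the net change in headcount, so it equals the number of new employees only when nobody left between the two periods.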
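To see why the question's version returns 1, the short check below may help. It is only an illustrative sketch: the variable names (unsplit, ids, first_only) are made up for this example, and it reuses the Q2 string from the question.

```r
library(stringr)

reports_q2 <- "2222, 3333, 4444"

# Pattern from the question: it needs a space BEFORE the comma, so it never
# matches and the whole string comes back as a single element.
unsplit <- str_split(reports_q2, " ,\\s*")[[1]]
length(unsplit)   # 1

# A comma followed by optional whitespace splits as intended.
ids <- str_split(reports_q2, ",\\s*")[[1]]
length(ids)       # 3

# ifelse() returns a value with the same length as its test (here length 1),
# so it would keep only the first ID even with a correct pattern.
first_only <- ifelse(is.na(reports_q2), NA_character_, ids)
first_only        # "2222"
```

With both strings left unsplit, setdiff() compares two one-element vectors that differ, which is where the constant result of 1 comes from.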

calculate_number_new_emps <- function(reports_time1, reports_time2) {
   # A missing time-1 string means there were no reports yet
   if (is.na(reports_time1)) {
      time_1_reports <- character(0)
   } else {
      time_1_reports <- str_split(reports_time1, ", ")[[1]]
   }

   time_2_reports <- str_split(reports_time2, ", ")[[1]]
   # Net change in headcount between the two periods
   num_new_employees <- length(time_2_reports) - length(time_1_reports)
   num_new_employees
}

# Join data and count number of new staff (new_staff_count is 2 for this example)
mydata_q2 %>%
   left_join(mydata_q3, by = "leader") %>%
   mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
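If employees can also leave between periods, or if the data has more than one leader, a set difference applied row by row may be more robust than a length difference. The following is a minimal sketch under those assumptions, not the answer's method: split_ids and count_new_reports are hypothetical helper names, and the two-leader data is invented for illustration. Note the argument order: setdiff(after, before) gives IDs that are new at time 2, while the question's setdiff(time_1_reports, time_2_reports) would give IDs that disappeared.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: split one comma-separated string into trimmed IDs,
# returning an empty vector for NA or empty input.
split_ids <- function(x) {
  ids <- str_trim(str_split(x, ",")[[1]])
  ids[!is.na(ids) & ids != ""]
}

# Hypothetical helper: for each row, count IDs present at time 2 but not at time 1.
count_new_reports <- function(reports_time1, reports_time2) {
  vapply(
    seq_along(reports_time2),
    function(i) {
      before <- split_ids(reports_time1[i])
      after  <- split_ids(reports_time2[i])
      length(setdiff(after, before))
    },
    integer(1)
  )
}

# Invented example data: leader 1 loses 3333 and gains two people,
# leader 2 has no reports at time 1.
q2 <- tibble(leader = c(1, 2), reports_q2 = c("2222, 3333, 4444", NA))
q3 <- tibble(leader = c(1, 2), reports_q3 = c("4444, 2222, 55555, 66666", "7777"))

result <- q2 %>%
  left_join(q3, by = "leader") %>%
  mutate(new_staff_count = count_new_reports(reports_q2, reports_q3))

result$new_staff_count   # 2 1
```

For leader 1 a length difference would report 1 (4 minus 3), whereas the set difference reports the 2 genuinely new IDs regardless of order.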