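I am trying to find how many new employees a manager gained between time 1 and time 2. For each manager I have a string of all the employee IDs that roll up to them.
My code below always says there is 1 new employee, but as you can see there are 2. How can I get the correct number of new employees? The IDs are not guaranteed to stay in the same order, but they are always separated by commas.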
library(dplyr)
library(stringr)
# First data set
mydata_q2 <- tibble(
  leader = 1,
  reports_q2 = "2222, 3333, 4444"
)
# Second data set
mydata_q3 <- tibble(
  leader = 1,
  reports_q3 = "2222, 3333, 4444, 55555, 66666"
)
# Function to count number of new employees
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  time_1_reports <- ifelse(is.na(reports_time1), character(0), str_split(reports_time1, " ,\\s*")[[1]])
  time_2_reports <- str_split(reports_time2, " ,\\s*")[[1]]
  num_new_employees <- length(setdiff(time_1_reports, time_2_reports))
  num_new_employees
}
# Join data and count number of new staff--get wrong answer
mydata_q2 %>%
  left_join(mydata_q3) %>%
  mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
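Why the function above returns 1 can be checked directly by inspecting what `str_split()` produces for the two example strings. A minimal check, reusing only the values from the question (the variable names are illustrative): the pattern `" ,\\s*"` requires a space *before* the comma, which never occurs in the data, so nothing is split and `setdiff()` compares two different whole strings. Separately, `ifelse()` returns a result as long as its test, so with a length-1 test it keeps only the first element.

```r
library(stringr)

q2 <- "2222, 3333, 4444"
q3 <- "2222, 3333, 4444, 55555, 66666"

# The question's pattern needs a space *before* the comma, which never
# occurs here, so each string comes back as a single unsplit element.
bad_q2 <- str_split(q2, " ,\\s*")[[1]]
bad_q3 <- str_split(q3, " ,\\s*")[[1]]
length(bad_q2)                   # 1
length(setdiff(bad_q2, bad_q3))  # 1: two different whole strings

# A comma followed by optional whitespace splits as intended.
good_q2 <- str_split(q2, ",\\s*")[[1]]
length(good_q2)                  # 3

# ifelse() returns something as long as its test (length 1 here), so even
# with a correct pattern only the first ID would survive.
truncated <- ifelse(is.na(q2), character(0), good_q2)
truncated                        # "2222"
```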
Edit:
For this example, my desired output is new_staff_count = 2.
This is because there are 2 new employees in Q3 (55555 and 66666) who were not there in Q2.
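The desired count is a set difference taken in one direction: IDs present in Q3 but absent from Q2. `setdiff(x, y)` returns the elements of `x` that are not in `y`, so argument order matters; passing the Q2 reports first, as the question's function does, would count people who left instead. A small illustration with the IDs already split into vectors (the Q3 order is shuffled on purpose to show that order does not matter):

```r
# IDs already split into character vectors (values from the example)
q2_ids <- c("2222", "3333", "4444")
q3_ids <- c("4444", "2222", "55555", "3333", "66666")  # shuffled order

new_hires <- setdiff(q3_ids, q2_ids)  # in Q3 but not in Q2
new_hires                             # "55555" "66666"
length(new_hires)                     # 2

leavers <- setdiff(q2_ids, q3_ids)    # in Q2 but not in Q3
leavers                               # character(0)
```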
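Your separator in str_split is incorrect: the pattern " ,\\s*" looks for a space before the comma, which never occurs in your data, so nothing gets split. Just split on ", ". Then take the difference between the lengths of the two vectors. (This is the net change in headcount, so it equals the number of new hires only if nobody left between the two quarters.)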
# Handles one row at a time (uses if() and [[1]]); NA at time 1 means no reports
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  if (is.na(reports_time1)) {
    time_1_reports <- character(0)
  } else {
    time_1_reports <- str_split(reports_time1, ", ")[[1]]
  }
  time_2_reports <- str_split(reports_time2, ", ")[[1]]
  # Net change in headcount; equals new hires only if nobody left
  num_new_employees <- length(time_2_reports) - length(time_1_reports)
  num_new_employees
}
# Join data and count number of new staff (returns 2 for this example)
mydata_q2 %>%
  left_join(mydata_q3) %>%
  mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
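If people can also leave, or if the data has more than one leader, a set-difference version may be more robust. The sketch below is an alternative, not the answer's method: `count_new_emps` is a hypothetical helper name, the join key `by = "leader"` is assumed, and leader 2's data is made up to show a departure (8888 leaves, 9999 joins), where the length difference would report 0 but the set difference reports 1. It counts, row by row, the IDs present at time 2 but not at time 1, independent of order.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: for each row, count IDs present in reports_time2 but
# absent from reports_time1. Order within the strings does not matter, and
# NA is treated as "no reports".
count_new_emps <- function(reports_time1, reports_time2) {
  ids_1 <- str_split(coalesce(reports_time1, ""), "\\s*,\\s*")
  ids_2 <- str_split(coalesce(reports_time2, ""), "\\s*,\\s*")
  vapply(seq_along(ids_2), function(i) {
    new_ids <- setdiff(ids_2[[i]], ids_1[[i]])
    length(new_ids[new_ids != ""])  # drop the empty string produced by NA
  }, integer(1))
}

# Leader 1 is the question's example; leader 2 is made-up illustration data.
mydata_q2 <- tibble(
  leader = c(1, 2),
  reports_q2 = c("2222, 3333, 4444", "7777, 8888")
)
mydata_q3 <- tibble(
  leader = c(1, 2),
  reports_q3 = c("2222, 3333, 4444, 55555, 66666", "9999, 7777")
)

result <- mydata_q2 %>%
  left_join(mydata_q3, by = "leader") %>%
  mutate(new_staff_count = count_new_emps(reports_q2, reports_q3))

result$new_staff_count  # 2 1
```

Using `vapply()` with `integer(1)` keeps the result an integer vector with one entry per row, which is what `mutate()` expects.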