How to get the test with the highest score when the enrollment date and test date are the same


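I have a dataset with which I am trying to test the effectiveness of a pre-enrollment course. I have the enrollment date, test date, subject, and result. Students are divided into:

  1. Group 1 - tests taken within 30 days before enrollment
  2. Group 2 - tests taken within 30 days after enrollment
  3. Group 3 - tests taken from 45 days before to 35 days after enrollment.

Each registration ID should belong to exactly one group, with Group 1 as the first priority, Group 2 as the second, and Group 3 as the last. A student may have several tests on the same day, in which case we should keep the one with the highest test score for that registration ID. If a student has no test in any of these windows (that is, nothing from 45 days before to 35 days after enrollment), the record should be written as not classified.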

Here is the data:

data <- data.frame(
  student_id = c("a53e83bzz", "a53e83bzz", "a53e83bzz", "2034cccc", "2034cccc", "2034cccc", "2034cccc", "202353bbbb", "202353bbbb", "1980polkfbb", "1980polkfbb"),
  registration_id = c("a53-ffe9", "a53-ffe9", "a53-ffe9", "203-ffde", "203-ffde", "203-ffde", "203-ffde", "202-ffcc", "202-ffcc", "198-ffb", "198-ffb"),
  subject = c("maths", "maths", "maths", "maths", "maths", "maths", "maths", "english", "english", "english", "english"),
  enrollment_date = as.Date(c("2021-02-28", "2021-02-28", "2021-02-28", "2019-03-25", "2019-03-25", "2019-03-25", "2019-03-25", "2021-05-22", "2021-05-22", "2019-07-04", "2019-07-04"), format="%Y-%m-%d"),
  test_score_category = c(0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 2),
  test_date = as.Date(c("2021-02-27", "2021-02-27", "2022-07-08", "2019-02-18", "2019-03-11", "2020-04-07", "2020-04-07", "2021-06-17", "2021-06-07", "2019-03-14", "2019-03-28"), format="%Y-%m-%d"),
  difference = c(-1, -1, 495, -35, -14, 379, 379, 26, 16, -112, -98)
)
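The `difference` column appears to be the number of days from `enrollment_date` to `test_date`, negative when the test comes before enrollment. A minimal sketch under that assumption, using two date pairs taken from the data above (`check` is a hypothetical name used only for illustration):

```r
# Assumption: difference = test_date - enrollment_date, in whole days.
check <- data.frame(
  enrollment_date = as.Date(c("2021-02-28", "2019-03-25")),
  test_date       = as.Date(c("2021-02-27", "2020-04-07"))
)

# Subtracting two Date vectors gives a difftime in days;
# as.numeric() turns it into a plain number.
check$difference <- as.numeric(check$test_date - check$enrollment_date)
check$difference
```

If the column is derived this way, it can be recomputed from the two date columns rather than typed in by hand.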

This is what I have tried in R, but it does not give me exactly the result I want.

result <- data %>%
group_by(student_id, registration_id) %>%
arrange(student_id, group) %>%  # Prioritize by group (1 > 2 > 3)
slice_max(order_by = test_score_category) %>%  
ungroup()

Below is the result I expect:

df <- data.frame(
  student_id = c("a53e83bzz", "2034cccc", "202353bbbb", "1980polkfbb"),
  registration_id = c("a53-ffe9", "203-ffde", "202-ffcc", "198-ffb"),
  subject = c("maths", "maths", "english", "english"),
  enrollment_date = as.Date(c("2021-02-28", "2019-03-25", "2021-05-22", "2019-07-04"), format="%Y-%m-%d"),
  test_score_category = c(1, 0, 0, NA),  # Use NA for not_classified
  test_date = as.Date(c("2021-02-27", "2019-03-11", "2021-06-07", NA), format="%Y-%m-%d"),  # Use NA for not_classified
  difference = c(-1, -14, 16, NA)  # Use NA for not_classified
  )
r dplyr lubridate datediff
1 Answer

Basic requirement

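For the basic task of assigning groups conditionally on the day `difference` while keeping a single group assignment per `student_id`, the following should work: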

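library(dplyr)

data |>
  mutate(
    # case_when() uses the first matching condition, so group 3 only
    # catches rows that groups 1 and 2 did not.
    group = case_when(
      -30 <= difference & difference <= 0  ~ "1",
      0    < difference & difference <= 30 ~ "2",
      -45 <= difference & difference <= 35 ~ "3",
      .default = NA
    ) |>
      factor(levels = c("1", "2", "3"))
  ) |>
  arrange(student_id, group) |>
  # Keep the highest-priority row for each student.
  slice_head(n = 1, by = student_id)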

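The trick here is to rely on a factor, which sorts by the order of its levels, so that `arrange()` puts the rows in the required priority order before one row is kept per `student_id`.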
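The factor trick matters most when the group labels do not already sort in priority order. A small sketch with made-up data: the labels "before", "after", and "wide" and the `toy` data frame are hypothetical, chosen because their alphabetical order differs from the intended priority.

```r
library(dplyr)

# Hypothetical labels: alphabetical order ("after" < "before" < "wide")
# would not match the intended priority, so the factor levels carry it.
toy <- data.frame(
  student_id = c("s1", "s1", "s1", "s1"),
  window     = c(NA, "wide", "after", "before")
)

ranked <- toy |>
  mutate(window = factor(window, levels = c("before", "after", "wide"))) |>
  arrange(student_id, window)

# Rows follow the level order; arrange() places NA last.
as.character(ranked$window)
```

Because `arrange()` sorts missing values to the end, unclassified rows are only picked when a student has no classified row at all.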

Exact match

To fully reproduce the `df` you shared:

data |>
  mutate(
    group = case_when(
      -30 <= difference & difference <= 0  ~ "1",
      0    < difference & difference <= 30 ~ "2",
      -45 <= difference & difference <= 35 ~ "3",
      .default = NA
    ) |> factor(levels = c("1", "2", "3"))
  ) |>
  # Sort by group priority, then by difference, then by highest score,
  # so same-day ties are resolved in favour of the better result.
  arrange(desc(student_id), group, difference, desc(test_score_category)) |>
  slice_head(n = 1, by = student_id) |>
  # Blank out the test columns for students with no classified test.
  mutate(
    test_score_category = if_else(is.na(group), NA, test_score_category),
    test_date = if_else(is.na(group), NA, test_date),
    difference = if_else(is.na(group), NA, difference)
  ) |>
  select(-group)
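To see how the sort keys resolve a same-day tie, here is a minimal sketch on the rows of one student from the question. The `group` column is filled in by hand for brevity, `one_student` and `best` are hypothetical names, and `slice_head(by = ...)` assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Three rows for one student, copied from the question's data. `group` is
# set by hand: both -1 rows fall in group 1, and 495 is unclassified.
one_student <- data.frame(
  student_id          = "a53e83bzz",
  test_score_category = c(0, 1, 0),
  difference          = c(-1, -1, 495),
  group               = factor(c("1", "1", NA), levels = c("1", "2", "3"))
)

best <- one_student |>
  # Same group and same difference, so the descending score decides.
  arrange(group, difference, desc(test_score_category)) |>
  slice_head(n = 1, by = student_id)

best$test_score_category
```

The two tests taken one day before enrollment tie on `group` and `difference`, so the row with the higher `test_score_category` ends up first and is the one kept.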