Counting start and end status with a pivot table in R

Question

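I have a data table with student ID, TestingWindow, and BenchmarkCategories. TestingWindow has the value "Start of Year" or "End of Year". The benchmark categories are Urgent Intervention, Intervention, On Watch, At Benchmark, and Above Benchmark.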

Here is some sample data. Note that student 123 has no end-of-year result.

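Student ID	ScreeningPeriodWindowName	DistrictBenchmarkCategoryName
123	Start of Year	Urgent Intervention
456	Start of Year	Intervention
456	End of Year	At Benchmark
789	Start of Year	On Watch
789	End of Year	Above Benchmark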

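I want to show how students moved between benchmark categories from the start of the year to the end of the year, in a table. The key is to show the movement of individual students, not totals: "How many students went from Urgent Intervention to On Watch?"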

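The table should look like this, where column A holds the start-of-year categories and row 1 holds the end-of-year categories. (I left the middle cells blank, but I think you see the pattern.)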

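v start end >	Urgent Intervention	Intervention	On Watch	At Benchmark	Above Benchmark
Urgent Intervention	count Urgent to Urgent	count Urgent to Intervention	count Urgent to On Watch	count Urgent to At Benchmark	count Urgent to Above Benchmark
Intervention
On Watch
At Benchmark
Above Benchmark	count Above to Urgent	count Above to Intervention	count Above to On Watch	count Above to At Benchmark	count Above to Above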

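In an Excel pivot table this is trivial (define the rows and columns, count IDs), but I'm struggling to wrap my head around it in R in terms of group_by and summarize (or pivoting). Thanks.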

I tried pivot_wider and pivot_longer, but each time I only get either the row or the column headers, plus a single set of counts.

progress <- reading_data %>%
  select(studentID,TestingWindow,BenchmarkCategories) %>%
  group_by(TestingWindow,BenchmarkCategories) %>%
  summarize(count=n_distinct(studentID)) %>%
  pivot_wider(names_from=TestingWindow,values_from=count)

progress

Answer 1

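You can do this with two `pivot_wider()` calls:

1. A first `pivot_wider()` to create a separate column for each value in TestingWindow, i.e. StartofYear and EndofYear
2. Count each StartofYear/EndofYear pairing, then
3. A second `pivot_wider()` so that the EndofYear values pivot to columns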
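The first two steps can be inspected on their own with the sample data. The sketch below stops before the second `pivot_wider()`, and uses `group_by()` plus `summarise(n_distinct(studentID))` in place of `count()`. That variant is an alternative shown here, not the answer's exact pipeline; it mirrors the "count IDs" idea from Excel and the `n_distinct()` call in the question. The intermediate names `wide` and `pairs` are introduced only for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data with the column names used in the answer
reading_data <- data.frame(
  studentID = c(123L, 456L, 456L, 789L, 789L),
  TestingWindow = c("StartofYear", "StartofYear", "EndofYear",
                    "StartofYear", "EndofYear"),
  BenchmarkCategories = c("Urgent Intervention", "Intervention",
                          "At Benchmark", "On Watch", "Above Benchmark")
)

# Step 1: one row per student, one column per testing window.
# Student 123 gets NA in EndofYear because that row does not exist.
wide <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories)
wide

# Step 2: number of distinct students for each start/end pairing
pairs <- wide |>
  group_by(StartofYear, EndofYear) |>
  summarise(n = n_distinct(studentID), .groups = "drop")
pairs
```

Step 3 is then the same `pivot_wider(names_from = EndofYear, values_from = n, values_fill = 0)` call used in the answer.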

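Note that if you want every BenchmarkCategories value included in the result, an extra step is needed, described below. I have also included a larger sample dataset to illustrate the result.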

First, using your sample data (with the column names modified to match those in your code pipeline):

library(dplyr)
library(tidyr)

reading_data <- structure(list(studentID = c(123L, 456L, 456L, 789L, 789L), TestingWindow = c("StartofYear", 
"StartofYear", "EndofYear", "StartofYear", "EndofYear"), BenchmarkCategories = c("Urgent Intervention", 
"Intervention", "At Benchmark", "On Watch", "Above Benchmark"
)), class = "data.frame", row.names = c(NA, -5L))

progress <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  count(StartofYear, EndofYear, .drop = FALSE) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)

progress
# # A tibble: 3 × 4
#   `v start end >`     `At Benchmark` `Above Benchmark`  `NA`
#   <chr>                        <int>             <int> <int>
# 1 Intervention                     1                 0     0
# 2 On Watch                         0                 1     0
# 3 Urgent Intervention              0                 0     1

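Pairs that don't occur in the data are dropped, so not every benchmark category appears. Also, the NA column counts incomplete records, e.g. studentID == 123.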
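If students without both results should not be counted at all, one option (a sketch, not part of the original answer) is to drop them with `tidyr::drop_na()` right after the first pivot, so the `NA` column never appears. `progress_complete` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

reading_data <- data.frame(
  studentID = c(123L, 456L, 456L, 789L, 789L),
  TestingWindow = c("StartofYear", "StartofYear", "EndofYear",
                    "StartofYear", "EndofYear"),
  BenchmarkCategories = c("Urgent Intervention", "Intervention",
                          "At Benchmark", "On Watch", "Above Benchmark")
)

progress_complete <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  # keep only students who have both a start and an end category
  drop_na(StartofYear, EndofYear) |>
  count(StartofYear, EndofYear) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)

progress_complete
```

This removes student 123 from the table entirely, so the counts cover only students with a recorded transition.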

If you want every benchmark category in the result:

# Create vector of all unique BenchmarkCategories
bmc <- unique(reading_data$BenchmarkCategories)

progress <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  count(StartofYear, EndofYear, .drop = FALSE) |>
  complete(StartofYear = bmc,
           EndofYear = bmc,
           fill = list(n = 0)) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)
  

progress
# # A tibble: 5 × 7
#   `v start end >`     `Above Benchmark` `At Benchmark` Intervention `On Watch` `Urgent Intervention`  `NA`
#   <chr>                           <int>          <int>        <int>      <int>                 <int> <int>
# 1 Above Benchmark                     0              0            0          0                     0     0
# 2 At Benchmark                        0              0            0          0                     0     0
# 3 Intervention                        0              1            0          0                     0     0
# 4 On Watch                            1              0            0          0                     0     0
# 5 Urgent Intervention                 0              0            0          0                     0     1
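A variation on the `complete()` step, sketched under the assumption that you want the categories in the order of the question's layout rather than alphabetically: convert both columns to factors with an explicit level order (`lvls` is a name introduced here) and let `count(.drop = FALSE)` generate the empty combinations, since `.drop` only has an effect on factor columns. Incomplete students are removed first so that `NA` stays out of the factor grouping.

```r
library(dplyr)
library(tidyr)

reading_data <- data.frame(
  studentID = c(123L, 456L, 456L, 789L, 789L),
  TestingWindow = c("StartofYear", "StartofYear", "EndofYear",
                    "StartofYear", "EndofYear"),
  BenchmarkCategories = c("Urgent Intervention", "Intervention",
                          "At Benchmark", "On Watch", "Above Benchmark")
)

# Category order taken from the question's desired table
lvls <- c("Urgent Intervention", "Intervention", "On Watch",
          "At Benchmark", "Above Benchmark")

progress_ordered <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  drop_na(StartofYear, EndofYear) |>
  mutate(StartofYear = factor(StartofYear, levels = lvls),
         EndofYear = factor(EndofYear, levels = lvls)) |>
  # with factors, .drop = FALSE keeps level combinations with zero students
  count(StartofYear, EndofYear, .drop = FALSE) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)

progress_ordered
```

Because `count()` returns the groups in factor-level order, both the rows and the pivoted columns come out in the `lvls` order.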

An example using a larger sample dataset:

set.seed(42)
reading_data <- data.frame(
  studentID = c(123, rep(456:789, each = 2)),
  TestingWindow = c("StartofYear", rep(c("StartofYear", "EndofYear"), 334)),
  BenchmarkCategories = sample(c("Urgent Intervention","Intervention",
                                 "At Benchmark", "On Watch", "Above Benchmark"),
                               669, replace = TRUE))

progress <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  count(StartofYear, EndofYear, .drop = FALSE) |>
  # complete(StartofYear = bmc,
  #          EndofYear = bmc,
  #          fill = list(n = 0)) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)

progress
# # A tibble: 5 × 7
#   `v start end >`     `Above Benchmark` `At Benchmark` Intervention `On Watch` `Urgent Intervention`  `NA`
#   <chr>                           <int>          <int>        <int>      <int>                 <int> <int>
# 1 Above Benchmark                    12              8           13         13                     8     0
# 2 At Benchmark                       14              9           11         12                    15     0
# 3 Intervention                       18             17           18         15                    20     0
# 4 On Watch                           14             10           14          7                    15     0
# 5 Urgent Intervention                11             21           14         17                     8     1
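For comparison, base R's `table()` builds the same start-by-end cross-tabulation without a second pivot, much like an Excel pivot table that counts IDs. This is an alternative technique, not the answer's method. `useNA = "ifany"` keeps the incomplete student visible as an `NA` column, and `addmargins()` adds row and column totals. The sketch reuses the small sample data; `wide` and `transitions` are illustrative names.

```r
library(tidyr)

reading_data <- data.frame(
  studentID = c(123L, 456L, 456L, 789L, 789L),
  TestingWindow = c("StartofYear", "StartofYear", "EndofYear",
                    "StartofYear", "EndofYear"),
  BenchmarkCategories = c("Urgent Intervention", "Intervention",
                          "At Benchmark", "On Watch", "Above Benchmark")
)

# One row per student, one column per testing window
wide <- pivot_wider(reading_data,
                    names_from = TestingWindow,
                    values_from = BenchmarkCategories)

# Rows = start-of-year category, columns = end-of-year category
transitions <- table(Start = wide$StartofYear,
                     End = wide$EndofYear,
                     useNA = "ifany")
transitions

# Append a "Sum" row and column, like grand totals in Excel
addmargins(transitions)
```

The result is a `table` object rather than a tibble; `as.data.frame.matrix(transitions)` converts it to a plain data frame if needed.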
