How can I transform and pivot a rather messy table in R using the tidyverse?


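I have a table with about 2000 entries containing the names, positions, fields of expertise, and addresses of professors. The table is very messy, and I am struggling to find a programmatic way to transform it into a tidy format.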

My goal is to create a tidy table with at least the following columns: names, positions, field of expertise, email address.


Here is a sample of the data:

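data                                                   field
Person 1, MD                                           A
Associate Professor of A                               A
Program Associate, A                                   A
Senior Medical Director, FGP Operations                A
UMMG Ambulatory Surgery                                A
Residency Program                                      A
Core Educational Lead                                  A
[email protected]                                      A
NA                                                     A
Person 2, MD                                           B
Person 2                                               B
Clinical Assistant Professor of B                      B
Medical Student Clerkship and Core Educational Lead    B
[email protected]                                      B
NA                                                     B
labcoat                                                B
Person 3, MD                                           B
Clinical Assistant Professor of B                      B
[email protected]                                      B
NA                                                     B
labcoat                                                B
Person 4, MD                                           B
Professor of B                                         B
Professor of B                                         B
[email protected]                                      B
NA                                                     B
Person 5, MD                                           C
Person 5                                               C
Professor of C                                         C
Professor of C                                         C
Associate Chair, Quality                               C
Department of Urology and Service Chief                C
[email protected]                                      C
132-547-1321                                           C
NA                                                     C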

Here is the tibble code (to reproduce):

df <- tibble::tribble(
                                                  ~data, ~field,
                                         "Person 1, MD",    "A",
                             "Associate Professor of A",    "A",
                                 "Program Associate, A",    "A",
              "Senior Medical Director, FGP Operations",    "A",
                              "UMMG Ambulatory Surgery",    "A",
                                    "Residency Program",    "A",
                                "Core Educational Lead",    "A",
                                     "[email protected]",    "A",
                                                     NA,    "A",
                                         "Person 2, MD",    "B",
                                             "Person 2",    "B",
                    "Clinical Assistant Professor of B",    "B",
  "Medical Student Clerkship and Core Educational Lead",    "B",
                                     "[email protected]",    "B",
                                                     NA,    "B",
                                              "labcoat",    "B",
                                         "Person 3, MD",    "B",
                    "Clinical Assistant Professor of B",    "B",
                                     "[email protected]",    "B",
                                                     NA,    "B",
                                              "labcoat",    "B",
                                         "Person 4, MD",    "B",
                                       "Professor of B",    "B",
                                       "Professor of B",    "B",
                                     "[email protected]",    "B",
                                                     NA,    "B",
                                         "Person 5, MD",    "C",
                                             "Person 5",    "C",
                                       "Professor of C",    "C",
                                       "Professor of C",    "C",
                             "Associate Chair, Quality",    "C",
              "Department of Urology and Service Chief",    "C",
                                     "[email protected]",    "C",
                                         "132-547-1321",    "C",
                                                     NA,    "C"
  )
r tidyverse pivot-table tidyr tabular
1 Answer

A step-by-step approach:

library(dplyr)
library(tidyr)
library(stringr)

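# Assumes the tibble from the question is stored in `df`.
df_wide <- df |>
  # Label the rows that open (name) and close (email) each person's record
  mutate(names = case_when(str_detect(data, ", MD$") ~ "name",
                           str_detect(data, "@") ~ "email",
                           .default = NA_character_)) |>
  mutate(data = str_remove(data, ", MD$")) |>
  # id = running count of name rows; NA for rows after a record's email row
  mutate(tmp_start = cumsum(!is.na(names) & names == "name"),
         tmp_end = lag(cumsum(!is.na(names) & names == "email"), default = 0),
         id = if_else(tmp_start == tmp_end, NA, tmp_start)) |>
  group_by(id) |>
  fill(id, .direction = "downup") |>
  select(-starts_with("tmp")) |>
  # Drop the rows that fall outside any record (NA, phone number, "labcoat")
  filter(!is.na(id)) |>
  # Drop values repeated within a person (the data is grouped by id here)
  filter(!duplicated(data)) |>
  # Number the remaining unlabelled rows position1, position2, ...
  mutate(names = if_else(is.na(names), paste0("position", row_number() - 1), names)) |>
  pivot_wider(names_from = names,
              values_from = data) |>
  ungroup() |>
  select(-id)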

data.frame(df_wide)
# field     name                         position1                                           position2                               position3               position4         position5             position6            email
#     A Person 1          Associate Professor of A                                Program Associate, A Senior Medical Director, FGP Operations UMMG Ambulatory Surgery Residency Program Core Educational Lead [email protected]
#     B Person 2 Clinical Assistant Professor of B Medical Student Clerkship and Core Educational Lead                                    <NA>                    <NA>              <NA>                  <NA> [email protected]
#     B Person 3 Clinical Assistant Professor of B                                                <NA>                                    <NA>                    <NA>              <NA>                  <NA> [email protected]
#     B Person 4                    Professor of B                                                <NA>                                    <NA>                    <NA>              <NA>                  <NA> [email protected]
#     C Person 5                    Professor of C                            Associate Chair, Quality Department of Urology and Service Chief                    <NA>              <NA>                  <NA> [email protected]
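
The record id is the least obvious step above, so here is a minimal sketch of that idea on its own. It assumes, as the sample data suggests, that every record starts with a row ending in ", MD" and ends with a row containing "@". The toy values, the `example.org` addresses, and the column names `is_name`, `is_email`, `opened`, and `closed` are made up for illustration only.

```r
library(dplyr)

# Toy input (hypothetical values): two records, each opened by a name row
# and closed by an email row, with junk rows after each email.
toy <- tibble::tibble(
  data = c("Ann, MD", "Professor", "ann@example.org", NA,
           "Bob, MD", "bob@example.org", "labcoat")
)

toy_ids <- toy |>
  mutate(
    is_name  = !is.na(data) & grepl(", MD$", data),
    is_email = !is.na(data) & grepl("@", data, fixed = TRUE),
    # Number of records started up to and including this row
    opened   = cumsum(is_name),
    # Number of records already finished before this row
    closed   = lag(cumsum(is_email), default = 0L),
    # A row belongs to a record only while that record is still open
    id       = if_else(opened == closed, NA_integer_, opened)
  )

print(toy_ids$id)
# ids: 1 1 1 NA 2 2 NA
```

Rows whose id is NA sit between an email row and the next name row, which is why filtering on `!is.na(id)` removes the trailing NA, phone number, and "labcoat" rows.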