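I have a data frame in long format with repeated IDs. Each ID has a so-called donor number and a time point (Tijdspunt). A single ID (Deelnemernr.) can have duplicate time points, as shown below: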
Deelnemernr. donornrs_pbmc Tijdspunt
<chr> <chr> <chr>
1 449132 4491321 T1
2 449513 4495131 T1
3 449423 4494232 T2
4 449460 4494602 T2
5 449088 4490882 T2
6 449134 4491343 T3
7 449106 4491063 T3
8 449468 44946852 T5
9 449132 4491321 T1
10 449513 4495131 T1
11 449423 4494232 T2
12 449460 4494602 T2
13 449088 4490882 T2
14 449134 4491343 T3
15 449106 4491063 T3
16 449468 44946852 T5
I want to create a wide-format table from this data frame. I use the following code to do so:
pbmc_total <- pbmc_total %>% group_by(Deelnemernr.) %>% mutate(id=row_number()) %>%
pivot_longer(-c(Deelnemernr.,id)) %>%
mutate(name=paste0(name,'.',id)) %>% select(-id) %>%
pivot_wider(names_from = name,values_from=value)
This gives me the following data frame:
Deelnemernr. donornrs_pbmc.1 Tijdspunt.1 donornrs_pbmc.2 Tijdspunt.2
<chr> <chr> <chr> <chr> <chr>
1 449132 4491321 T1 4491321 T1
2 449513 4495131 T1 4495131 T1
3 449423 4494232 T2 4494232 T2
4 449460 4494602 T2 4494602 T2
5 449088 4490882 T2 4490882 T2
6 449134 4491343 T3 4491343 T3
7 449106 4491063 T3 4491063 T3
8 449468 44946852 T5 44946852 T5
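I have only shown part of the data frame; the full one has about 2300 rows. I want the wide table to follow a specific column order based on the "Tijdspunt" column, with only one time point per column. This is my desired output: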
Deelnemernr. donornrs_pbmc.1 Tijdspunt.1 donornrs_pbmc.2 Tijdspunt.2 donornrs_pbmc.3 Tijdspunt.3
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 449132 4491321 T1 4491321 T1 NA NA
2 449513 4495131 T1 4495131 T1 NA NA
3 449423 NA NA NA NA 4494232 T2
4 449460 NA NA NA NA 4494602 T2
5 449088 NA NA NA NA NA NA
6 449134 NA NA NA NA NA NA
7 449106 NA NA NA NA NA NA
8 449468 NA NA NA NA NA NA
# your example data frame
pmbc_total <- tibble::tribble(
~Deelnemernr., ~donornrs_pbmc, ~Tijdspunt,
"449132", "4491321", "T1",
"449513", "4495131", "T1",
"449423", "4494232", "T2",
"449460", "4494602", "T2",
"449088", "4490882", "T2",
"449134", "4491343", "T3",
"449106", "4491063", "T3",
"449468", "44946852", "T5",
"449132", "4491321", "T1",
"449513", "4495131", "T1",
"449423", "4494232", "T2",
"449460", "4494602", "T2",
"449088", "4490882", "T2",
"449134", "4491343", "T3",
"449106", "4491063", "T3",
"449468", "44946852", "T5"
)
The column names do not exactly match those in your desired-output example, but the values should agree:
library(dplyr); library(tidyr)
pmbc_total |>
  group_by(donornrs_pbmc, Tijdspunt) |>
  mutate(id = row_number()) |>  # number repeats within each donor/time point
  pivot_wider(id_cols = Deelnemernr., names_from = c(Tijdspunt, id),
              values_from = c(donornrs_pbmc, Tijdspunt),
              names_sort = TRUE, names_vary = "slowest")
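In case the role of names_vary = "slowest" is unclear, here is a minimal sketch on made-up toy data (the columns id, tp, donor and sample are hypothetical and not taken from your data frame). It only illustrates how the argument changes the column order when several values_from columns are given; it assumes tidyr 1.2.0 or later, where names_vary is available.

```r
library(tidyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble::tribble(
  ~id, ~tp,  ~donor, ~sample,
  "a", "T1", "a1",   "s1",
  "a", "T2", "a2",   "s2",
  "b", "T2", "b2",   "s3"
)

# Default ("fastest"): all donor_* columns first, then all sample_* columns
fastest <- pivot_wider(toy, id_cols = id, names_from = tp,
                       values_from = c(donor, sample))

# "slowest": the value columns that belong to one time point stay together
slowest <- pivot_wider(toy, id_cols = id, names_from = tp,
                       values_from = c(donor, sample),
                       names_vary = "slowest")

names(fastest)  # "id" "donor_T1" "donor_T2" "sample_T1" "sample_T2"
names(slowest)  # "id" "donor_T1" "sample_T1" "donor_T2" "sample_T2"
```

With "slowest", each time point gets its value columns side by side, which is the layout you asked for. names_sort = TRUE additionally sorts the time points instead of keeping them in order of first appearance.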