I have a df with the following column names.
colnames(df) gives:
"SUBJID" "EoT_A" "EoT_B" "EoT_C" "EoT_D" "PR_A" "PR_B" "PR_C" "PR_D"
"PD_A" "PD_B" "PD_C" "PD_D" "CR_A" "CR_B" "CR_C" "CR_D"
I want to reorder the columns, for example:
"SUBJID"
"EoT_A" "PR_A" "PD_A" "CR_A"
"EoT_B" "PR_B" "PD_B" "CR_B"
"EoT_C" "PR_C" "PD_C" "CR_C"
"EoT_D" "PR_D" "PD_D" "CR_D"
Is there a smart way to achieve this?
You can use dplyr::ends_with, for example:
df |>
dplyr::select(SUBJID, dplyr::ends_with(LETTERS[1:4])) |>
colnames()
[1] "SUBJID" "EoT_A" "PR_A" "PD_A" "CR_A" "EoT_B" "PR_B" "PD_B"
[9] "CR_B" "EoT_C" "PR_C" "PD_C" "CR_C" "EoT_D" "PR_D" "PD_D"
[17] "CR_D"
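To make the grouping logic explicit, here is a minimal base R sketch of the same idea: for each suffix in turn, collect the names that end with it. It is an analogue using endsWith(), not what dplyr does internally. As far as I know, ends_with() ignores case by default, so the suffix "D" also matches SUBJID in the pipeline above; that is harmless there because SUBJID is already selected first. The sketch includes the underscore in each suffix to avoid that overlap.

```r
# Column names from the question
nms <- c("SUBJID",
         "EoT_A", "EoT_B", "EoT_C", "EoT_D",
         "PR_A", "PR_B", "PR_C", "PR_D",
         "PD_A", "PD_B", "PD_C", "PD_D",
         "CR_A", "CR_B", "CR_C", "CR_D")

# Suffixes to group by; the leading underscore keeps "SUBJID" from matching "D"
suffixes <- paste0("_", LETTERS[1:4])

# For each suffix in turn, collect the names that end with it
grouped <- unlist(lapply(suffixes, function(s) nms[endsWith(nms, s)]))

# ID column first, then the suffix groups
new_order <- c("SUBJID", grouped)
print(new_order)
```

With a data frame carrying these names, df[new_order] would then return the columns in that order.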
I don't know how smart it is, but you can do
df[c(1, order(sapply(strsplit(names(df), '_'), function(x) rev(x)[1])[-1]) + 1)]
For example, if your data frame looks like this:
df
#> SUBJID EoT_A EoT_B EoT_C EoT_D PR_A PR_B PR_C PR_D PD_A PD_B PD_C PD_D CR_A CR_B CR_C CR_D
#> 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
then the code puts your data in the desired order:
df[c(1, order(sapply(strsplit(names(df), '_'), function(x) rev(x)[1])[-1]) + 1)]
#> SUBJID EoT_A PR_A PD_A CR_A EoT_B PR_B PD_B CR_B EoT_C PR_C PD_C CR_C EoT_D PR_D PD_D CR_D
#> 1 1 2 6 10 14 3 7 11 15 4 8 12 16 5 9 13 17
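Since the one-liner is dense, here is the same logic unrolled into steps, as a sketch. The one-row data frame is rebuilt by hand to match the example above, and the intermediate names (parts, suffix, idx, res) are only illustrative. It relies on order() leaving ties in their original order, which is why EoT, PR, PD, CR stay in that sequence within each letter.

```r
# Rebuild the one-row example: each column holds its own position (1..17)
nms <- c("SUBJID", paste0(rep(c("EoT", "PR", "PD", "CR"), each = 4), "_", LETTERS[1:4]))
df <- setNames(as.data.frame(matrix(1:17, nrow = 1)), nms)

# Step 1: split every name on the underscore
parts <- strsplit(names(df), "_", fixed = TRUE)

# Step 2: keep the last piece of each name ("SUBJID" has a single piece)
suffix <- vapply(parts, function(p) p[length(p)], character(1))

# Step 3: order the suffixes of columns 2..17, then shift by 1 to get
# positions in the full data frame
idx <- order(suffix[-1]) + 1

# Step 4: first column, then the reordered rest
res <- df[c(1, idx)]
print(names(res))
```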
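Another option is to use sub to extract the characters after the last underscore and then order the columns alphabetically by that suffix. To keep the first column out of the ordering, drop it before calling order and add 1 to the result so the indices point at the right columns, like this: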
df[c(1, 1+order(sub('.*_', '', colnames(df[,-1]))))]
#> SUBJID EoT_A PR_A PD_A CR_A EoT_B PR_B PD_B CR_B EoT_C PR_C PD_C CR_C EoT_D
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> PR_D PD_D CR_D
#> 1 1 1 1
Created on 2023-01-22 with reprex v2.0.2
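A small sketch of what the pattern '.*_' does, in case it helps: the .* is greedy, so the match runs up to the last underscore and only the final piece is kept, while a name with no underscore comes back unchanged. The Best_Resp_* names are hypothetical, added only to show the multi-underscore case.

```r
# Question-style names plus hypothetical names containing two underscores
nms <- c("SUBJID", "EoT_A", "EoT_B", "PR_A", "PR_B", "Best_Resp_A", "Best_Resp_B")

# Greedy ".*_" removes everything through the LAST underscore
key <- sub(".*_", "", nms[-1])
print(key)

# Order the non-ID columns by suffix, then shift by 1 for the dropped ID column
new_pos <- c(1, 1 + order(key))
print(nms[new_pos])
```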
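Assuming x holds your column names, you can order them by their last character, which substring extracts at the position given by nchar.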
c(x[1], x[-1][order(substring(x[-1], nchar(x[-1])))])
# [1] "SUBJID" "EoT_A" "PR_A" "PD_A" "CR_A" "EoT_B"
# [7] "PR_B" "PD_B" "CR_B" "EoT_C" "PR_C" "PD_C"
# [13] "CR_C" "EoT_D" "PR_D" "PD_D" "CR_D"
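To show what the key looks like, here is a shortened sketch on five of the names. One caveat, as an assumption about data not in the question: this approach only looks at a single trailing character, so it would not group correctly if the suffixes were longer, such as the hypothetical "V_1" and "V_10" below.

```r
# A shortened version of the question's names
x <- c("SUBJID", "EoT_A", "EoT_B", "PR_A", "PR_B")

# substring(s, nchar(s)) starts at the final position, so it returns the last character
last_char <- substring(x[-1], nchar(x[-1]))
print(last_char)

# Keep the ID first, then the remaining names ordered by that character
reordered <- c(x[1], x[-1][order(last_char)])
print(reordered)

# Caveat with hypothetical multi-character suffixes: only the final character is seen
multi <- substring(c("V_1", "V_10"), nchar(c("V_1", "V_10")))
print(multi)
```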
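An approach using matrix. The column names are filled into a 4-row matrix by row (byrow) and read back column by column, which transposes them in groups of 4.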
df[, sapply(c(colnames(df)[1],
as.vector(matrix(colnames(df)[-1], nrow=4, byrow=TRUE))), function(x)
which(colnames(df) == x))]
SUBJID EoT_A PR_A PD_A CR_A EoT_B PR_B PD_B CR_B EoT_C PR_C PD_C CR_C EoT_D
1 1 2 6 10 14 3 7 11 15 4 8 12 16 5
2 2 3 7 11 15 4 8 12 16 5 9 13 17 6
3 3 4 8 12 16 5 9 13 17 6 10 14 18 7
PR_D PD_D CR_D
1 9 13 17
2 10 14 18
3 11 15 19
Data:
df <- structure(list(SUBJID = 1:3, EoT_A = 2:4, EoT_B = 3:5, EoT_C = 4:6,
EoT_D = 5:7, PR_A = 6:8, PR_B = 7:9, PR_C = 8:10, PR_D = 9:11,
PD_A = 10:12, PD_B = 11:13, PD_C = 12:14, PD_D = 13:15, CR_A = 14:16,
CR_B = 15:17, CR_C = 16:18, CR_D = 17:19), class = "data.frame",
row.names = c(NA, -3L))
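Here is a sketch of what the matrix step does to the names on its own. It assumes, as in the question, that the columns come in 4 equally sized blocks (EoT, PR, PD, CR), each in the same letter order; with a different layout nrow would need to change. Once the reordered names are available, the data frame can be indexed by name directly, so the sapply/which lookup is optional.

```r
# Column names laid out as in the question: 4 blocks of 4 letters each
nms <- c("SUBJID", paste0(rep(c("EoT", "PR", "PD", "CR"), each = 4), "_", LETTERS[1:4]))

# Fill row by row: one row per block (EoT, PR, PD, CR), one column per letter
m <- matrix(nms[-1], nrow = 4, byrow = TRUE)
print(m)

# as.vector() reads the matrix column by column, i.e. letter by letter
new_names <- c(nms[1], as.vector(m))
print(new_names)

# Index a data frame by name (a one-row toy frame built for illustration)
toy <- setNames(as.data.frame(matrix(1:17, nrow = 1)), nms)
res <- toy[, new_names]
```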
Similar to @Quinten's solution, but without offsetting the column indices. It uses sub inside a dplyr pipe.
df |>
dplyr::select(SUBJID, order(sub('.*_', '', names(df))))
# SUBJID EoT_A PR_A PD_A CR_A EoT_B PR_B PD_B CR_B EoT_C PR_C PD_C CR_C EoT_D PR_D PD_D CR_D
# 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
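A base R sketch of why no +1 offset is needed here. The key is computed over all names, so "SUBJID" (which has no underscore and is returned unchanged by sub) simply sorts after A to D and lands last. As far as I understand, select() keeps a column at its first mention when it is selected twice, so naming SUBJID first wins; unique() stands in for that behaviour below.

```r
# Column names from the question
nms <- c("SUBJID", paste0(rep(c("EoT", "PR", "PD", "CR"), each = 4), "_", LETTERS[1:4]))

# Key over ALL names; "SUBJID" has no underscore and stays as it is
key <- sub(".*_", "", nms)

# "SUBJID" sorts after "A".."D", so position 1 comes out last
ord <- order(key)
print(ord)

# Stand-in for select(SUBJID, <ord>): position 1 first, repeated mention dropped
final <- unique(c(1L, ord))
print(nms[final])
```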