How can I mutate many columns in dplyr without repeating mutate() many times?

Question

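I'm writing a very non-DRY dplyr chain in R. I need to call `dplyr::mutate()` and `dplyr::percent_rank()` on a lot of columns in my data frame, and it would help a great deal not to need one line of code per call. The data frame columns I need percentiles for follow this pattern: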

regions <- c("atr2", "sht2", "mid2", "lng2", "all2", "sht3", "lng3", "all3")
suffixes <- c("Made", "Att", "AttFreq", "Pct")
for(i in regions) {
  for(j in suffixes) {
    print(paste0(i, j))
  }
}

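In the example above, I need `8 * 4 == 32` different percentile columns. All 32 source columns (`atr2Made`, `atr2Att`, and so on) are already in my data frame. To compute the percentiles, I have been doing the following: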

pctile.lineup.data <- pctile.lineup.data %>%
    dplyr::group_by(season) %>%
    # dplyr::group_by(season, homeConfId) %>%
    dplyr::mutate(atr2MadeRankNcaa = round(100 * dplyr::percent_rank(atr2Made))) %>%
    dplyr::mutate(atrAttRankNcaa = round(100 * dplyr::percent_rank(atr2Att))) %>%
    dplyr::mutate(atr2AttFreqRankNcaa = round(100 * dplyr::percent_rank(atr2AttFreq))) %>%
    dplyr::mutate(atr2PctRankNcaa = round(100 * dplyr::percent_rank(atr2Pct))) %>%
    dplyr::mutate(sht2MadeRankNcaa = round(100 * dplyr::percent_rank(sht2Made))) %>%
    dplyr::mutate(shtAttRankNcaa = round(100 * dplyr::percent_rank(sht2Att))) %>%
    dplyr::mutate(sht2AttFreqRankNcaa = round(100 * dplyr::percent_rank(sht2AttFreq))) %>%
    dplyr::mutate(sht2PctRankNcaa = round(100 * dplyr::percent_rank(sht2Pct))) %>%
    dplyr::mutate(mid2MadeRankNcaa = round(100 * dplyr::percent_rank(mid2Made))) %>%
    dplyr::mutate(midAttRankNcaa = round(100 * dplyr::percent_rank(mid2Att))) %>%
    dplyr::mutate(mid2AttFreqRankNcaa = round(100 * dplyr::percent_rank(mid2AttFreq))) %>%
    dplyr::mutate(mid2PctRankNcaa = round(100 * dplyr::percent_rank(mid2Pct))) %>%
    ... %>%
    dplyr::ungroup()

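Not only do I need 32 separate `mutate()` calls, I also need to run this code twice, once for each of 2 different `group_by()` calls (see the commented-out second one). Is there a better way than 64 lines of code? I have a separate data frame with 21 regions instead of 8, with the same 4 suffixes and the same 2 `group_by()` calls, so it would take `21 * 4 * 2 == 168` lines of code to compute these percentiles. This is not DRY, please help!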

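Edit: I am obviously looking into `mutate_at`, but I'm not very familiar with the `_at` variants of mutate. My data frame has other columns besides these 32, so I don't think `mutate_all` will work.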

Answer 1

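As of dplyr version 1.0.0, the scoped variants of the verbs (such as `mutate_at`, which earlier versions of this answer discussed) have been superseded by the `dplyr::across` function, which is simpler and lets you do this right inside `dplyr::mutate` without needing a separate function. Starting with some example data: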

df <- data.frame(name = LETTERS[1:5],
                 item1 = rnorm(5, mean=2),
                 item2 = rnorm(5, mean=5),
                 item3 = rnorm(5, mean=7))

The `dplyr::across` function goes inside `mutate` and takes 2 main arguments:

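- A `.cols` argument, which accepts the selection helpers used by `dplyr::select`. In this example we use `one_of` to supply a list of variables, but if the variable names follow a pattern we could simplify this with `contains` or `starts_with`.
- A `.fns` argument, where we put one or more functions to apply to each column. This can be a function object (e.g. `mean`), an anonymous function (e.g. `~mean(.x)` or `function(x) mean(x)`), or a list of either.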

df %>%
    mutate(across(one_of('item1', 'item2'),
                  .fns = list(rounded = ~ round(100 * percent_rank(.x)))))

  name     item1    item2    item3 item1_rounded item2_rounded
1    A 2.0825275 6.445983 7.373511            50            75
2    B 1.2568069 4.715137 8.282489            25            50
3    C 3.8895454 6.486809 5.426263           100           100
4    D 0.6094173 3.645558 6.975673             0             0
5    E 2.1202091 4.488883 6.168427            75            25
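
When the column names share a pattern, the explicit list can be swapped for a selection helper. The following is a minimal sketch of that idea, assuming a made-up data frame with fixed values (so the result is reproducible) and an extra column called `other` that should be left alone; none of these names come from the question.

```r
library(dplyr)

# Hypothetical example data with fixed values (not the random data above)
df <- data.frame(name  = LETTERS[1:5],
                 item1 = c(2.1, 1.3, 3.9, 0.6, 2.2),
                 item2 = c(6.4, 4.7, 6.5, 3.6, 4.5),
                 other = c(10, 20, 30, 40, 50))

# starts_with("item") selects item1 and item2 and skips `other`
res_pattern <- df %>%
    mutate(across(starts_with("item"),
                  .fns = list(rounded = ~ round(100 * percent_rank(.x)))))

names(res_pattern)
```

Only the columns matched by the helper get a `_rounded` companion; every other column passes through unchanged.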

If you want to apply multiple functions to these columns, just add more functions to the list.
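
As a rough illustration of a multi-function list, the sketch below adds a second, purely hypothetical function named `centered` next to `rounded`. The data values are made up and fixed so the example is reproducible.

```r
library(dplyr)

# Hypothetical example data with fixed values
df <- data.frame(name  = LETTERS[1:5],
                 item1 = c(2.1, 1.3, 3.9, 0.6, 2.2),
                 item2 = c(6.4, 4.7, 6.5, 3.6, 4.5))

# Each named function in the list produces one new column per input column
res_multi <- df %>%
    mutate(across(c(item1, item2),
                  .fns = list(rounded  = ~ round(100 * percent_rank(.x)),
                              centered = ~ .x - mean(.x))))

names(res_multi)
```

With two input columns and two functions, four new columns are added, each named from the column and the function.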

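Because the function in `.fns` is named (`rounded = ...`), the result of the operation is put into new variables that carry that name as a suffix. If it is left unnamed, the outputs are numbered instead (i.e. `item1_1` and `item2_1`).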
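
The naming behaviour can be checked with a small sketch. It assumes made-up, fixed data, and it also shows a third case that the text above does not cover: passing a single bare function (not wrapped in a list), which overwrites the selected columns in place.

```r
library(dplyr)

# Hypothetical example data with fixed values
df <- data.frame(name  = LETTERS[1:5],
                 item1 = c(2.1, 1.3, 3.9, 0.6, 2.2),
                 item2 = c(6.4, 4.7, 6.5, 3.6, 4.5))

# Unnamed list element: new columns are suffixed with the function's position
res_unnamed <- df %>%
    mutate(across(c(item1, item2),
                  .fns = list(~ round(100 * percent_rank(.x)))))

# Single bare function: the selected columns are replaced, no new columns
res_inplace <- df %>%
    mutate(across(c(item1, item2), ~ round(100 * percent_rank(.x))))

names(res_unnamed)
names(res_inplace)
```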

You can also use the new `.names` argument to supply a glue specification describing how the new column names should be assembled.
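
Putting the pieces together for the question's naming scheme, here is a hedged sketch rather than a definitive solution. It assumes a tiny made-up data frame called `lineups` with only two of the 32 columns, a shortened `regions`/`suffixes` list, and a hypothetical helper named `add_pctiles`. `any_of()` is used so that names missing from the toy data are skipped silently.

```r
library(dplyr)

# Hypothetical miniature of the asker's data: two seasons, two stat columns
lineups <- data.frame(
    season   = rep(c(2019, 2020), each = 3),
    atr2Made = c(5, 9, 7, 2, 8, 4),
    atr2Att  = c(10, 12, 20, 6, 9, 15),
    minutes  = c(30, 25, 40, 35, 20, 28)   # unrelated column, left untouched
)

regions  <- c("atr2", "sht2")   # shortened for illustration
suffixes <- c("Made", "Att")
# Build every region/suffix combination, e.g. "atr2Made", "sht2Made", ...
stat_cols <- as.vector(outer(regions, suffixes, paste0))

# Hypothetical helper: one grouping and one name suffix per call
add_pctiles <- function(data, cols, group_vars, name_suffix) {
    data %>%
        group_by(across(all_of(group_vars))) %>%
        mutate(across(any_of(cols),
                      ~ round(100 * percent_rank(.x)),
                      # glue spec: original column name followed by the suffix
                      .names = paste0("{.col}", name_suffix))) %>%
        ungroup()
}

ranked <- add_pctiles(lineups, stat_cols, "season", "RankNcaa")
names(ranked)
```

A second call with `group_vars = c("season", "homeConfId")` and a different suffix of your choosing would cover the commented-out grouping from the question, assuming that column exists in the real data.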
