Ranking multiple values within a tidy data frame in R using dplyr

Question (votes: 1, answers: 1)

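I sometimes struggle to write a title that captures the whole question, and I had trouble with this one, so I'll go straight to an example of what I'm trying to accomplish. First, here is a subset of my data frame, which contains some sports data: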

dput(mydf)
structure(list(team.Abbreviation = c("ATL", "BOS", "BRO", "CHA", 
"CHI", "ATL", "BOS", "BRO", "CHA", "CHI", "ATL", "BOS", "BRO", 
"CHA", "CHI"), stat = c("GP", "GP", "GP", "GP", "GP", "PTS", 
"PTS", "PTS", "PTS", "PTS", "REB", "REB", "REB", "REB", "REB"
), value = c(28, 30, 27, 27, 27, 103.5, 103.9, 108.2, 104.7, 
97.6, 47.6, 53, 54.7, 56.8, 51.7), foragainst = c("for", "for", 
"for", "for", "for", "for", "for", "for", "for", "for", "for", 
"for", "for", "for", "for")), .Names = c("team.Abbreviation", 
"stat", "value", "foragainst"), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

mydf
# A tibble: 15 x 4
    team.Abbreviation  stat value foragainst
               <chr> <chr> <dbl>      <chr>
 1               ATL    GP  28.0        for
 2               BOS    GP  30.0        for
 3               BRO    GP  27.0        for
 4               CHA    GP  27.0        for
 5               CHI    GP  27.0        for
 6               ATL   PTS 103.5        for
 7               BOS   PTS 103.9        for
 8               BRO   PTS 108.2        for
 9               CHA   PTS 104.7        for
10               CHI   PTS  97.6        for
11               ATL   REB  47.6        for
12               BOS   REB  53.0        for
13               BRO   REB  54.7        for
14               CHA   REB  56.8        for
15               CHI   REB  51.7        for

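The last column can be ignored for now. For each stat (GP, PTS, and REB in this case), I'd like to compute each team's rank within that stat. There are 5 teams in this example. I'm fairly sure what I want is a data frame with the same dimensions as mydf that looks like this: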

outputdf
# A tibble: 15 x 4
    team.Abbreviation  stat rank  foragainst
               <chr> <chr> <dbl>      <chr>
 1               ATL    GP     2        for
 2               BOS    GP     1        for
 3               BRO    GP     3        for
 4               CHA    GP     3        for
 5               CHI    GP     3        for
 6               ATL   PTS     4        for
 7               BOS   PTS     3        for
 8               BRO   PTS     1        for
 9               CHA   PTS     2        for
10               CHI   PTS     5        for
11               ATL   REB     5        for
12               BOS   REB     3        for
13               BRO   REB     2        for
14               CHA   REB     1        for
15               CHI   REB     4        for

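Looking at the 5-row slice of this data where stat == PTS, note that team.Abbreviation == BRO has the highest PTS value, so its rank is 1. CHI has the lowest PTS value, so its rank is 5. I'm not particularly concerned with how ties are handled, so for stat == GP the shared rank of 3 shown for BRO, CHA, and CHI is only one acceptable outcome.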
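For reference, here is a minimal sketch of how base R's rank() treats the three-way GP tie under different ties.method settings. The vector name gp is introduced here only for illustration; the values are the GP rows from mydf, negated so that the largest value gets rank 1.

```r
# GP values for ATL, BOS, BRO, CHA, CHI, copied from mydf
gp <- c(28, 30, 27, 27, 27)

# Negate so that larger values receive smaller (better) ranks
rank_min     <- rank(-gp, ties.method = "min")      # tied teams share the lowest rank
rank_first   <- rank(-gp, ties.method = "first")    # ties broken by order of appearance
rank_average <- rank(-gp, ties.method = "average")  # tied teams share the mean rank

print(rank_min)      # 2 1 3 3 3
print(rank_first)    # 2 1 3 4 5
print(rank_average)  # 2 1 4 4 4
```

The "min" variant matches the desired output shown above; any of the three would satisfy a requirement that does not care about ties.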

I could do this fairly inefficiently with a for loop, but I'd like to find a dplyr (or other good package) solution here. Thanks in advance!

r dplyr
1 Answer
3 votes

We can use min_rank, grouping by stat and negating value so that the largest value in each group gets rank 1:

library(dplyr)
mydf %>% 
    group_by(stat) %>% 
    mutate(rank = min_rank(-value)) %>% 
    select(team.Abbreviation, stat, rank, foragainst)
# A tibble: 15 x 4
# Groups:   stat [3]
#   team.Abbreviation  stat  rank foragainst
#               <chr> <chr> <int>      <chr>
# 1               ATL    GP     2        for
# 2               BOS    GP     1        for
# 3               BRO    GP     3        for
# 4               CHA    GP     3        for
# 5               CHI    GP     3        for
# 6               ATL   PTS     4        for
# 7               BOS   PTS     3        for
# 8               BRO   PTS     1        for
# 9               CHA   PTS     2        for
#10               CHI   PTS     5        for
#11               ATL   REB     5        for
#12               BOS   REB     3        for
#13               BRO   REB     2        for
#14               CHA   REB     1        for
#15               CHI   REB     4        for
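A variation on the same idea, offered as a sketch rather than as part of the original answer: desc(value) can stand in for -value, and ungroup() drops the grouping so that later steps operate on the whole table. The tibble below rebuilds mydf from the values in the question so the snippet runs on its own; the name ranked is made up for this example.

```r
library(dplyr)

# Rebuild the question's data (same values as the dput output)
mydf <- tibble(
  team.Abbreviation = rep(c("ATL", "BOS", "BRO", "CHA", "CHI"), times = 3),
  stat = rep(c("GP", "PTS", "REB"), each = 5),
  value = c(28, 30, 27, 27, 27,
            103.5, 103.9, 108.2, 104.7, 97.6,
            47.6, 53, 54.7, 56.8, 51.7),
  foragainst = "for"
)

ranked <- mydf %>%
  group_by(stat) %>%                        # rank within each stat
  mutate(rank = min_rank(desc(value))) %>%  # highest value gets rank 1
  ungroup() %>%                             # drop grouping for downstream steps
  select(team.Abbreviation, stat, rank, foragainst)

print(ranked)
```

desc() also works on character columns, where unary minus would fail, which is the main reason to prefer it in general-purpose code.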

Or use ave from base R:

with(mydf, ave(-value, stat, FUN = function(x) rank(x, ties.method = "min")))
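The ave() call above returns a plain numeric vector in the original row order, so it still has to be attached to the data. A minimal sketch of that step, assuming a plain data.frame rebuilt from the question's values (the column name rank is chosen to match the desired output):

```r
# Rebuild the question's data as a base R data frame
mydf <- data.frame(
  team.Abbreviation = rep(c("ATL", "BOS", "BRO", "CHA", "CHI"), times = 3),
  stat = rep(c("GP", "PTS", "REB"), each = 5),
  value = c(28, 30, 27, 27, 27,
            103.5, 103.9, 108.2, 104.7, 97.6,
            47.6, 53, 54.7, 56.8, 51.7),
  foragainst = "for",
  stringsAsFactors = FALSE
)

# Rank within each stat; negate value so the largest gets rank 1
mydf$rank <- with(mydf, ave(-value, stat,
                            FUN = function(x) rank(x, ties.method = "min")))

# Keep the columns from the desired output
outputdf <- mydf[, c("team.Abbreviation", "stat", "rank", "foragainst")]
print(outputdf)
```

Because ave() fills its results back into a copy of -value, the rank column comes out as double rather than integer; wrap it in as.integer() if the type matters.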