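I usually try to write a title that covers the whole question, but I struggled with this one, so I'd like to jump straight to an example of what I'm trying to accomplish. First, here is a subset of my data frame, which contains some sports data: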
dput(mydf)
structure(list(team.Abbreviation = c("ATL", "BOS", "BRO", "CHA",
"CHI", "ATL", "BOS", "BRO", "CHA", "CHI", "ATL", "BOS", "BRO",
"CHA", "CHI"), stat = c("GP", "GP", "GP", "GP", "GP", "PTS",
"PTS", "PTS", "PTS", "PTS", "REB", "REB", "REB", "REB", "REB"
), value = c(28, 30, 27, 27, 27, 103.5, 103.9, 108.2, 104.7,
97.6, 47.6, 53, 54.7, 56.8, 51.7), foragainst = c("for", "for",
"for", "for", "for", "for", "for", "for", "for", "for", "for",
"for", "for", "for", "for")), .Names = c("team.Abbreviation",
"stat", "value", "foragainst"), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
mydf
# A tibble: 15 x 4
team.Abbreviation stat value foragainst
<chr> <chr> <dbl> <chr>
1 ATL GP 28.0 for
2 BOS GP 30.0 for
3 BRO GP 27.0 for
4 CHA GP 27.0 for
5 CHI GP 27.0 for
6 ATL PTS 103.5 for
7 BOS PTS 103.9 for
8 BRO PTS 108.2 for
9 CHA PTS 104.7 for
10 CHI PTS 97.6 for
11 ATL REB 47.6 for
12 BOS REB 53.0 for
13 BRO REB 54.7 for
14 CHA REB 56.8 for
15 CHI REB 51.7 for
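The last column can be ignored for now. For each stat (in this case GP, PTS, REB), I want to compute each team's rank within that stat. There are 5 teams in this example. I'm fairly sure that what I want is a data frame with the same dimensions as mydf, which looks like this: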
outputdf
# A tibble: 15 x 4
team.Abbreviation stat rank foragainst
<chr> <chr> <dbl> <chr>
1 ATL GP 2 for
2 BOS GP 1 for
3 BRO GP 3 for
4 CHA GP 3 for
5 CHI GP 3 for
6 ATL PTS 4 for
7 BOS PTS 3 for
8 BRO PTS 1 for
9 CHA PTS 2 for
10 CHI PTS 5 for
11 ATL REB 5 for
12 BOS REB 3 for
13 BRO REB 2 for
14 CHA REB 1 for
15 CHI REB 4 for
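Looking at the 5-row slice of this data where stat == PTS, note that team.Abbreviation == BRO has the highest PTS value, so its rank is 1. CHI has the lowest PTS value, so its rank is 5. I'm not particularly concerned about how ties are handled, so for stat == GP it is fine for BRO, CHA and CHI to all have rank == 3.
I could do this with a for loop in a fairly inefficient way, but I'd like to find a dplyr (or other good package) solution here. Thanks in advance!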
We can use min_rank from dplyr after grouping by stat. Negating value makes the largest value rank 1:
library(dplyr)
mydf %>%
group_by(stat) %>%
mutate(rank = min_rank(-value)) %>%
select(team.Abbreviation, stat, rank, foragainst)
# A tibble: 15 x 4
# Groups: stat [3]
# team.Abbreviation stat rank foragainst
# <chr> <chr> <int> <chr>
# 1 ATL GP 2 for
# 2 BOS GP 1 for
# 3 BRO GP 3 for
# 4 CHA GP 3 for
# 5 CHI GP 3 for
# 6 ATL PTS 4 for
# 7 BOS PTS 3 for
# 8 BRO PTS 1 for
# 9 CHA PTS 2 for
#10 CHI PTS 5 for
#11 ATL REB 5 for
#12 BOS REB 3 for
#13 BRO REB 2 for
#14 CHA REB 1 for
#15 CHI REB 4 for
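Since the question leaves tie handling open, here is a small sketch of how a few dplyr ranking helpers treat the three-way tie in GP. The vector gp is just the GP values copied from the question, and the variable names are illustrative only. Note that with these particular values the tie sits at the bottom of the descending order, so the difference between min_rank and dense_rank only shows up when ranking in ascending order.

```r
library(dplyr)

# GP values from the question, in team order ATL, BOS, BRO, CHA, CHI
gp <- c(28, 30, 27, 27, 27)

# Descending rank, tied values share the smallest rank (used in the answer)
desc_min <- min_rank(-gp)      # 2 1 3 3 3

# Descending rank, ties broken by position so every rank is unique
desc_first <- row_number(-gp)  # 2 1 3 4 5

# Ascending rank: min_rank leaves a gap after the tie, dense_rank does not
asc_min <- min_rank(gp)        # 4 5 1 1 1
asc_dense <- dense_rank(gp)    # 2 3 1 1 1
```

Any of these can be swapped into the mutate() call above, depending on how ties should be reported.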
Or, in base R, use ave, which returns the vector of ranks computed within each stat:
with(mydf, ave(-value, stat, FUN = function(x) rank(x, ties.method = "min")))
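The ave call returns only a vector of ranks, in the same row order as mydf. As a minimal sketch of how that could be turned into the requested data frame using base R alone, the following rebuilds the question's data as a plain data.frame (rather than a tibble, which is an assumption made here to avoid any package dependency) and attaches the ranks as a column:

```r
# Rebuild the question's data as a plain data.frame (base R only)
mydf <- data.frame(
  team.Abbreviation = rep(c("ATL", "BOS", "BRO", "CHA", "CHI"), times = 3),
  stat = rep(c("GP", "PTS", "REB"), each = 5),
  value = c(28, 30, 27, 27, 27,
            103.5, 103.9, 108.2, 104.7, 97.6,
            47.6, 53, 54.7, 56.8, 51.7),
  foragainst = "for",
  stringsAsFactors = FALSE
)

# ave() applies FUN within each level of stat and returns a vector
# aligned with the rows of mydf, so it can be assigned as a new column
mydf$rank <- with(mydf, ave(-value, stat,
                            FUN = function(x) rank(x, ties.method = "min")))

# Keep the columns in the order shown in the question's desired output
outputdf <- mydf[, c("team.Abbreviation", "stat", "rank", "foragainst")]
outputdf
```

Because ave writes the results back into a numeric vector, the rank column here is stored as double rather than integer, which matches the <dbl> type in the question's desired output.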