Using ntile to classify subsets into quantiles


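I have a dataset with 42,840 observations spanning 119 unique months (dataset$date). Within each month, I want to assign every dataset$Value a quantile, "ranking" the values from 1 (lowest) to 5 (highest).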

 Date     Name(ID)    Value    Quantile (the column I want to add: a quantile from 1 to 5)
 2009-03  1          35        (1-5)
 2009-04  1          20        ...
 2009-05  1          65        ...
 2009-03  2          24        ...
 2009-04  2          77        ...
 2009-03  3          110       ...
.
.
.
 2018-12  3          125       ...
 2009-03  56          24       ...
 2009-04  56          65       ...
 2009-03  57          26       ...
 2009-04  57          67       ...
 2009-03  58          99       ...

I tried the ntile function. It works on the whole dataset, but there seems to be no way to tell it to work within each date subset.

Any suggestions?
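For reference, dplyr's ntile() respects grouping, so one way to get per-month quantiles is to call it after group_by(). The sketch below is a minimal illustration that assumes a data frame called dataset with columns date, Name and Value as described above; the rows are hypothetical toy values, not the real data.

```r
library(dplyr)

# Hypothetical toy data in the shape described above: two months, five values each
dataset <- tibble(
  date  = rep(c("2009-03", "2009-04"), each = 5),
  Name  = rep(1:5, times = 2),
  Value = c(35, 24, 110, 26, 99, 20, 77, 65, 67, 12)
)

# ntile() is evaluated separately inside each date group,
# so Quantile runs from 1 (lowest) to 5 (highest) within every month
result <- dataset %>%
  group_by(date) %>%
  mutate(Quantile = ntile(Value, 5)) %>%
  ungroup()

print(result)
```

With five values per month each value lands in its own bin; with more rows per month, ntile() splits each month into five groups of (as nearly as possible) equal size.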

r subset quantile
1 Answer

You can combine the base R function rank() with dplyr's group_by():

library(dplyr)

# Create some data
N <- 3
dat <- tibble(
  date = rep(1:12,N),
  value = runif(12*N, 0, 100)
)

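# The rescale function we will use later to map the ranks onto your 1-5 scale
## Adapted from https://stackoverflow.com/questions/25962508/rescaling-a-variable-in-r
RESCALE <- function(x, nx1, nx2, minx, maxx) {
  # A month with a single observation (or all-tied values) has maxx == minx;
  # return the lowest group instead of dividing by zero (which gives NaN)
  if (maxx == minx) return(rep(nx1, length(x)))
  nx <- nx1 + (nx2 - nx1) * (x - minx) / (maxx - minx)
  ceiling(nx)
}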

# What you want
dat %>% 
  group_by(date) %>% # Group by date so that mutate() computes the ranks within each month
  mutate(rank_detail = rank(value), # rank the values within each group
         rank_group = RESCALE(rank_detail, 1, 5, min(rank_detail), max(rank_detail))) %>%   # rescale the ranks onto your 1-5 scale
  arrange(date)

# Example output (your numbers will differ, since runif() is not seeded):
# # A tibble: 36 x 4
# # Groups:   date [12]
# date value rank_detail rank_group
# <int> <dbl>       <dbl>      <dbl>
# 1     1 92.7            3          5
# 2     1 53.6            2          3
# 3     1 47.8            1          1
# 4     2 24.6            2          3
# 5     2 72.2            3          5
# 6     2 11.5            1          1
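One design point worth knowing: RESCALE maps ranks linearly onto 1..5 and rounds up, so only the minimum of each month lands in group 1 and the groups are not equal-sized quintiles. The sketch below compares it with dplyr's ntile() on one hypothetical month of 10 distinct values; the data are made up for illustration, and RESCALE is repeated here (without the single-observation guard) so the block runs on its own.

```r
library(dplyr)

# Same linear rescale of ranks as in the answer, rounded up to 1..5
RESCALE <- function(x, nx1, nx2, minx, maxx) {
  ceiling(nx1 + (nx2 - nx1) * (x - minx) / (maxx - minx))
}

# One hypothetical month with 10 distinct values
value <- c(5, 80, 42, 17, 63, 99, 28, 71, 11, 56)
r <- rank(value)

rescaled <- RESCALE(r, 1, 5, min(r), max(r))
binned   <- ntile(value, 5)

# Count how many observations fall into each group 1..5
print(table(rescaled))  # group sizes 1, 2, 2, 2, 3
print(table(binned))    # group sizes 2, 2, 2, 2, 2
```

If you want true quintiles (roughly 20% of each month in every group), ntile() is the better fit; the rescaled rank is closer to a linear score.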