Computing quantiles over a sequence of numeric columns for each row of a data.table in R


I need to compute quantiles over a sequence of numeric columns for each row of a data.table in R.

Table:  

       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
         NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
       11.7 10.7 10.8 11.8 12.2 13.8  7.0 10.2 11.2  6.8  7.4  9.1  9.5  9.4  9.3 15.6 11.3 13.0 10.9 10.5
         NA   NA  9.5 11.3 16.6 12.2   NA   NA 69.6   NA   NA 12.4 10.8 10.5  8.8  9.9   NA  7.7 12.1   NA
        9.1  8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2
       14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6  9.9 12.6  6.4  9.6 18.5 14.3 26.2 10.7  6.4  6.9  7.1  9.0

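I want to compute quantiles for each row of the table above. My code is below, but I need each row's quantiles placed in the extra columns shown under "Output".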

year_cols <- c(2000:2019) 
Table[, c("10","25","50","75","100") := quantile(.SD, na.rm = TRUE, c(0.1,0.25,0.5,0.75,1.0)), .SDcols = as.character(year_cols)]

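I would appreciate it if someone could show how to compute the quantiles for each row as shown below, or help fix my code so that it produces the row-wise quantiles with data.table in R.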

Output:

2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10%  25%  50%  75%  100% 
  NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA  NA  NA   NA   NA   NA
11.7 10.7 10.8 11.8 12.2 13.8  7.0 10.2 11.2  6.8  7.4  9.1  9.5  9.4  9.3 15.6 11.3 13.0 10.9 10.5 7.36 9.37 10.75 11.72 15.60 
  NA   NA  9.5 11.3 16.6 12.2   NA   NA 69.6   NA   NA 12.4 10.8 10.5  8.8  9.9   NA  7.7 12.1   NA
 9.1  8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2
14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6  9.9 12.6  6.4  9.6 18.5 14.3 26.2 10.7  6.4  6.9  7.1  9.0
r datatable quantile
1 answer (1 vote)

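One option is to group by row. The original call passes the whole `.SD` table to `quantile()` at once instead of one row's values. Grouping by `seq_len(nrow(Table))` makes `.SD` a single row, `unlist()` flattens that row into a numeric vector, and `as.list()` spreads the five quantiles across the new columns.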

year_cols <- as.character(2000:2019)
Table[, c("10%", "25%", "50%", "75%", "100%") :=
        as.list(quantile(unlist(.SD), probs = c(0.1, 0.25, 0.5, 0.75, 1.0),
                         na.rm = TRUE)),
      by = seq_len(nrow(Table)), .SDcols = year_cols]
Table
#   2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019   10%
#1:   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#2: 11.7 10.7 10.8 11.8 12.2 13.8  7.0 10.2 11.2  6.8  7.4  9.1  9.5  9.4  9.3 15.6 11.3 13.0 10.9 10.5  7.36
#3:   NA   NA  9.5 11.3 16.6 12.2   NA   NA 69.6   NA   NA 12.4 10.8 10.5  8.8  9.9   NA  7.7 12.1   NA  8.87
#4:  9.1  8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2 15.67
#5: 14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6  9.9 12.6  6.4  9.6 18.5 14.3 26.2 10.7  6.4  6.9  7.1  9.0  6.85
#      25%   50%    75% 100%
#1:     NA    NA     NA   NA
#2:  9.375 10.75 11.725 15.6
#3:  9.800 11.05 12.250 69.6
#4: 18.275 23.05 26.850 39.9
#5:  9.450 13.35 17.750 26.2
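
To see what a single group computes, here is a minimal base R sketch that reproduces row 2 by hand. The names `row2`, `probs`, and `q` are introduced only for this illustration. It assumes the default quantile type (type 7), which is what the call above uses.

```r
# Row 2 of the example table as a plain numeric vector
row2 <- c(11.7, 10.7, 10.8, 11.8, 12.2, 13.8, 7.0, 10.2, 11.2, 6.8,
          7.4, 9.1, 9.5, 9.4, 9.3, 15.6, 11.3, 13.0, 10.9, 10.5)
probs <- c(0.1, 0.25, 0.5, 0.75, 1.0)

# quantile() returns a named numeric vector ("10%", "25%", ...)
q <- quantile(row2, probs = probs, na.rm = TRUE)
print(q)

# as.list() turns it into a list of five length-1 elements,
# which is the shape `:=` needs to fill five columns for one group
str(as.list(q))
```

The values agree with the 7.36, 9.375, 10.75, 11.725 and 15.6 shown for row 2 in the output above.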

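Another option is `rowQuantiles` from the matrixStats package, applied after converting `.SD` to a matrix. This needs no grouping, because `rowQuantiles` processes all rows in one call.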

library(matrixStats)
Table[, c("10%", "25%", "50%", "75%", "100%") :=
        as.data.frame(rowQuantiles(as.matrix(.SD), na.rm = TRUE,
                                   probs = c(0.1, 0.25, 0.5, 0.75, 1.0))),
      .SDcols = as.character(year_cols)]



Table
#   2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019   10%
#1:   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#2: 11.7 10.7 10.8 11.8 12.2 13.8  7.0 10.2 11.2  6.8  7.4  9.1  9.5  9.4  9.3 15.6 11.3 13.0 10.9 10.5  7.36
#3:   NA   NA  9.5 11.3 16.6 12.2   NA   NA 69.6   NA   NA 12.4 10.8 10.5  8.8  9.9   NA  7.7 12.1   NA  8.87
#4:  9.1  8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2 15.67
#5: 14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6  9.9 12.6  6.4  9.6 18.5 14.3 26.2 10.7  6.4  6.9  7.1  9.0  6.85
#      25%   50%    75% 100%
#1:     NA    NA     NA   NA
#2:  9.375 10.75 11.725 15.6
#3:  9.800 11.05 12.250 69.6
#4: 18.275 23.05 26.850 39.9
#5:  9.450 13.35 17.750 26.2
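
If adding matrixStats is not desirable, the same matrix idea can be sketched with base R's `apply()`. This is an illustrative variant, not part of the original answer. The table `DT` and the names `cols`, `probs`, and `q` are hypothetical, and the table is deliberately tiny so the result is easy to check by hand.

```r
library(data.table)

# Hypothetical table: one complete row and one all-NA row
DT <- data.table(`2000` = c(1, NA), `2001` = c(2, NA),
                 `2002` = c(3, NA), `2003` = c(4, NA))
cols  <- c("2000", "2001", "2002", "2003")
probs <- c(0.1, 0.25, 0.5, 0.75, 1.0)

# apply() over rows yields one column per input row, so transpose
# to get one row of quantiles per input row
q <- t(apply(as.matrix(DT[, ..cols]), 1, quantile,
             probs = probs, na.rm = TRUE))

# Column names of q are "10%", "25%", ...; add them by reference
DT[, (colnames(q)) := as.data.table(q)]
print(DT)
```

On large tables `apply()` is likely to be slower than `rowQuantiles`, so this is mainly useful when avoiding the extra dependency matters more than speed.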

Data

Table <- structure(list(`2000` = c(NA, 11.7, NA, 9.1, 14.7), `2001` = c(NA, 
 10.7, NA, 8.7, 17.5), `2002` = c(NA, 10.8, 9.5, 29.9, 21.1), 
     `2003` = c(NA, 11.8, 11.3, 23.1, 19.4), `2004` = c(NA, 12.2, 
     16.6, 18.3, 20), `2005` = c(NA, 13.8, 12.2, 23.5, 14.5), 
     `2006` = c(NA, 7, NA, 21.5, 14.1), `2007` = c(NA, 10.2, NA, 
     23, 12.6), `2008` = c(NA, 11.2, 69.6, 18.2, 9.9), `2009` = c(NA, 
     6.8, NA, 28.8, 12.6), `2010` = c(NA, 7.4, NA, 39.9, 6.4), 
     `2011` = c(NA, 9.1, 12.4, 16.4, 9.6), `2012` = c(NA, 9.5, 
     10.8, 16.9, 18.5), `2013` = c(NA, 9.4, 10.5, 23.4, 14.3), 
     `2014` = c(NA, 9.3, 8.8, 18.8, 26.2), `2015` = c(NA, 15.6, 
     9.9, 31.9, 10.7), `2016` = c(NA, 11.3, NA, 26.2, 6.4), `2017` = c(NA, 
     13, 7.7, 22.4, 6.9), `2018` = c(NA, 10.9, 12.1, 29.2, 7.1
     ), `2019` = c(NA, 10.5, NA, 25.2, 9)), class = c("data.table", 
 "data.frame"), row.names = c(NA, -5L))
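
A data.table rebuilt with `structure()` like this lacks the internal bookkeeping that `data.table()` or `fread()` would normally set up. Depending on the data.table version, the first `:=` on such an object may emit a warning about the internal self-reference. Calling `setDT()` first is a low-cost precaution. The sketch below uses a hypothetical two-column object `DT` rather than the full table.

```r
library(data.table)

# Hypothetical object built the same way as the dput() output above
DT <- structure(list(a = c(1, 2), b = c(3, 4)),
                class = c("data.table", "data.frame"),
                row.names = c(NA, -2L))

setDT(DT)              # re-establish data.table's internal state in place
DT[, total := a + b]   # add a column by reference
print(DT)
```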