I need to compute quantiles of the numeric values in each row of a data.table in R.
Table:
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5
NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA
9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2
14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0
I want to compute quantiles for each row of the table above. My code is below, but I need the values for each row to appear as in "Output".
year_cols <- c(2000:2019)
Table[, c("10","25","50","75","100") := quantile(.SD, na.rm = TRUE, c(0.1,0.25,0.5,0.75,1.0)), .SDcols = as.character(year_cols)]
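I would appreciate it if someone could show how to compute the quantiles for each row as shown below, or help modify my code so that it returns per-row quantiles using data.table in R.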
Output:
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10% 25% 50% 75% 100%
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5 7.36 9.37 10.75 11.72 15.60
NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA
9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2
14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0
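One option is to group by row:
library(data.table)
year_cols <- as.character(2000:2019)
# Group by row number so that .SD holds a single row at a time
Table[, c("10%", "25%", "50%", "75%", "100%") :=
        as.list(quantile(unlist(.SD), probs = c(0.1, 0.25, 0.5, 0.75, 1.0),
                         na.rm = TRUE)),
      by = seq_len(nrow(Table)), .SDcols = year_cols]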
Table
# 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10%
#1: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#2: 11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5 7.36
#3: NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA 8.87
#4: 9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2 15.67
#5: 14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0 6.85
# 25% 50% 75% 100%
#1: NA NA NA NA
#2: 9.375 10.75 11.725 15.6
#3: 9.800 11.05 12.250 69.6
#4: 18.275 23.05 26.850 39.9
#5: 9.450 13.35 17.750 26.2
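To see why the grouping matters, here is a minimal sketch on a hypothetical toy table (`toy`, `cols` and `probs` are illustrative names, not part of the question's data). Without `by`, `.SD` contains every row, so `unlist(.SD)` pools all values in the table. Grouping by row number makes `.SD` a one-row table, so the quantiles are computed per row.

```r
library(data.table)

# Hypothetical toy table (assumption, not the question's data):
# two rows, three "year" columns, one missing value.
toy <- data.table(`2000` = c(1, 10), `2001` = c(2, NA), `2002` = c(3, 30))
cols <- c("2000", "2001", "2002")
probs <- c(0.25, 0.5, 1)

# No grouping: .SD is the whole table, so all rows are pooled into one vector.
pooled <- toy[, quantile(unlist(.SD), probs = probs, na.rm = TRUE),
              .SDcols = cols]

# Grouping by row number: .SD is a single row, so each row gets its own quantiles.
# The second row drops its NA and uses only 10 and 30.
per_row <- toy[, as.list(quantile(unlist(.SD), probs = probs, na.rm = TRUE)),
               by = .(row = seq_len(nrow(toy))), .SDcols = cols]
print(per_row)
```

Grouping by row is simple to read, but it calls `quantile()` once per row, so it may be slow on tables with many rows.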
Another option is to use rowQuantiles from matrixStats after converting to a matrix:
library(matrixStats)
# rowQuantiles() takes a matrix and returns one row of quantiles per input row
Table[, c("10%", "25%", "50%", "75%", "100%") :=
        as.data.frame(rowQuantiles(as.matrix(.SD), na.rm = TRUE,
                                   probs = c(0.1, 0.25, 0.5, 0.75, 1.0))),
      .SDcols = as.character(year_cols)]
Table
# 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 10%
#1: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#2: 11.7 10.7 10.8 11.8 12.2 13.8 7.0 10.2 11.2 6.8 7.4 9.1 9.5 9.4 9.3 15.6 11.3 13.0 10.9 10.5 7.36
#3: NA NA 9.5 11.3 16.6 12.2 NA NA 69.6 NA NA 12.4 10.8 10.5 8.8 9.9 NA 7.7 12.1 NA 8.87
#4: 9.1 8.7 29.9 23.1 18.3 23.5 21.5 23.0 18.2 28.8 39.9 16.4 16.9 23.4 18.8 31.9 26.2 22.4 29.2 25.2 15.67
#5: 14.7 17.5 21.1 19.4 20.0 14.5 14.1 12.6 9.9 12.6 6.4 9.6 18.5 14.3 26.2 10.7 6.4 6.9 7.1 9.0 6.85
# 25% 50% 75% 100%
#1: NA NA NA NA
#2: 9.375 10.75 11.725 15.6
#3: 9.800 11.05 12.250 69.6
#4: 18.275 23.05 26.850 39.9
#5: 9.450 13.35 17.750 26.2
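If matrixStats is not available, the same matrix-based idea can be sketched in base R with `apply()`. This is a swapped-in alternative, not the method above, shown on a hypothetical toy matrix `m`. It relies on `quantile()`'s default interpolation (type 7), which I believe is also the default in `rowQuantiles()`, so the numbers should agree.

```r
# Hypothetical toy matrix (assumption, not the question's data):
# rows are series, columns are years.
m <- matrix(c(1, 2, 3,
              10, NA, 30),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("2000", "2001", "2002")))
probs <- c(0.25, 0.5, 1)

# apply() over rows returns one column per input row,
# so transpose to get one row of quantiles per input row.
q <- t(apply(m, 1, quantile, probs = probs, na.rm = TRUE))
print(q)
```

The result is a matrix whose columns are named after the requested probabilities. It can be wrapped in `as.data.frame()` and assigned with `:=` in the same way as the `rowQuantiles()` result.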
Data:
Table <- structure(list(`2000` = c(NA, 11.7, NA, 9.1, 14.7), `2001` = c(NA,
10.7, NA, 8.7, 17.5), `2002` = c(NA, 10.8, 9.5, 29.9, 21.1),
`2003` = c(NA, 11.8, 11.3, 23.1, 19.4), `2004` = c(NA, 12.2,
16.6, 18.3, 20), `2005` = c(NA, 13.8, 12.2, 23.5, 14.5),
`2006` = c(NA, 7, NA, 21.5, 14.1), `2007` = c(NA, 10.2, NA,
23, 12.6), `2008` = c(NA, 11.2, 69.6, 18.2, 9.9), `2009` = c(NA,
6.8, NA, 28.8, 12.6), `2010` = c(NA, 7.4, NA, 39.9, 6.4),
`2011` = c(NA, 9.1, 12.4, 16.4, 9.6), `2012` = c(NA, 9.5,
10.8, 16.9, 18.5), `2013` = c(NA, 9.4, 10.5, 23.4, 14.3),
`2014` = c(NA, 9.3, 8.8, 18.8, 26.2), `2015` = c(NA, 15.6,
9.9, 31.9, 10.7), `2016` = c(NA, 11.3, NA, 26.2, 6.4), `2017` = c(NA,
13, 7.7, 22.4, 6.9), `2018` = c(NA, 10.9, 12.1, 29.2, 7.1
), `2019` = c(NA, 10.5, NA, 25.2, 9)), class = c("data.table",
"data.frame"), row.names = c(NA, -5L))