Rolling covariance (or another function) between two columns of a dataset, by group

Problem description

So, I have a data.table in long format, like this:

#sample data
library(data.table)
sample_size = 10
DT0 <- data.table(
  YEAR = seq(2021, by=-1, length.out = sample_size),
  a1 = seq(5, by=0.035, length.out = sample_size),
  a2 = seq(12, by=0.6, length.out = sample_size),
  a3 = seq(10, by=0.01, length.out = sample_size)
)

#melting to long size
DT <- melt(DT0, 
           id.vars = c("YEAR"), 
           variable.name = "ITEM",
           value.name = "VARIATION")

setkeyv(DT, cols=c("ITEM", "YEAR"))
> print(DT, 100)
     YEAR   ITEM VARIATION
    <num> <fctr>     <num>
 1:  2021     a1     5.000
 2:  2020     a1     5.035
 3:  2019     a1     5.070
 4:  2018     a1     5.105
 5:  2017     a1     5.140
 6:  2016     a1     5.175
 7:  2015     a1     5.210
 8:  2014     a1     5.245
 9:  2013     a1     5.280
10:  2012     a1     5.315
11:  2021     a2    12.000
12:  2020     a2    12.600
13:  2019     a2    13.200
14:  2018     a2    13.800
15:  2017     a2    14.400
16:  2016     a2    15.000
17:  2015     a2    15.600
18:  2014     a2    16.200
19:  2013     a2    16.800
20:  2012     a2    17.400
21:  2021     a3    10.000
22:  2020     a3    10.010
23:  2019     a3    10.020
24:  2018     a3    10.030
25:  2017     a3    10.040
26:  2016     a3    10.050
27:  2015     a3    10.060
28:  2014     a3    10.070
29:  2013     a3    10.080
30:  2012     a3    10.090
     YEAR   ITEM VARIATION

I am trying to compute new columns with a rolling (say n = 5) covariance, pairwise for every element of ITEM (a1, a2, a3).

I tried building each pair by hand, doing a join DT[DT[ITEM==(a1,a2,a3)] and then using cov(a,b) with the rolling function data.table::frollapply, like below:

#join
DT2 <- DT[DT[ITEM == "a1"], on=.(YEAR)]
> print(DT2, 100)
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
    <num> <fctr>     <num> <fctr>       <num>
 1:  2012     a1     5.315     a1       5.315
 2:  2012     a2    17.400     a1       5.315
 3:  2012     a3    10.090     a1       5.315
 4:  2013     a1     5.280     a1       5.280
 5:  2013     a2    16.800     a1       5.280
 6:  2013     a3    10.080     a1       5.280
 7:  2014     a1     5.245     a1       5.245
 8:  2014     a2    16.200     a1       5.245
 9:  2014     a3    10.070     a1       5.245
10:  2015     a1     5.210     a1       5.210
11:  2015     a2    15.600     a1       5.210
12:  2015     a3    10.060     a1       5.210
13:  2016     a1     5.175     a1       5.175
14:  2016     a2    15.000     a1       5.175
15:  2016     a3    10.050     a1       5.175
16:  2017     a1     5.140     a1       5.140
17:  2017     a2    14.400     a1       5.140
18:  2017     a3    10.040     a1       5.140
19:  2018     a1     5.105     a1       5.105
20:  2018     a2    13.800     a1       5.105
21:  2018     a3    10.030     a1       5.105
22:  2019     a1     5.070     a1       5.070
23:  2019     a2    13.200     a1       5.070
24:  2019     a3    10.020     a1       5.070
25:  2020     a1     5.035     a1       5.035
26:  2020     a2    12.600     a1       5.035
27:  2020     a3    10.010     a1       5.035
28:  2021     a1     5.000     a1       5.000
29:  2021     a2    12.000     a1       5.000
30:  2021     a3    10.000     a1       5.000
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
#computing cov pairs for "a1": cov(a1, a1); cov(a2, a1) and cov(a3, a1)..

DT2[, 
    "Cov(ITEM, a1)" := frollapply(.SD, n=5, FUN=cov(x= VARIATION, y= i.VARIATION)),
    by=.(ITEM)]

But I got this result:

>Error in match.fun(FUN) : 
  'cov(x = VARIATION, y = i.VARIATION)' is not a function, character or symbol

Edit: I tried @IRTFM's suggestion, by doing:

DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = VARIATION, y = i.VARIATION)}),    by = .(ITEM)]

and got this error:

Error in frollapply(.SD, n = 5, FUN = function(x, y) { :   x must be list, data.frame or data.table of numeric or logical types

The elements of VARIATION and i.VARIATION are all numeric, so I tried passing them as lists by doing the following:

DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = .(VARIATION), y = .(i.VARIATION))}),    by = .(ITEM)]

But the same error came back.

Do you have any hints or suggestions on how to do this properly with frollapply?

r data.table covariance rolling-computation
1 answer

I think something like this might work. Hope it helps.

    library(data.table)

    sample_size <- 10
    DT0 <- data.table(
      YEAR = seq(2021, by = -1, length.out = sample_size),
      a1 = seq(5, by = 0.035, length.out = sample_size),
      a2 = seq(12, by = 0.6, length.out = sample_size),
      a3 = seq(10, by = 0.01, length.out = sample_size)
    )
    
    DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")
    setkey(DT, ITEM, YEAR)
  
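    # use character labels so comparisons with DT$ITEM do not depend on factor levels
    items <- as.character(unique(DT$ITEM))
    pairs <- CJ(item1 = items, item2 = items)

    cov_answer <- pairs[, {
      DT1 <- DT[ITEM == item1]
      DT2 <- DT[ITEM == item2]
      DT_merged <- DT1[DT2, on = "YEAR", nomatch = NULL][, .(YEAR, VARIATION, i.VARIATION)]

      # frollapply hands FUN one numeric window at a time, so roll over row
      # positions and use each window to subset both columns
      DT_merged[, ROLL_COV := frollapply(as.numeric(seq_len(.N)), 5, function(idx) cov(VARIATION[idx], i.VARIATION[idx]))]
      DT_merged
    }, by = .(item1, item2)]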
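
For what it's worth, the errors in the question seem to come from two things: FUN has to be a function (not the result of calling cov), and frollapply slides its window over one numeric vector at a time, so FUN never sees two columns together. One way around that is to roll over row positions and use each window of positions to subset both series. Below is a minimal sketch of that idea on plain vectors. The helper name `roll_cov` and the column name `cov_a1_a2` are made up for illustration, and the values are the a1 and a2 series from the question.

```r
library(data.table)

# Two series on the same time axis (the question's a1 and a2 values)
x <- seq(5,  by = 0.035, length.out = 10)
y <- seq(12, by = 0.6,   length.out = 10)

# Hypothetical helper: rolling covariance of two equal-length vectors.
# The window slides over the positions 1..length(x); each window of
# positions is used to subset both vectors before calling cov().
roll_cov <- function(x, y, n) {
  frollapply(as.numeric(seq_along(x)), n, function(idx) cov(x[idx], y[idx]))
}

rc <- roll_cov(x, y, 5)
# The first n - 1 = 4 entries are NA (incomplete windows). Both series are
# linear, so every full window gives about 0.035 * 0.6 * var(1:5) = 0.0525.

# The same helper inside a data.table with one column per item
wide <- data.table(YEAR = 2012:2021, a1 = rev(x), a2 = rev(y))
wide[, cov_a1_a2 := roll_cov(a1, a2, 5)]
```

With the long table from the question, the same helper could be applied after reshaping to one column per ITEM (for example with dcast), or after a self-join on YEAR as in the answer above.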
    
  