So, I have a long-format data.table like this:
library(data.table)

# sample data
sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
# melting to long format
DT <- melt(DT0,
           id.vars = c("YEAR"),
           variable.name = "ITEM",
           value.name = "VARIATION")
setkeyv(DT, cols = c("ITEM", "YEAR"))
> print(DT, 100)
YEAR ITEM VARIATION
<num> <fctr> <num>
1: 2021 a1 5.000
2: 2020 a1 5.035
3: 2019 a1 5.070
4: 2018 a1 5.105
5: 2017 a1 5.140
6: 2016 a1 5.175
7: 2015 a1 5.210
8: 2014 a1 5.245
9: 2013 a1 5.280
10: 2012 a1 5.315
11: 2021 a2 12.000
12: 2020 a2 12.600
13: 2019 a2 13.200
14: 2018 a2 13.800
15: 2017 a2 14.400
16: 2016 a2 15.000
17: 2015 a2 15.600
18: 2014 a2 16.200
19: 2013 a2 16.800
20: 2012 a2 17.400
21: 2021 a3 10.000
22: 2020 a3 10.010
23: 2019 a3 10.020
24: 2018 a3 10.030
25: 2017 a3 10.040
26: 2016 a3 10.050
27: 2015 a3 10.060
28: 2014 a3 10.070
29: 2013 a3 10.080
30: 2012 a3 10.090
YEAR ITEM VARIATION
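I'm trying to compute new columns holding the rolling covariance (say n = 5) for every pair of elements of ITEM (a1, a2, a3).
I tried building each pair by hand, doing a join DT[DT[ITEM == "a1"]] (and likewise for a2 and a3) and then using cov(a, b) with the rolling function data.table::frollapply, like this: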
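For reference, assuming the usual sample covariance that R's `cov()` computes (denominator n - 1) and a trailing window, which is what `frollapply()` uses by default (`align = "right"`), the value for year t with window size n = 5 is:

```latex
\operatorname{cov}_t(x, y) = \frac{1}{n-1} \sum_{k=t-n+1}^{t} \left(x_k - \bar{x}_t\right)\left(y_k - \bar{y}_t\right),
\qquad
\bar{x}_t = \frac{1}{n} \sum_{k=t-n+1}^{t} x_k,
\quad
\bar{y}_t = \frac{1}{n} \sum_{k=t-n+1}^{t} y_k
```

so each series yields NA for its first n - 1 years.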
# join
DT2 <- DT[DT[ITEM == "a1"], on=.(YEAR)]
> print(DT2, 100)
YEAR ITEM VARIATION i.ITEM i.VARIATION
<num> <fctr> <num> <fctr> <num>
1: 2012 a1 5.315 a1 5.315
2: 2012 a2 17.400 a1 5.315
3: 2012 a3 10.090 a1 5.315
4: 2013 a1 5.280 a1 5.280
5: 2013 a2 16.800 a1 5.280
6: 2013 a3 10.080 a1 5.280
7: 2014 a1 5.245 a1 5.245
8: 2014 a2 16.200 a1 5.245
9: 2014 a3 10.070 a1 5.245
10: 2015 a1 5.210 a1 5.210
11: 2015 a2 15.600 a1 5.210
12: 2015 a3 10.060 a1 5.210
13: 2016 a1 5.175 a1 5.175
14: 2016 a2 15.000 a1 5.175
15: 2016 a3 10.050 a1 5.175
16: 2017 a1 5.140 a1 5.140
17: 2017 a2 14.400 a1 5.140
18: 2017 a3 10.040 a1 5.140
19: 2018 a1 5.105 a1 5.105
20: 2018 a2 13.800 a1 5.105
21: 2018 a3 10.030 a1 5.105
22: 2019 a1 5.070 a1 5.070
23: 2019 a2 13.200 a1 5.070
24: 2019 a3 10.020 a1 5.070
25: 2020 a1 5.035 a1 5.035
26: 2020 a2 12.600 a1 5.035
27: 2020 a3 10.010 a1 5.035
28: 2021 a1 5.000 a1 5.000
29: 2021 a2 12.000 a1 5.000
30: 2021 a3 10.000 a1 5.000
YEAR ITEM VARIATION i.ITEM i.VARIATION
# computing cov pairs for "a1": cov(a1, a1), cov(a2, a1) and cov(a3, a1)
DT2[,
"Cov(ITEM, a1)" := frollapply(.SD, n=5, FUN=cov(x= VARIATION, y= i.VARIATION)),
by=.(ITEM)]
But I get this error:
Error in match.fun(FUN) :
'cov(x = VARIATION, y = i.VARIATION)' is not a function, character or symbol
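The `match.fun()` error means `FUN` received something that is not a function: `cov(x = VARIATION, y = i.VARIATION)` is a call whose value is a single number. `frollapply()` expects a function object, which it calls once per window with one argument: the numeric window of a single column. A minimal illustration on a plain toy vector (not the question's table):

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 6)

# FUN is a function object; frollapply() calls it once per right-aligned
# window of 3 values (second argument), passing that window as one vector w
res <- frollapply(x, 3, FUN = function(w) sum(w))
print(res)  # NA NA 6 9 12 15: the first two windows are incomplete

# Passing a call instead of a function reproduces the error from the question:
# frollapply(x, 3, FUN = sum(x))
```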
EDIT: I tried @IRTFM's suggestion, like this:
DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = VARIATION, y = i.VARIATION)}), by = .(ITEM)]
and got this error:
Error in frollapply(.SD, n = 5, FUN = function(x, y) { : x must be list, data.frame or data.table of numeric or logical types
The elements of VARIATION and i.VARIATION are all numeric, so I tried returning them as lists by doing:
DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = .(VARIATION), y = .(i.VARIATION))}), by = .(ITEM)]
But the same error is returned.
Any tips or suggestions on how to do this correctly with frollapply?
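Judging by the message, both EDIT attempts fail for two reasons. First, inside `by = .(ITEM)` the `.SD` table holds every non-grouping column of DT2, including the factor column `i.ITEM`, and `frollapply()` only accepts numeric or logical columns. Second, even with only numeric columns, `frollapply()` rolls over each column separately and passes `FUN` a single window vector, so a two-argument `function(x, y)` never receives the paired windows (and the bodies above ignore their arguments, reading the whole `VARIATION` columns instead). A commonly used workaround, sketched here rather than presented as an official data.table recipe, is to roll over row positions and use each window of positions to subset both columns inside `FUN`. The `as.double()` call and the `setorder()` step are added as precautions about input type and row order:

```r
library(data.table)

# Rebuild the example from the question
sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")
setkeyv(DT, c("ITEM", "YEAR"))
DT2 <- DT[DT[ITEM == "a1"], on = .(YEAR)]

# Make sure each ITEM's rows are in YEAR order before rolling
setorder(DT2, ITEM, YEAR)

# Roll over row positions 1..N within each ITEM (window size 5); each window
# of positions subsets both VARIATION and i.VARIATION, so cov() sees the pair
DT2[, "Cov(ITEM, a1)" := frollapply(as.double(seq_len(.N)), 5,
                                    FUN = function(i) cov(VARIATION[i], i.VARIATION[i])),
    by = ITEM]

print(DT2)
```

With this linear toy data, each covariance is constant once the window is full (for example about 0.0525 for `a2` against `a1`), and the first four rows of each ITEM are NA.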
I think something like this might work; hope it helps:
library(data.table)

sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")
setkey(DT, ITEM, YEAR)

# all ordered pairs of items: (a1, a1), (a1, a2), ..., (a3, a3)
items <- as.character(unique(DT$ITEM))
pairs <- CJ(item1 = items, item2 = items)

cov_answer <- pairs[, {
  # align the two series on YEAR; both subsets are already sorted by YEAR via the key
  DT_merged <- DT[ITEM == item1][DT[ITEM == item2], on = "YEAR", nomatch = 0]
  # frollapply() hands FUN one column's window at a time, so roll over row
  # positions (window size 5) and subset both series inside FUN
  list(YEAR = DT_merged$YEAR,
       cov = frollapply(as.double(seq_len(nrow(DT_merged))), 5,
                        FUN = function(i) cov(DT_merged$VARIATION[i], DT_merged$i.VARIATION[i])))
}, by = .(item1, item2)]
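If speed matters, a different technique (not frollapply-based, and not part of the answer above) avoids calling R code once per window: the sample covariance over a window can be written with rolling means, cov = n/(n - 1) * (mean(x*y) - mean(x) * mean(y)), and data.table's `frollmean()` computes those means directly. The helper name `roll_cov` is hypothetical, and this form can lose precision to cancellation when a series' mean is large compared with its spread:

```r
library(data.table)

# Hypothetical helper: rolling sample covariance (n - 1 denominator, like cov())
# built from three rolling means; the first n - 1 results are NA
roll_cov <- function(x, y, n) {
  mx  <- frollmean(x, n)
  my  <- frollmean(y, n)
  mxy <- frollmean(x * y, n)
  (mxy - mx * my) * n / (n - 1)
}

# Rebuild the long table from the question
sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")

# Pair every ITEM with every ITEM in the same YEAR (3 x 3 pairs per year)
pairs_dt <- DT[DT, on = .(YEAR), allow.cartesian = TRUE]
setnames(pairs_dt, c("i.ITEM", "i.VARIATION"), c("ITEM2", "VARIATION2"))

# Sort each pair's rows by YEAR, then roll (window size 5) within each pair
setorder(pairs_dt, ITEM, ITEM2, YEAR)
pairs_dt[, cov_roll := roll_cov(VARIATION, VARIATION2, 5), by = .(ITEM, ITEM2)]
```

The `frollapply()` versions are easier to adapt to other statistics; this one trades that flexibility for vectorized rolling means.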