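Given a dataset with an arbitrary number of columns N and rows T, I would like to obtain all the columns implied by the polynomial expansion of the sum of the columns raised to an arbitrary degree d.
To be clearer: take the following dataset as an example, with N = 3, T = 10, and columns named a, b, c.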
set.seed(123)
ds <- cbind("a"=rnorm(10),"b"=rnorm(10),"c"=rnorm(10)); ds
> ds
a b c
[1,] -0.56047565 1.2240818 -1.0678237
[2,] -0.23017749 0.3598138 -0.2179749
[3,] 1.55870831 0.4007715 -1.0260044
[4,] 0.07050839 0.1106827 -0.7288912
[5,] 0.12928774 -0.5558411 -0.6250393
[6,] 1.71506499 1.7869131 -1.6866933
[7,] 0.46091621 0.4978505 0.8377870
[8,] -1.26506123 -1.9666172 0.1533731
[9,] -0.68685285 0.7013559 -1.1381369
[10,] -0.44566197 -0.4727914 1.2538149
The desired output for degree d = 2 would be a dataset containing the columns {a^2, b^2, c^2, a*b, a*c, b*c}, which in this example I can specify by hand as
out <- cbind(ds[,"a"]^2, ds[,"b"]^2, ds[,"c"]^2, ds[,"a"]*ds[,"b"], ds[,"a"]*ds[,"c"], ds[,"b"]*ds[,"c"])
I would like to know whether there is a smart way to do this automatically, perhaps with a function that takes only ds and d as arguments.
Edit: as the MWE suggests, I am not really interested in the multinomial coefficients, so feel free to include them or leave them out.
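Edit 2: as I wrote in the comments, the poly function seems to be exactly what I am looking for. However, while it works well for a small number of columns, for a 26-column dataset it already fails at degree 2 with "Error: cannot allocate vector of size 9469.2 Gb". This looks strange to me, since we are only talking about an output of 350 columns. My question calls for an explanation of, or a workaround for, this problem.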
In base R, the following is enough:
poly(ds, degree = 2, raw = TRUE)[,]
1.0.0 2.0.0 0.1.0 1.1.0 0.2.0 0.0.1 1.0.1 0.1.1 0.0.2
[1,] -0.56047565 0.314132950 1.2240818 -0.68606804 1.49837625 -1.0678237 0.59848918 -1.30710356 1.14024747
[2,] -0.23017749 0.052981677 0.3598138 -0.08282104 0.12946599 -0.2179749 0.05017292 -0.07843039 0.04751306
[3,] 1.55870831 2.429571609 0.4007715 0.62468579 0.16061776 -1.0260044 -1.59924166 -0.41119329 1.05268513
[4,] 0.07050839 0.004971433 0.1106827 0.00780406 0.01225066 -0.7288912 -0.05139295 -0.08067566 0.53128242
[5,] 0.12928774 0.016715318 -0.5558411 -0.07186344 0.30895937 -0.6250393 -0.08080991 0.34742254 0.39067409
[6,] 1.71506499 2.941447909 1.7869131 3.06467216 3.19305856 -1.6866933 -2.89278864 -3.01397443 2.84493432
[7,] 0.46091621 0.212443749 0.4978505 0.22946735 0.24785510 0.8377870 0.38614963 0.41709268 0.70188713
[8,] -1.26506123 1.600379927 -1.9666172 2.48789113 3.86758304 0.1533731 -0.19402639 -0.30162620 0.02352331
[9,] -0.68685285 0.471766840 0.7013559 -0.48172830 0.49190010 -1.1381369 0.78173260 -0.79823906 1.29535569
[10,] -0.44566197 0.198614592 -0.4727914 0.21070515 0.22353172 1.2538149 -0.55877763 -0.59279292 1.57205186
Note that the column names encode the exponents of each variable, i.e.
1.0.0 = a
2.0.0 = a^2
1.1.0 = a*b
and so on.
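Since the question only asks for the terms of total degree exactly d, one possible follow-up is to filter the columns by the sum of the exponents encoded in their names. This is only a minimal sketch; the variable names p, deg and p2 are made up for illustration.

```r
set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))

# Raw polynomial terms up to degree 2; [,] leaves a plain matrix
p <- poly(ds, degree = 2, raw = TRUE)[,]

# Each column name such as "1.1.0" lists the exponents of a, b, c;
# their sum is the total degree of that term
deg <- vapply(strsplit(colnames(p), ".", fixed = TRUE),
              function(e) sum(as.integer(e)), integer(1))

# Keep only the terms of total degree exactly 2 (six columns here)
p2 <- p[, deg == 2, drop = FALSE]
colnames(p2)
```

The same filter works for any degree by replacing 2 in both places.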
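You can of course write a small function that renames the columns accordingly:
namedPoly <- function(d, degree){
  # raw polynomial terms up to the given degree, as a plain matrix
  x <- poly(d, degree = degree, raw = TRUE)[,]
  nms <- colnames(d)
  # one column of exponents per term, one row per variable
  a <- t(read.table(text=colnames(x), sep='.'))
  # "" for exponent 0, "a" for exponent 1, "a^k" otherwise
  b <- ifelse(a==0, "", ifelse(a==1, nms, paste0(nms, "^", a)))
  # join the non-empty factors of each term with "*"
  colnames(x) <- apply(b, 2, \(y)paste(y[nzchar(y)], collapse = "*"))
  x
}
namedPoly(ds, 3)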
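On the memory error from Edit 2: as far as I can tell from the source of stats::polym() (which poly() calls for a matrix), it first enumerates every exponent vector in {0, ..., degree}^N with expand.grid() and only afterwards drops those whose total degree exceeds degree. For N = 26 and degree = 2 that grid has 3^26 (about 2.5e12) rows, and 3^26 * 4 bytes is about 9469.2 Gb, which matches the reported size. One way around it is to enumerate only the index combinations that are actually needed. The sketch below does this with combn(); poly_terms is a hypothetical helper name, not part of any package, and it returns only the terms of total degree exactly d, without multinomial coefficients.

```r
# Hypothetical helper (not from any package): all monomials of total
# degree exactly d in the columns of ds
poly_terms <- function(ds, d) {
  n <- ncol(ds)
  nms <- colnames(ds)
  # Combinations with repetition of d column indices out of n: take the
  # ordinary combinations of size d from n + d - 1 and subtract 0..d-1
  # row by row, which yields non-decreasing index vectors in 1..n
  idx <- combn(n + d - 1L, d) - (seq_len(d) - 1L)
  # Multiply the selected columns element-wise for each combination
  out <- vapply(seq_len(ncol(idx)), function(i) {
    Reduce(`*`, lapply(idx[, i], function(k) ds[, k]))
  }, numeric(nrow(ds)))
  colnames(out) <- apply(idx, 2L, function(j) paste(nms[j], collapse = "*"))
  out
}

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
colnames(poly_terms(ds, 2))
```

With this approach a 26-column matrix at d = 2 yields choose(27, 2) = 351 columns directly, and the full exponent grid is never built. Squares are named "a*a" rather than "a^2" to keep the sketch short.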
The polynomial x^2+y^2+z^2+xy+yz+zx can be obtained with spray::homog(3, power = 2).
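Unfortunately, the spray package has no function to extract the individual terms of a polynomial (a "spray"), or at least I did not find one, so I wrote my own. We also need each term as a string: "a^2", "a*b", and so on, so I wrote a function for that as well. The mpoly or mvp packages may offer this functionality; I have not checked.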
Finally, the as.function function in spray converts a polynomial into a function. So we have everything we need.
set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
library(spray)
P <- homog(ncol(ds), power = 2)
# get a polynomial term like xy as "a*b"
as_character_term <- function(trm) {
  ops <- options(polyform = TRUE, sprayvars = colnames(ds))
  string <- capture.output(print_spray_polyform(trm))
  options(ops)
  substring(string, 2L)
}
# make list of terms of polynomial
terms <- function(P) {
  exponents <- index(P)
  coefficients <- coeffs(P)
  out <- lapply(1L:length(P), function(i) {
    as.spray(list(exponents[i, , drop = FALSE], coefficients[i]))
  })
  names(out) <- lapply(out, as_character_term)
  out
}
sapply(terms(P), function(trm) {
  f <- as.function(trm)
  f(ds)
})
# c^2 b*c a*c b^2 a*b a^2
# [1,] 1.14024747 -1.30710356 0.59848918 1.49837625 -0.68606804 0.314132950
# [2,] 0.04751306 -0.07843039 0.05017292 0.12946599 -0.08282104 0.052981677
# [3,] 1.05268513 -0.41119329 -1.59924166 0.16061776 0.62468579 2.429571609
# [4,] 0.53128242 -0.08067566 -0.05139295 0.01225066 0.00780406 0.004971433
# [5,] 0.39067409 0.34742254 -0.08080991 0.30895937 -0.07186344 0.016715318
# [6,] 2.84493432 -3.01397443 -2.89278864 3.19305856 3.06467216 2.941447909
# [7,] 0.70188713 0.41709268 0.38614963 0.24785510 0.22946735 0.212443749
# [8,] 0.02352331 -0.30162620 -0.19402639 3.86758304 2.48789113 1.600379927
# [9,] 1.29535569 -0.79823906 0.78173260 0.49190010 -0.48172830 0.471766840
# [10,] 1.57205186 -0.59279292 -0.55877763 0.22353172 0.21070515 0.198614592
For each value from d down to 1, this loop prints a matrix with the columns a^d, b^d, c^d and, when d > 1, also (a*b)^(d-1), (a*c)^(d-1), (b*c)^(d-1).
set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
d <- 2
out <- numeric()
while (d > 0) {
  out <- cbind(ds[, "a"]^d, ds[, "b"]^d, ds[, "c"]^d)
  if (d > 1) {
    out <- cbind(out, (ds[, "a"] * ds[, "b"])^(d - 1), (ds[, "a"] * ds[, "c"])^(d - 1), (ds[, "b"] * ds[, "c"])^(d - 1))
  }
  print(out)
  d <- d - 1
}
So the matrix for d = 2 is printed first and the matrix for d = 1 second.
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.314132950 1.49837625 1.14024747 -0.68606804 0.59848918 -1.30710356
[2,] 0.052981677 0.12946599 0.04751306 -0.08282104 0.05017292 -0.07843039
[3,] 2.429571609 0.16061776 1.05268513 0.62468579 -1.59924166 -0.41119329
[4,] 0.004971433 0.01225066 0.53128242 0.00780406 -0.05139295 -0.08067566
[5,] 0.016715318 0.30895937 0.39067409 -0.07186344 -0.08080991 0.34742254
[6,] 2.941447909 3.19305856 2.84493432 3.06467216 -2.89278864 -3.01397443
[7,] 0.212443749 0.24785510 0.70188713 0.22946735 0.38614963 0.41709268
[8,] 1.600379927 3.86758304 0.02352331 2.48789113 -0.19402639 -0.30162620
[9,] 0.471766840 0.49190010 1.29535569 -0.48172830 0.78173260 -0.79823906
[10,] 0.198614592 0.22353172 1.57205186 0.21070515 -0.55877763 -0.59279292
[,1] [,2] [,3]
[1,] -0.56047565 1.2240818 -1.0678237
[2,] -0.23017749 0.3598138 -0.2179749
[3,] 1.55870831 0.4007715 -1.0260044
[4,] 0.07050839 0.1106827 -0.7288912
[5,] 0.12928774 -0.5558411 -0.6250393
[6,] 1.71506499 1.7869131 -1.6866933
[7,] 0.46091621 0.4978505 0.8377870
[8,] -1.26506123 -1.9666172 0.1533731
[9,] -0.68685285 0.7013559 -1.1381369
[10,] -0.44566197 -0.4727914 1.2538149
You can also store the matrices in a list:
set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
d <- 2
out <- numeric()
res <- list()
while (d > 0) {
  out <- cbind(ds[, "a"]^d, ds[, "b"]^d, ds[, "c"]^d)
  if (d > 1) {
    out <- cbind(out, (ds[, "a"] * ds[, "b"])^(d - 1), (ds[, "a"] * ds[, "c"])^(d - 1), (ds[, "b"] * ds[, "c"])^(d - 1))
  }
  res[[d]] <- out
  d <- d - 1
}
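The loop above hard-codes the columns a, b and c. A possible generalization to any number of named columns is sketched below; pairwise_powers is a hypothetical name chosen for illustration, and it keeps exactly the same scheme (pure powers plus powers of pairwise products). Note that this scheme matches the full degree-d expansion only for d = 2: for d = 3, for instance, it produces (a*b)^2, which has total degree 4, and it omits terms such as a^2*b.

```r
# Hypothetical helper: same scheme as the loop above, for any named columns.
# Returns a list whose k-th element is the matrix for exponent k.
pairwise_powers <- function(ds, d) {
  lapply(seq_len(d), function(k) {
    # pure powers a^k, b^k, ...
    out <- ds^k
    colnames(out) <- paste0(colnames(ds), "^", k)
    if (k > 1 && ncol(ds) > 1) {
      # every unordered pair of column names
      pairs <- combn(colnames(ds), 2)
      cross <- apply(pairs, 2, function(p) (ds[, p[1]] * ds[, p[2]])^(k - 1))
      colnames(cross) <- paste0("(", pairs[1, ], "*", pairs[2, ], ")^", k - 1)
      out <- cbind(out, cross)
    }
    out
  })
}

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
res <- pairwise_powers(ds, 2)
colnames(res[[2]])
```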
Hope this helps! 😃