"Polynomial expansion" of a dataset in R

Question

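Given a dataset with an arbitrary number of columns N and rows T, I would like to obtain all the columns implied by the polynomial expansion of the sum of the columns raised to an arbitrary degree d.

To be clearer: take the following dataset as an example, with N = 3 and T = 10 and column names a, b, and c: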

set.seed(123)
ds <- cbind("a"=rnorm(10),"b"=rnorm(10),"c"=rnorm(10)); ds 

> ds
                a          b          c
 [1,] -0.56047565  1.2240818 -1.0678237
 [2,] -0.23017749  0.3598138 -0.2179749
 [3,]  1.55870831  0.4007715 -1.0260044
 [4,]  0.07050839  0.1106827 -0.7288912
 [5,]  0.12928774 -0.5558411 -0.6250393
 [6,]  1.71506499  1.7869131 -1.6866933
 [7,]  0.46091621  0.4978505  0.8377870
 [8,] -1.26506123 -1.9666172  0.1533731
 [9,] -0.68685285  0.7013559 -1.1381369
[10,] -0.44566197 -0.4727914  1.2538149

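The desired output for degree d = 2 would be a dataset containing the columns {a^2, b^2, c^2, a*b, a*c, b*c}, which in this example I can specify by hand as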

out <- cbind(ds[,"a"]^2, ds[,"b"]^2, ds[,"c"]^2, ds[,"a"]*ds[,"b"], ds[,"a"]*ds[,"c"], ds[,"b"]*ds[,"c"])

I would like to know whether there is a clever way to do this automatically, perhaps with a function that takes only ds and d as arguments.

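Edit: As the MWE suggests, I am not really interested in the multinomial coefficients, so feel free to include them for completeness or leave them out.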

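Edit 2: As I wrote in the comments, the poly function seems to be exactly what I was looking for. However, while it works well for a small number of columns, for a 26-column dataset it already stops working at degree 2 with "Error: cannot allocate vector of size 9469.2 Gb". This looks strange to me, since we are only talking about an output of 350 columns. My question needs either an explanation of this problem or a solution to it.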

Answer 1

In base R, the following is enough:

poly(ds, degree = 2, raw = TRUE)[,]
            1.0.0       2.0.0      0.1.0       1.1.0      0.2.0      0.0.1       1.0.1       0.1.1      0.0.2
 [1,] -0.56047565 0.314132950  1.2240818 -0.68606804 1.49837625 -1.0678237  0.59848918 -1.30710356 1.14024747
 [2,] -0.23017749 0.052981677  0.3598138 -0.08282104 0.12946599 -0.2179749  0.05017292 -0.07843039 0.04751306
 [3,]  1.55870831 2.429571609  0.4007715  0.62468579 0.16061776 -1.0260044 -1.59924166 -0.41119329 1.05268513
 [4,]  0.07050839 0.004971433  0.1106827  0.00780406 0.01225066 -0.7288912 -0.05139295 -0.08067566 0.53128242
 [5,]  0.12928774 0.016715318 -0.5558411 -0.07186344 0.30895937 -0.6250393 -0.08080991  0.34742254 0.39067409
 [6,]  1.71506499 2.941447909  1.7869131  3.06467216 3.19305856 -1.6866933 -2.89278864 -3.01397443 2.84493432
 [7,]  0.46091621 0.212443749  0.4978505  0.22946735 0.24785510  0.8377870  0.38614963  0.41709268 0.70188713
 [8,] -1.26506123 1.600379927 -1.9666172  2.48789113 3.86758304  0.1533731 -0.19402639 -0.30162620 0.02352331
 [9,] -0.68685285 0.471766840  0.7013559 -0.48172830 0.49190010 -1.1381369  0.78173260 -0.79823906 1.29535569
[10,] -0.44566197 0.198614592 -0.4727914  0.21070515 0.22353172  1.2538149 -0.55877763 -0.59279292 1.57205186

Note that the column names show the degrees, i.e.

1.0.0 = a

2.0.0 = a^2

1.1.0 = a*b

etc.
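The question asks only for the degree-2 terms, whereas the output above also contains the first-degree columns. One possible way to keep only the columns of total degree exactly d is to sum the exponents encoded in the column names. This is a minimal sketch; `exactDegree` is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: keep only the columns whose total degree equals `degree`
exactDegree <- function(d, degree) {
  x <- poly(d, degree = degree, raw = TRUE)[, ]
  # column names look like "1.1.0"; the sum of the parts is the total degree
  expo <- strsplit(colnames(x), ".", fixed = TRUE)
  total <- vapply(expo, function(e) sum(as.integer(e)), integer(1))
  x[, total == degree, drop = FALSE]
}

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
colnames(exactDegree(ds, 2))
```

For the example data this should leave the six columns 2.0.0, 1.1.0, 0.2.0, 1.0.1, 0.1.1 and 0.0.2.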

You can of course write a small function to change the names accordingly:

namedPoly <- function(d, degree){
  x <- poly(d, degree = degree, raw = TRUE)[,]
  nms <- colnames(d)
  a <- t(read.table(text=colnames(x), sep='.'))
  b <- ifelse(a==0, "", ifelse(a==1, nms, paste0(nms, "^", a)))
  colnames(x) <- apply(b, 2, \(y)paste(y[nzchar(y)], collapse = "*"))
  x
}
 
namedPoly(ds, 3)
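On the memory error from the question's second edit: as far as I can tell, poly() applied to a matrix first enumerates every combination of exponents 0 to d across all columns, which is (d+1)^N rows, and only afterwards filters by total degree. With N = 26 and d = 2 that is 3^26, roughly 2.5e12 rows, and a single 4-byte integer vector of that length would need about 9469 Gb, which matches the reported size. Treat this as a likely explanation rather than a confirmed one. The sketch below avoids the grid by enumerating the monomials of degree exactly d directly; `monomials` is a hypothetical name, and the columns are labelled like a*a rather than a^2:

```r
# Hypothetical helper: all monomials of total degree exactly d, without poly()
monomials <- function(ds, d) {
  nms <- colnames(ds)
  # combn() gives strictly increasing index sets; subtracting 0..(d-1) row-wise
  # turns each one into a non-decreasing set of d column indices (a multiset)
  idx <- combn(ncol(ds) + d - 1L, d) - (seq_len(d) - 1L)
  # multiply the selected columns element-wise, one monomial per column of idx
  out <- apply(idx, 2, function(j) Reduce(`*`, lapply(j, function(k) ds[, k])))
  colnames(out) <- apply(idx, 2, function(j) paste(nms[j], collapse = "*"))
  out
}

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
colnames(monomials(ds, 2))
```

The number of columns is choose(N + d - 1, d), so a 26-column input at degree 2 yields 351 columns and the intermediate index matrix stays small.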

Answer 2

The polynomial

x^2+y^2+z^2+xy+yz+zx

can be obtained with

spray::homog(3, power = 2)

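Unfortunately, the spray package has no function to extract the individual terms of a polynomial (a "spray"), or at least I did not find one, so I wrote my own. We also need each term as a string: "a^2", "a*b", etc., so I wrote a function to get these strings as well. Perhaps the mpoly package or the mvp package provides such functionality; I did not check.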

Finally, the function as.function in spray converts a polynomial into a function. So we have everything we need.

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))

library(spray)
P <- homog(ncol(ds), power = 2)

# get a polynomial term like  xy  as  "a*b"
as_character_term <- function(trm) {
  ops <- options(polyform = TRUE, sprayvars = colnames(ds))
  string <- capture.output(print_spray_polyform(trm))
  options(ops)
  substring(string, 2L)
}

# make list of terms of polynomial
terms <- function(P) {
  exponents <- index(P)
  coefficients <- coeffs(P)
  out <- lapply(1L:length(P), function(i) {
    as.spray(list(exponents[i, , drop = FALSE], coefficients[i]))
  })
  names(out) <- lapply(out, as_character_term)
  out
}

sapply(terms(P), function(trm) {
  f <- as.function(trm)
  f(ds)
})

#              c^2         b*c         a*c        b^2         a*b         a^2
#  [1,] 1.14024747 -1.30710356  0.59848918 1.49837625 -0.68606804 0.314132950
#  [2,] 0.04751306 -0.07843039  0.05017292 0.12946599 -0.08282104 0.052981677
#  [3,] 1.05268513 -0.41119329 -1.59924166 0.16061776  0.62468579 2.429571609
#  [4,] 0.53128242 -0.08067566 -0.05139295 0.01225066  0.00780406 0.004971433
#  [5,] 0.39067409  0.34742254 -0.08080991 0.30895937 -0.07186344 0.016715318
#  [6,] 2.84493432 -3.01397443 -2.89278864 3.19305856  3.06467216 2.941447909
#  [7,] 0.70188713  0.41709268  0.38614963 0.24785510  0.22946735 0.212443749
#  [8,] 0.02352331 -0.30162620 -0.19402639 3.86758304  2.48789113 1.600379927
#  [9,] 1.29535569 -0.79823906  0.78173260 0.49190010 -0.48172830 0.471766840
# [10,] 1.57205186 -0.59279292 -0.55877763 0.22353172  0.21070515 0.198614592

Answer 3

For each value from 1 to d, this loop returns the matrix of

a^d, b^d, c^d, (a*b)^(d-1), (a*c)^(d-1), (b*c)^(d-1)

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
d <- 2
out <- numeric()

while (d > 0) {
  out <- cbind(ds[, "a"]^d, ds[, "b"]^d, ds[, "c"]^d)
  if (d > 1) {
    out <- cbind(out, (ds[, "a"] * ds[, "b"])^(d - 1), (ds[, "a"] * ds[, "c"])^(d - 1), (ds[, "b"] * ds[, "c"])^(d - 1))
  }
  print(out)
  d <- d - 1
}

So the matrix for d=2 is the first one and the matrix for d=1 is the second.

             [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
 [1,] 0.314132950 1.49837625 1.14024747 -0.68606804  0.59848918 -1.30710356
 [2,] 0.052981677 0.12946599 0.04751306 -0.08282104  0.05017292 -0.07843039
 [3,] 2.429571609 0.16061776 1.05268513  0.62468579 -1.59924166 -0.41119329
 [4,] 0.004971433 0.01225066 0.53128242  0.00780406 -0.05139295 -0.08067566
 [5,] 0.016715318 0.30895937 0.39067409 -0.07186344 -0.08080991  0.34742254
 [6,] 2.941447909 3.19305856 2.84493432  3.06467216 -2.89278864 -3.01397443
 [7,] 0.212443749 0.24785510 0.70188713  0.22946735  0.38614963  0.41709268
 [8,] 1.600379927 3.86758304 0.02352331  2.48789113 -0.19402639 -0.30162620
 [9,] 0.471766840 0.49190010 1.29535569 -0.48172830  0.78173260 -0.79823906
[10,] 0.198614592 0.22353172 1.57205186  0.21070515 -0.55877763 -0.59279292
             [,1]       [,2]       [,3]
 [1,] -0.56047565  1.2240818 -1.0678237
 [2,] -0.23017749  0.3598138 -0.2179749
 [3,]  1.55870831  0.4007715 -1.0260044
 [4,]  0.07050839  0.1106827 -0.7288912
 [5,]  0.12928774 -0.5558411 -0.6250393
 [6,]  1.71506499  1.7869131 -1.6866933
 [7,]  0.46091621  0.4978505  0.8377870
 [8,] -1.26506123 -1.9666172  0.1533731
 [9,] -0.68685285  0.7013559 -1.1381369
[10,] -0.44566197 -0.4727914  1.2538149

You can also store the matrices in a list:

set.seed(123)
ds <- cbind("a" = rnorm(10), "b" = rnorm(10), "c" = rnorm(10))
d <- 2
out <- numeric()
res <- list()

while (d > 0) {
  out <- cbind(ds[, "a"]^d, ds[, "b"]^d, ds[, "c"]^d)
  if (d > 1) {
    out <- cbind(out, (ds[, "a"] * ds[, "b"])^(d - 1), (ds[, "a"] * ds[, "c"])^(d - 1), (ds[, "b"] * ds[, "c"])^(d - 1))
  }
  res[[d]] <- out
  d <- d - 1
}

Hope this helps! 😃
