在 Rcpp 或 RcppArmadillo 中排列矩阵行的快速方法？

Question

我正在 N x M 矩阵 X 上运行固定引导算法，其中 N 和 M 的数量级均为 1500 到 3000。

索引排列的引导矩阵 Y 为 N x B，其中 B 为 10,000。

在 R 语法中，目标是计算：

sapply(1:B, function(b) colSums(X[Y[,b],]))

也就是说，我们重新排列 X 的行（可能有重复）并对列求和 -10,000 次。

上面的代码大约需要 3 分钟，其中 N = 1500，M = 2000，B = 10000。

将代码转换为 Rcpp 将其缩短到大约 25 秒：

// [[Rcpp::export]]
NumericVector get_bootstrap_statistic_mean(NumericMatrix x, IntegerMatrix theta){
    
    int nr = x.nrow();
    int nc = x.ncol(); 
    int nb = theta.ncol();
    NumericMatrix res(nb, nc);
    
        
    for(int j = 0; j < nc; j++){
        
        NumericMatrix::Column x_col_j = x.column(j);
        NumericMatrix::Column res_col_j = res.column(j);
        
        for(int b = 0; b < nb; b++){
    
            IntegerMatrix::Column theta_col_b = theta.column(b);
            
            double sum = 0.0;
            for(int i = 0; i < nr; i++){
            
                sum += x_col_j[theta_col_b[i]-1]; //Have to subtract one to map the R indices (start at 1) to the C++ indices (start at 0)
            
            }
            res_col_j[b] = sum / nr;
        }
    }
    return res;
}

但是这篇post展示了比上面的嵌套循环更快的获取列总和的方法。

因此，有没有一种方法可以在 C++ 中创建排列矩阵（Rcpp、RcppArmadillo），并且比这样做更快：

sapply(1:B, function(b) Arma_colSums(X[Y[,b],])

N = 1500、M = 2000、B = 10000 大约需要 20 秒（其中 Arma_colSums（或多或少）是链接帖子中最快的方法），以便我们可以将 Arma_colSums 应用于置换矩阵？

我查看了 RcppArmadillo 子集文档，其中大部分似乎是在获取行或列向量，而不是重新排列整个矩阵。并且“sub2ind”功能似乎不适用，或者如果适用，则与使用上述更快的方法之一相比，将排列矩阵置于所需的形式需要更长的时间。

我还查看了 RcppArmadillo 介绍中的引导程序示例，但它在 X 的单列上使用 IID 引导程序（而这里的 X 有数千列）。

我尝试了chatGPT，但它无法提供任何可编译的代码。

Answer 1

改用矩阵乘法：

library(Rfast) # for `colTabulate` and `Crossprod`

system.time(Z1 <- get_bootstrap_statistic_mean(X, Y))
#>    user  system elapsed 
#>   28.17    0.01   28.19
system.time(Z2 <- Crossprod(X, colTabulate(Y, N)))
#>    user  system elapsed 
#>   26.30    1.36    3.81

all.equal(Z1, t(Z2)/N)
#> [1] TRUE

数据：

N <- 15e2L
M <- 2e3L
B <- 1e4L

X <- matrix(runif(N*M), N, M)
Y <- matrix(sample(N, N*B, 1), N, B)

在 Rcpp 或 RcppArmadillo 中排列矩阵行的快速方法？

问题描述投票：0回答：1

1个回答

最新问题

在 Rcpp 或 RcppArmadillo 中排列矩阵行的快速方法？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1