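I am working on a university project that simulates the 6/49 lottery game.
A ticket is created by choosing 6 distinct numbers between 1 and 49. I have to simulate 10^6 tickets and compare each of them with the lucky (winning) ticket. For every random ticket, I want the cardinality of the intersection between its 6 numbers and the 6 numbers of the lucky ticket. If I do this only once for 10^6 tickets, the probability that any ticket matches the lucky ticket perfectly is very low. So I repeat the procedure "nr_esantion" times, each time with 10^6 new random tickets, and compute the average.
This is my approach: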
nr_esantion<-30
es<-10^6
nr_49<-1:49
table_1 <- rep(0,7)
table_1
for (i in 1:nr_esantion) {
  lucky <- sample(nr_49, 6, replace = FALSE)
  inters <- replicate(es, length(intersect(sample(nr_49, 6, replace = FALSE), lucky)), simplify = "array")
  # At some step "i" there may be no intersections of cardinality 5 or 6,
  # so count into 7 fixed bins (0 to 6 matches) instead of padding the table.
  aux <- tabulate(inters + 1, nbins = 7)
  table_1 <- table_1 + aux
}
table_1 <- table_1/nr_esantion
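I am happy with the results I get, but my problem is that one iteration with 10^6 tickets takes about 20 seconds, so the 30 iterations take about 10 minutes in total. For some of the project tasks I have to change "nr_esantion" to 300.
My question: is there a faster way to compute these $10^6 * 300$ samples?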
# Parallel processing setup
library(parallel)
nr_esantion <- 30
es <- 10^6
nr_49 <- 1:49
table_1 <- integer(7) # Initialize table with integer values for better performance
cl <- makeCluster(detectCores() - 1) # Leave one core free for system processes
clusterExport(cl, varlist = c("nr_49", "es")) # Export variables to each node
clusterEvalQ(cl, {library(methods)}) # Load any required packages on each node
system.time({
  # Use parSapply for parallel computation; each call returns a vector of 7 counts,
  # so `results` is a 7 x nr_esantion matrix (one column per iteration)
  results <- parSapply(cl, 1:nr_esantion, function(x) {
    # Generate a new lucky ticket for each iteration
    lucky_ticket <- sample(nr_49, 6, replace = FALSE)
    # Draw each ticket without replacement (6 distinct numbers) and count its matches
    intersections <- vapply(seq_len(es), function(j) {
      sum(sample(nr_49, 6, replace = FALSE) %in% lucky_ticket)
    }, integer(1))
    # Tabulate the results into 7 bins (0 to 6 matches)
    tabulate(intersections + 1L, nbins = 7)
  })
  stopCluster(cl) # Stop the cluster after computation is done
})
# Average over iterations: rows are match counts, columns are iterations
table_1 <- rowSums(results) / nr_esantion
table_1
This speeds it up by more than an order of magnitude (you could also run it in parallel):
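set.seed(895636177)
nr_esantion <- 30L
es <- 1e6L
nr_49 <- 49L
library(RcppAlgos) # for `permuteSample`
table_1 <- integer(6) # counts for 1 to 6 matches
system.time({
  for (i in 1:nr_esantion) {
    lucky <- sample(nr_49, 6)
    # Sample `es` tickets at once, mark the numbers that appear in `lucky`,
    # then count the matches per ticket (row)
    table_1 <- table_1 + tabulate(
      rowSums(
        matrix(match(permuteSample(nr_49, 6, n = es), lucky, 0L), es, 6) != 0L
      ), 6
    )
  }
  # Tickets with 0 matches are the remainder; prepend that count
  table_1 <- c(nr_esantion*es - sum(table_1), table_1)
})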
#>    user  system elapsed
#>   22.00    1.16   24.11
table_1
#> [1] 13079255 12390731 3970451 530064 28966 529 4
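As a sanity check on the simulated counts (an addition, not part of the answer above): the number of matches between one random ticket and the lucky ticket follows a hypergeometric distribution, so the exact probabilities are available from base R's `dhyper`. The sketch below assumes the same setup as the run above (6 winning numbers, 43 losing numbers, 6 numbers per ticket, 30 repetitions of 10^6 tickets). The names `p_exact`, `n_tickets`, and `expected` are illustrative only.

```r
# Exact distribution of the number of matches (0..6) for one 6/49 ticket:
# 6 "winning" numbers, 43 "losing" numbers, 6 numbers drawn per ticket.
p_exact <- dhyper(0:6, m = 6, n = 43, k = 6)

# Expected counts for the total number of simulated tickets
# (assumed here: 30 repetitions of 10^6 tickets, as in the run above).
n_tickets <- 30 * 1e6
expected <- n_tickets * p_exact

# Print probabilities and expected counts side by side.
print(data.frame(matches = 0:6, probability = p_exact, expected = expected))
```

If the simulation is correct, the totals in `table_1` should be close to `expected`, up to sampling noise. To compare against averages per 10^6 tickets instead, multiply `p_exact` by `1e6`.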