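I am running a Monte Carlo simulation in R and, because of memory and time constraints, I need to run it in batches. I want the results to be reproducible. To that end, I would like to save the RNG's "random state" at the end of each batch and load it at the start of the next batch to continue the pseudo-random sequence. I am having trouble doing this: in particular, if I generate several types of random numbers within a batch (unif, norm, lognorm), reproducibility does not seem to work.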
Below is an example of what I understand so far, followed by an example where it does not work:
# check session rng_state
state_0 <- .Random.seed
# changes when a seed is specified
set.seed(42)
state_1 <- .Random.seed
all.equal(state_0,state_1)
# generate random numbers
unif_1 <- runif(10)
state_2 <- .Random.seed
# random state has now changed
all.equal(state_1,state_2)
# new random numbers are generated continuing the random sequence
unif_2 <- runif(10)
all.equal(unif_1,unif_2)
# reset random state to regenerate the random sequence
.Random.seed <- state_1
unif_1_2 <- runif(20)
all.equal(unif_1_2, c(unif_1,unif_2))
# I want to generate random numbers in batches and continue the random sequence
# whilst making sure it is reproducible
generate_rn <- function(num, rng_state){
  .Random.seed <- rng_state
  unif <- runif(num)
  norm <- rnorm(num)
  random_numbers <- cbind(unif,norm)
  rng_state <- .Random.seed
  return(list(random_numbers,rng_state))
}
# call function to generate random numbers for first batch using the starting random state
batch_1 <- generate_rn(10,state_1)
# call function to generate random numbers for second batch using the ending random state from batch 1
batch_2 <- generate_rn(10,batch_1[[2]])
batch_1_2 <- generate_rn(20,state_1)
# the random states after both calls are the same, as we have generated 40 random numbers in each
all.equal(batch_1_2[[2]],batch_2[[2]])
# but the random numbers produced are not the same
all.equal(batch_1_2[[1]],rbind(batch_1[[1]],batch_2[[1]]))
# and the first 10 uniform random numbers are not the same as the first 10 uniform numbers generated
# above whilst supposedly using the same random state
all.equal(unif_1,batch_1[[1]][,1])
From ?.Random.seed: "It can be saved and restored, but should not be altered by the user."
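To illustrate what saving and restoring can look like inside a function, here is a minimal sketch. It assumes the default RNG kinds and relies on the fact that R's generators read `.Random.seed` from the global environment, so a plain `<-` inside a function only creates a local copy that `runif()` and `rnorm()` never see. The element names `random_numbers` and `rng_state` are illustrative choices, not part of the question's code.

```r
# Sketch: restore and capture the RNG state in the global environment,
# which is where R's generators look for .Random.seed.
generate_rn <- function(num, rng_state) {
  # A plain `.Random.seed <- rng_state` here would only create a local variable.
  assign(".Random.seed", rng_state, envir = globalenv())
  unif <- runif(num)
  norm <- rnorm(num)
  list(
    random_numbers = cbind(unif, norm),
    rng_state = get(".Random.seed", envir = globalenv())
  )
}

set.seed(42)
state_1 <- get(".Random.seed", envir = globalenv())

batch_1 <- generate_rn(10, state_1)
batch_2 <- generate_rn(10, batch_1$rng_state)

# Replaying batch 1 from the same starting state reproduces it exactly.
batch_1_again <- generate_rn(10, state_1)
replay_matches <- identical(batch_1$random_numbers, batch_1_again$random_numbers)
print(replay_matches)
```

Note that even with the state handled this way, two batches of 10 would not be expected to match one batch of 20 column for column, because the draws are consumed in a different order (uniforms, normals, uniforms, normals versus all uniforms, then all normals).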
Could you use set.seed like this? It will be fully reproducible.
generate_rn <- function(seed, num){
  set.seed(seed)
  cbind(runif(num), rnorm(num))
}
(seed0 <- sample(.Machine$integer.max, 1))
#> [1] 1394740963
set.seed(seed0)
nBatches <- 5
seeds <- sample(.Machine$integer.max, nBatches, replace = TRUE)
res_1 <- lapply(seeds, generate_rn, num = 10)
res_2 <- lapply(seeds, generate_rn, num = 10)
identical(res_1, res_2)
#> [1] TRUE
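If the batches run in separate R sessions and you do want to continue a single stream, as the question describes, the state can also be written to disk between batches. The following is only a sketch under assumptions: `state_file` is a hypothetical path (a temporary file here), and the same R version and RNG settings are assumed for every batch.

```r
# Sketch: persist the RNG state between batches (hypothetical file location).
state_file <- tempfile(fileext = ".rds")

# End of batch 1: draw numbers, then save the current state.
set.seed(42)
batch_1 <- runif(5)
saveRDS(get(".Random.seed", envir = globalenv()), state_file)

# Start of batch 2 (possibly a new session): restore the state, then continue.
assign(".Random.seed", readRDS(state_file), envir = globalenv())
batch_2 <- runif(5)

# The two batches together match one uninterrupted run from the same seed.
set.seed(42)
uninterrupted <- runif(10)
resumed_matches <- identical(c(batch_1, batch_2), uninterrupted)
print(resumed_matches)
```

The per-batch seed approach above is simpler when batches may run out of order or in parallel; saving the state is mainly useful when the batches must form one continuous sequence.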