I want to draw random samples without replacement N times, like this:
import numpy as np

sample = np.zeros([100000, 4], int)
for i in range(100000):
    # each row: 4 distinct integers from range(128)
    sample[i] = np.random.choice(128, 4, replace=False)
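When the number of iterations gets very large, the sampling as a whole becomes time-consuming. Is there any way to speed it up?
Your approach: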
In [16]: sample = np.zeros([100000, 4], int)
In [17]: %timeit for i in range(100000): sample[i] = np.random.choice(128, 4, replace=False)
1 loop, best of 3: 2.5 s per loop
You could instead write:
In [149]: %timeit d=np.random.choice(128,100000);sample1=np.array([(d+x)%128 for x in np.random.choice(128,4)])
The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.11 ms per loop
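This is much faster on my machine.
It is less random, though: all 100000 samples share the same four offsets, the offsets are themselves drawn with replacement (so two of them can coincide), and sample1 has shape (4, 100000) rather than (100000, 4). Whether that is acceptable depends on your application. After all, for loops in plain Python are very slow. You may also be interested in Cython or Numba.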
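If you want to stay in NumPy and still get rows that are independent and free of repeats, one common trick is to sort random keys: draw one uniform float per candidate value in each row, argsort the row, and keep the first k indices. Below is a minimal sketch, assuming NumPy 1.17 or later for np.random.default_rng. The function name sample_rows_without_replacement is made up for this example and is not part of NumPy.

```python
import numpy as np


def sample_rows_without_replacement(n_rows, population, k, rng=None):
    """Return an (n_rows, k) int array; each row holds k distinct values
    from range(population). Hypothetical helper, not a NumPy function."""
    rng = np.random.default_rng() if rng is None else rng
    # One random key per (row, candidate value).
    keys = rng.random((n_rows, population))
    # Sorting the keys gives a random permutation of range(population)
    # in every row; the first k entries are a sample without replacement.
    return np.argsort(keys, axis=1)[:, :k]


sample = sample_rows_without_replacement(100000, 128, 4)
print(sample.shape)
```

The trade-off is memory: for 100000 rows and 128 candidates this builds a 100000 x 128 float array plus an index array of the same shape, so for much larger N it may be worth generating the rows in chunks.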
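This gives you random integers in range(0, 128) with shape (100000, 4), although they are drawn with replacement, so a row can contain repeats: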
np.random.randint(128, size=(100000,4))
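If repeats within a row are not acceptable, one possible fix is rejection: draw everything with replacement, then redraw only the rows that contain a duplicate until none are left. With 128 values and 4 draws, roughly 5% of rows need a redraw on the first pass, so the loop should finish after a few iterations. This is only a sketch, assuming NumPy 1.17 or later. The function name sample_rows_by_rejection is hypothetical.

```python
import numpy as np


def sample_rows_by_rejection(n_rows, population, k, rng=None):
    """Return an (n_rows, k) int array of rows with k distinct values
    from range(population). Hypothetical helper, not a NumPy function."""
    rng = np.random.default_rng() if rng is None else rng
    # First pass: draw with replacement, like np.random.randint.
    out = rng.integers(population, size=(n_rows, k))
    while True:
        # A row has a duplicate iff two neighbours are equal after sorting.
        s = np.sort(out, axis=1)
        bad = (s[:, 1:] == s[:, :-1]).any(axis=1)
        n_bad = int(bad.sum())
        if n_bad == 0:
            return out
        # Redraw only the offending rows.
        out[bad] = rng.integers(population, size=(n_bad, k))


sample = sample_rows_by_rejection(100000, 128, 4)
print(sample.shape)
```

This works well when k is small relative to the population. As k approaches the population size almost every row gets rejected, and a permutation-based approach is the better choice.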
Use random.sample instead of np.random.choice:
In [16]: import random
    ...: import time
    ...: import numpy as np
    ...: start_time = time.time()
    ...: sample = np.zeros([100000, 4], int)
    ...: for i in range(100000):
    ...:     sample[i] = random.sample(range(128), 4)
    ...: print("--- %s seconds ---" % (time.time() - start_time))
    ...:
--- 0.7096474170684814 seconds ---
In [17]: import time
    ...: start_time = time.time()
    ...: sample = np.zeros([100000, 4], int)
    ...: for i in range(100000):
    ...:     sample[i] = np.random.choice(128, 4, replace=False)
    ...: print("--- %s seconds ---" % (time.time() - start_time))
    ...:
--- 5.2036824226379395 seconds ---