Converting an itertools array to a numpy array

Question

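I'm creating this array:

A=itertools.combinations(range(6),2)

I have to manipulate this array with numpy, for example:

A.reshape(..

If the dimension of A is high, the command list(A) is too slow.

How can I "convert" an itertools array into a numpy array?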

Update 1: I've tried hpaulj's solution, and in this specific case it is a bit slower. Any ideas?

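import itertools as it
import time

import numpy as np

# Method 1: build a list first, then convert it to an array
start = time.perf_counter()

A = it.combinations(range(495), 3)
A = np.array(list(A))
print(A)

stop = time.perf_counter()
print(stop - start)

# Method 2: flatten with chain and feed the result to fromiter
start = time.perf_counter()

A = np.fromiter(it.chain(*it.combinations(range(495), 3)), dtype=int).reshape(-1, 3)
print(A)

stop = time.perf_counter()
print(stop - start)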

Result:

[[  0   1   2]
 [  0   1   3]
 [  0   1   4]
 ..., 
 [491 492 494]
 [491 493 494]
 [492 493 494]]
10.323822
[[  0   1   2]
 [  0   1   3]
 [  0   1   4]
 ..., 
 [491 492 494]
 [491 493 494]
 [492 493 494]]
12.289898

Answer 1

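I'm reopening this question because I dislike the linked answer. The accepted answer suggests using

np.array(list(A))  # producing a (15,2) array

But the OP has apparently already tried list(A) and found it to be slow.

Another answer suggests using np.fromiter. But buried in its comments is the note that fromiter requires a 1d array.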

In [102]: A=itertools.combinations(range(6),2)
In [103]: np.fromiter(A,dtype=int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-103-29db40e69c08> in <module>()
----> 1 np.fromiter(A,dtype=int)

ValueError: setting an array element with a sequence.

So using fromiter with this itertools object requires somehow flattening the iterator.
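As a side note that goes beyond the session above: newer NumPy releases (1.23 and later, as far as I know) let fromiter accept a subarray dtype, so each tuple can become one row without flattening. The following is only a minimal sketch under that version assumption; the name pair is made up for illustration.

```python
import itertools

import numpy as np

# Assumption: NumPy >= 1.23, where fromiter accepts subarray dtypes.
pair = np.dtype((int, 2))  # each item from the iterator becomes a row of 2 ints

A = itertools.combinations(range(6), 2)
arr = np.fromiter(A, dtype=pair)
print(arr.shape)
```

On older NumPy versions this call is not expected to work, and the flattening route is still needed.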

A quick set of timings suggests that list isn't the slow step. It's converting the list to an array that is slow:

In [104]: timeit itertools.combinations(range(6),2)
1000000 loops, best of 3: 1.1 µs per loop
In [105]: timeit list(itertools.combinations(range(6),2))
100000 loops, best of 3: 3.1 µs per loop
In [106]: timeit np.array(list(itertools.combinations(range(6),2)))
100000 loops, best of 3: 14.7 µs per loop

I think the fastest way to use fromiter is to flatten the combinations with an idiomatic use of itertools.chain:

In [112]: timeit np.fromiter(itertools.chain(*itertools.combinations(range(6),2)),dtype=int).reshape(-1,2)
100000 loops, best of 3: 12.1 µs per loop

That doesn't save much time, at least at this small size. (fromiter also takes a count argument, which shaves off another µs.) For a larger case, range(60), fromiter takes half the time of array.
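To illustrate the count argument mentioned above, here is a rough sketch. The number of scalars is known in advance (the number of combinations times the tuple length), so fromiter can allocate the output once. The variable names and the use of math.comb (Python 3.8+) are my own choices, not part of the original timings.

```python
import itertools
import math

import numpy as np

n, k = 60, 2  # sizes chosen only for illustration

# Flatten the stream of k-tuples into a stream of scalars.
flat = itertools.chain.from_iterable(itertools.combinations(range(n), k))

# Total number of scalars: C(n, k) tuples, each of length k.
total = math.comb(n, k) * k

arr = np.fromiter(flat, dtype=int, count=total).reshape(-1, k)
print(arr.shape)
```

How much count helps will depend on the size of the problem, so it is worth timing on your own data.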

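A quick search on [numpy] itertools turns up a number of suggestions for pure numpy ways of generating all combinations. itertools is fast at generating pure Python structures, but converting them to arrays is a slow step.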

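A picky point about the question: A is a generator (strictly speaking, an iterator), not an array. list(A) does produce a nested list, which can loosely be described as an array. But it isn't an np.array, and it does not have a reshape method.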
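The distinction can be checked directly. This is just a small sketch with variable names of my own choosing:

```python
import itertools

import numpy as np

A = itertools.combinations(range(6), 2)
print(hasattr(A, "reshape"))        # the iterator has no reshape method

as_list = list(A)                   # a plain list of 2-tuples
print(hasattr(as_list, "reshape"))  # neither does the list

arr = np.array(as_list)             # only this is an ndarray
print(arr.shape)

leftover = list(A)                  # the iterator has been consumed by now
print(leftover)
```

Note that the iterator can only be consumed once, so it has to be recreated before each conversion attempt.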

Answer 2

Another way to get every pairwise combination of N elements is to generate the indices of the upper triangle of an (N, N) matrix using np.triu_indices(N, k=1), e.g.:

np.vstack(np.triu_indices(6, k=1)).T

For small arrays, itertools.combinations will win, but for large N the triu_indices trick can be a lot faster:

In [1]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(6), 2)), int)
The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 4.04 µs per loop

In [2]: %timeit np.array(np.triu_indices(6, 1)).T
The slowest run took 10.97 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 22.3 µs per loop

In [3]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(1000), 2)), int)
10 loops, best of 3: 69.7 ms per loop

In [4]: %timeit np.array(np.triu_indices(1000, 1)).T
100 loops, best of 3: 10.6 ms per loop
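As a sanity check, the sketch below compares the triu_indices result with itertools.combinations for pairs, and then uses the same indices to pair up the elements of an arbitrary array. The values array is hypothetical, and the trick as written only covers combinations of size 2.

```python
import itertools

import numpy as np

N = 6
i, j = np.triu_indices(N, k=1)
pairs = np.column_stack((i, j))

# Same pairs, in the same order, as itertools.combinations(range(N), 2).
expected = np.array(list(itertools.combinations(range(N), 2)))
print(np.array_equal(pairs, expected))

# Hypothetical data: pair up the elements of any 1d array via the indices.
values = np.array([10, 20, 30, 40])
vi, vj = np.triu_indices(len(values), k=1)
value_pairs = np.column_stack((values[vi], values[vj]))
print(value_pairs)
```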

Answer 3

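I know it has been almost 10 years since the question was asked, but I wanted to put this here in case anyone else is looking for it.

I remembered doing something similar with itertools.product in a previous project, so I went back and found code like this: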

import numpy as np
import itertools


def wrapper(range1, range2):
    # Yield the members of each product tuple one by one, so that
    # np.fromiter receives a flat stream of scalars.
    iterable = itertools.product(range1, range2)
    for combo in iterable:
        yield from combo


array = np.fromiter(wrapper(range(1,5), range(1,5)), dtype=int).reshape(-1, 2)
print(array)

For the OP's case, it could look like this:

import numpy as np
import itertools


def wrapper():
    # Flatten each 2-tuple from combinations into individual scalars.
    A = itertools.combinations(range(6),2)
    for combo in A:
        yield from combo


array = np.fromiter(wrapper(), dtype=int).reshape(-1,2)
print(array)

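I'm not sure whether this runs any faster than going through a list. I used it only to avoid creating a large intermediate list for large products.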
