Force NumPy to create an array of objects

Question

I have an array:

x = np.array([[1, 2, 3], [4, 5, 6]])

I want to create another array with shape=(1, 1) and dtype=object whose only element is x.

I tried this code:

a = np.array([[x]], dtype=object)

but it produces an array of shape (1, 1, 2, 3).

Of course I can do this:

a = np.zeros(shape=(1, 1), dtype=object)
a[0, 0] = x

but I would like the solution to scale easily to larger shapes of a, for example:

[[x, x], [x, x]]

without running a for loop over all the indices.

Any suggestions on how to achieve this?

UPD1

The arrays may be different, for example:

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
[[x, y], [u, v]]

They may also have different shapes, but for that case a simple np.array([[x, y], [u, v]]) constructor works fine.

UPD2

I really want a solution that works for arbitrary shapes of x, y, u, v, not necessarily all the same.

Answer 1

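This is a fairly general method: it works with nested lists and with lists of lists of arrays, regardless of whether the shapes of those arrays are different or equal. It also works when the data come clumped together in one single array, which is in fact the trickiest case. (The other methods posted so far do not work in that case.)

Let's start with the hard case, one big array: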

# create example
# pick outer shape and inner shape
>>> osh, ish = (2, 3), (2, 5)
# total shape
>>> tsh = (*osh, *ish)
# make data
>>> data = np.arange(np.prod(tsh)).reshape(tsh)
>>>
# recalculate inner shape to cater for different inner shapes
# this will return the consensus bit of all inner shapes
>>> ish = np.shape(data)[len(osh):]
>>> 
# block them
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> 
# admire
>>> data_blocked
array([[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]),
        array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]]),
        array([[20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])],
       [array([[30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]]),
        array([[40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]]),
        array([[50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59]])]], dtype=object)

Using the OP's example, which is a list of lists of arrays:

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [0, 1, 2]])
>>> u = np.array([[3, 4, 5], [6, 7, 8]])
>>> v = np.array([[9, 0, 1], [2, 3, 4]])
>>> data = [[x, y], [u, v]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> 
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
       [4, 5, 6]]),
        array([[7, 8, 9],
       [0, 1, 2]])],
       [array([[3, 4, 5],
       [6, 7, 8]]),
        array([[9, 0, 1],
       [2, 3, 4]])]], dtype=object)

And an example with subarrays of different shapes (note the v.T):

>>> data = [[x, y], [u, v.T]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
       [4, 5, 6]]),
        array([[7, 8, 9],
       [0, 1, 2]])],
       [array([[3, 4, 5],
       [6, 7, 8]]),
        array([[9, 2],
       [0, 3],
       [1, 4]])]], dtype=object)

Answer 2

a = np.empty(shape=(2, 2), dtype=object)
a.fill(x)
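
To make the effect of this two-liner easier to see, here is a minimal self-contained sketch. It assumes the same x as in the question and only checks the shapes; note that it covers the case where every cell holds the same x, not the [[x, y], [u, v]] case from UPD1.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# Allocate the outer (2, 2) container first, then put x into every cell.
a = np.empty((2, 2), dtype=object)
a.fill(x)

print(a.shape)        # shape of the outer object array
print(a[0, 0].shape)  # shape of the array stored in one cell
```

The outer shape is fixed by np.empty, so NumPy never gets the chance to merge the dimensions of x into the result.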

Answer 3

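@PaulPanzer's use of np.frompyfunc is clever, but all the reshaping and the use of __getitem__ make it hard to follow.

Separating the creation of the function from its application may help: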

func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)

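This highlights the separation between the ish dimensions and the osh dimensions.

I also suspect that a lambda function could replace the __getitem__.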
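
As a rough check of that suspicion, here is a sketch of the lambda variant for the "one big array" case. The names flat, pick and n_outer are made up for this illustration; the rest follows the code above.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)
data = np.arange(np.prod((*osh, *ish))).reshape((*osh, *ish))

# Collapse the outer dimensions so one integer selects one inner block.
flat = data.reshape((-1, *ish))
n_outer = int(np.prod(osh))

# frompyfunc always returns an object array, so each returned block
# is stored as a single element instead of being merged in.
pick = np.frompyfunc(lambda i: flat[i], 1, 1)
res = pick(np.arange(n_outer)).reshape(osh)

print(res.shape, res.dtype)
print(res[1, 2])
```

The lambda does the same job as the bound __getitem__, it just makes the indexing step explicit.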

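This works because frompyfunc returns an object dtype array. np.vectorize also uses frompyfunc, but lets us specify a different otype. Both, however, pass scalars to the function, which is why Paul's approach uses a flattened range and getitem. np.vectorize with a signature lets us pass arrays to the function, but it iterates with ndindex instead of using frompyfunc.

Inspired by that, here is an np.empty plus fill method, using ndindex as the iterator: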

In [385]: osh, ish = (2, 3), (2, 5)
     ...: tsh = (*osh, *ish)
     ...: data = np.arange(np.prod(tsh)).reshape(tsh)
     ...: ish = np.shape(data)[len(osh):]
     ...: 
In [386]: tsh
Out[386]: (2, 3, 2, 5)
In [387]: ish
Out[387]: (2, 5)
In [388]: osh
Out[388]: (2, 3)
In [389]: res = np.empty(osh, object)
In [390]: for idx in np.ndindex(osh):
     ...:     res[idx] = data[idx]
     ...:     
In [391]: res
Out[391]: 
array([[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]),
       ....
       [55, 56, 57, 58, 59]])]], dtype=object)

For the second example:

In [399]: arr = np.array(data)
In [400]: arr.shape
Out[400]: (2, 2, 2, 3)
In [401]: res = np.empty(osh, object)
In [402]: for idx in np.ndindex(osh):
     ...:     res[idx] = arr[idx]

In the third case, np.array(data) already creates the desired (2,2) object dtype array. The res create-and-fill still works, even though it produces the same thing.
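
The loop can also be wrapped in a small helper that indexes the nested list directly, so no np.array(data) conversion is needed. This is only a sketch, and block_nested is a hypothetical name, not something from NumPy:

```python
import numpy as np

def block_nested(data, osh):
    """Copy the items found at depth len(osh) of `data` into an object array."""
    res = np.empty(osh, dtype=object)
    for idx in np.ndindex(*osh):
        item = data
        for i in idx:      # walk down one nesting level per outer index
            item = item[i]
        res[idx] = item    # a full index stores the item as a single object
    return res

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

res = block_nested([[x, y], [u, v.T]], (2, 2))
print(res.shape, res[1, 1].shape)
```

Because it indexes one level at a time, the same helper accepts a nested list or a single big ndarray as data.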

The speeds are not very different (though this example is small):

In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
49.8 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [416]: %%timeit
     ...: arr = np.array(data)
     ...: res = np.empty(osh, object)
     ...: for idx in np.ndindex(osh): res[idx] = arr[idx]
     ...: 
54.7 µs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is effectively np.array(data).reshape(-1, *ish). The list has to be turned into an array first.
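
A quick way to confirm that equivalence, sketched for the second example where all inner arrays share the shape (2, 3):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
data = [[x, y], [u, v]]
ish = (2, 3)

# np.reshape accepts the nested list and converts it to an array internally.
via_function = np.reshape(data, (-1, *ish))
# The explicit version: convert first, then reshape.
via_method = np.array(data).reshape(-1, *ish)

print(via_function.shape)
print(np.array_equal(via_function, via_method))
```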

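Apart from speed, it would be interesting to see whether one method is more general than the other. Is there a case that one handles but the other cannot?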

Answer 4

Found a solution myself:

a = np.zeros(shape=(2, 2), dtype=object)
a[:] = [[x, x], [x, x]]
