Polars: how to convert a column of type list[list[...]] to a numpy ndarray

Question

I know I can convert an ordinary Polars Series to a numpy array with .to_numpy():

import polars as pl

s = pl.Series("a", [1,2,3])

s.to_numpy()
# array([1, 2, 3])

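But this does not work for list types. What is the way to turn such a structure into a 2-D array?

More generally, is there a way to convert a Series of list[list[whatever]] to a 3-D array, and so on?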

s = pl.Series("a", [[1,1,1],[1,2,3],[1,0,1]])

s.to_numpy()  
# exceptions.ComputeError: 'to_numpy' not supported for dtype: List(Int64)

Desired output:

array([[1, 1, 1],
       [1, 2, 3],
       [1, 0, 1]])

Or, going one level deeper:

s = pl.Series("a", [[[1,1],[1,2]],[[1,1],[1,1]]])

s.to_numpy()  
# exceptions.ComputeError: 'to_numpy' not supported for dtype: List(Int64)

Desired output:

array([[[1, 1],
        [1, 2]],

       [[1, 1],
        [1, 1]]])

Answer 1

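You can explode the Series and then reshape the resulting numpy array. This is probably the only way for now, since the ComputeError states that the conversion is not supported in Polars.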

The list dtype allows list lengths to differ from row to row, which would break any computation like this, so it makes sense that it is not supported.
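Before reshaping, it can help to check that the data really is rectangular, i.e. every sublist at a given depth has the same length. The following is a minimal sketch on plain Python lists (for a Polars Series you could pass s.to_list()); the helper name is_rectangular is hypothetical and is not part of the Polars API.

```python
def is_rectangular(nested):
    """Hypothetical helper: True if every sublist at each depth has the same length.

    This is the precondition the explode-and-reshape approach relies on.
    Works on plain Python lists (e.g. the output of Series.to_list()).
    """
    level = [nested]  # all sibling lists at the current depth
    while all(isinstance(x, list) for x in level):
        if len({len(x) for x in level}) > 1:
            return False  # ragged: rows differ in length at this depth
        # Descend one level, analogous to a single explode.
        level = [item for x in level for item in x]
        if not level:
            return True  # nothing left to compare
    # Lists mixed with scalars at the same depth are also not rectangular.
    return not any(isinstance(x, list) for x in level)
```

For example, is_rectangular([[1, 2], [3, 4, 5]]) returns False, so reshaping the exploded values of that data would not give a meaningful 2-D array.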

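That said, if you know that every row of the list column has the same length, you can write this operation for arbitrarily nested list types. It only requires tracking how the length of the Series changes with each explode and then computing the correct new dimensions: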

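from itertools import pairwise  # requires Python 3.10+

import polars as pl

def multidimensional_to_numpy(s):
    # Record the Series length after each explode; starting with 1 makes
    # the first ratio equal to the outermost dimension (the row count).
    dimensions = [1, len(s)]
    while s.dtype == pl.List:
        s = s.explode()  # flatten one level of nesting
        dimensions.append(len(s))
    # Each dimension is the growth factor between consecutive lengths.
    dimensions = [p[1] // p[0] for p in pairwise(dimensions)]
    return s.to_numpy().reshape(dimensions)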

multidimensional_to_numpy(pl.Series("a", [1,2,3]))
array([1, 2, 3], dtype=int64)

multidimensional_to_numpy(pl.Series("a", [[1,1,1],[1,2,3],[1,0,1]]))

array([[1, 1, 1],
       [1, 2, 3],
       [1, 0, 1]], dtype=int64)

multidimensional_to_numpy(pl.Series("a", [[[1,1],[1,2]], [[1,1],[1,1]]]))

array([[[1, 1],
        [1, 2]],

       [[1, 1],
        [1, 1]]], dtype=int64)
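To see the dimension arithmetic on its own, here is a minimal sketch that mirrors the same idea on a plain nested Python list instead of a Polars Series: flatten one level at a time (the analogue of explode), record the total length after each step, and turn consecutive ratios into the target shape. The function name nested_list_to_numpy is hypothetical, and it assumes the input is rectangular.

```python
import numpy as np

def nested_list_to_numpy(nested):
    """Hypothetical pure-Python analogue of multidimensional_to_numpy.

    Assumes every sublist at a given depth has the same length.
    """
    flat = list(nested)
    lengths = [1, len(flat)]  # same starting point as dimensions = [1, len(s)]
    # Flatten one level per iteration, like a single Series.explode().
    while flat and all(isinstance(x, list) for x in flat):
        flat = [item for x in flat for item in x]
        lengths.append(len(flat))
    # Each dimension is the growth factor between consecutive lengths,
    # e.g. lengths [1, 2, 4, 8] give the shape [2, 2, 2].
    shape = [b // a for a, b in zip(lengths, lengths[1:])]
    return np.array(flat).reshape(shape)
```

With the 3-D example from the question, the recorded lengths are 1, 2, 4 and 8, which yields a (2, 2, 2) array.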

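Note that there is an upcoming Array dtype that guarantees the same length across the whole column (and the current arr namespace will become list), so this answer can probably be improved once it lands (perhaps to_numpy will be supported for it directly). In particular, the dimension calculation above should simplify to tracking the dtype.width of each inner Array dtype.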
