Error reading a JSON file with Python Polars

I am trying to read a GeoJSON document with Python Polars, like this:

import polars as pl
myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'
pl.read_json(myfile) 

The error I get is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\local-packages\Python39\site-packages\polars\functions.py", line 631, in read_json
    return DataFrame.read_json(source)  # type: ignore
  File "...\local-packages\Python39\site-packages\polars\frame.py", line 346, in read_json
    self._df = PyDataFrame.read_json(file)
RuntimeError: Other("Error(\"missing field `columns`\", line: 1, column: 143)")

I also tried putting the same content into a file, but got a similar error.

Following a suggestion on GitHub, I tried reading the file through Pandas instead:

import pandas as pd
initial_df = pl.from_pandas(pd.read_json(file_path))
The error I get is:

  File "...\file_splitter.py", line 13, in split_file
    initial_df = pl.from_pandas(pd.read_json(file_path))
  File "...\local-packages\Python39\site-packages\polars\functions.py", line 566, in from_pandas
    data[name] = _from_pandas_helper(s)
  File "...\local-packages\Python39\site-packages\polars\functions.py", line 534, in _from_pandas_helper
    return pa.array(a)
  File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 97, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: cannot mix list and non-list, non-null values

How can I read a GeoJSON file?

python json python-polars
1 Answer

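Update: the example now works as expected in Polars. Pass the JSON text as bytes, because in recent versions a plain str passed to read_json is interpreted as a file path.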

pl.read_json(myfile.encode())
shape: (1, 2)
┌────────────────────┬─────────────────────────────────┐
│ type               ┆ geometries                      │
│ ---                ┆ ---                             │
│ str                ┆ list[struct[2]]                 │
╞════════════════════╪═════════════════════════════════╡
│ GeometryCollection ┆ [{"Linestring",[[10.0, 11.2], … │
└────────────────────┴─────────────────────────────────┘


If you read the file with pandas, you get columns of dtype object, which Arrow does not know how to convert (an object column can contain anything).
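A small check makes this concrete. This is a sketch of my own, not part of the original answer: it inspects what pandas produces for the example document. It wraps the string in io.StringIO because passing literal JSON text to pandas.read_json is deprecated since pandas 2.1.

```python
from io import StringIO

import pandas as pd

myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'

# StringIO tells pandas this is JSON content, not a path
pdf = pd.read_json(StringIO(myfile))

# Each geometry stays a plain Python dict, so pandas can only store the column as object
print(pdf["geometries"].dtype)           # object
print(type(pdf["geometries"].iloc[0]))   # <class 'dict'>
```

Arrow has no single type to map an arbitrary object column to, which is why the conversion in pl.from_pandas fails without an explicit cast.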

If we cast the columns to string type, Arrow and Polars can handle them.

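from io import StringIO
myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'
# StringIO: passing a literal JSON string to pd.read_json is deprecated since pandas 2.1
print(pl.from_pandas(pd.read_json(StringIO(myfile)).astype(str)))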
shape: (2, 2)
┌────────────────────┬─────────────────────────────────────┐
│ type               ┆ geometries                          │
│ ---                ┆ ---                                 │
│ str                ┆ str                                 │
╞════════════════════╪═════════════════════════════════════╡
│ GeometryCollection ┆ {'type': 'Linestring', 'coordina... │
│ GeometryCollection ┆ {'type': 'Point', 'coordinates':... │
└────────────────────┴─────────────────────────────────────┘

    
© www.soinside.com 2019 - 2024. All rights reserved.