Pandas / Polars: Writing a list of JSON strings to a database fails with "Object of type ndarray is not JSON serializable"

Question

I have several JSON columns that I concatenate into a single column holding a list of JSON strings. The DataFrame looks like this:

┌─────────────────────────────────┐
│ json_concat                     │
│ ---                             │
│ list[str]                       │
╞═════════════════════════════════╡
│ ["{"integer_col":52,"string_co… │
│ ["{"integer_col":93,"string_co… │
│ ["{"integer_col":15,"string_co… │
│ ["{"integer_col":72,"string_co… │
│ ["{"integer_col":61,"string_co… │
│ ["{"integer_col":21,"string_co… │
│ ["{"integer_col":83,"string_co… │
│ ["{"integer_col":87,"string_co… │
│ ["{"integer_col":75,"string_co… │
│ ["{"integer_col":75,"string_co… │
└─────────────────────────────────┘

Here is the output of Polars' glimpse():

Rows: 10
Columns: 1
$ json_concat <list[str]> ['{"integer_col":52,"string_col":"v"}', '{"float_col":86.61761457749351,"bool_col":true}', '{"datetime_col":"2021-01-01 00:00:00","categorical_col":"Category3"}'], ['{"integer_col":93,"string_col":"l"}', '{"float_col":60.11150117432088,"bool_col":false}', '{"datetime_col":"2021-01-02 00:00:00","categorical_col":"Category2"}'], ['{"integer_col":15,"string_col":"y"}', '{"float_col":70.80725777960456,"bool_col":false}', '{"datetime_col":"2021-01-03 00:00:00","categorical_col":"Category1"}'], ['{"integer_col":72,"string_col":"q"}', '{"float_col":2.0584494295802447,"bool_col":true}', '{"datetime_col":"2021-01-04 00:00:00","categorical_col":"Category2"}'], ['{"integer_col":61,"string_col":"j"}', '{"float_col":96.99098521619943,"bool_col":true}', '{"datetime_col":"2021-01-05 00:00:00","categorical_col":"Category2"}'], ['{"integer_col":21,"string_col":"p"}', '{"float_col":83.24426408004217,"bool_col":true}', '{"datetime_col":"2021-01-06 00:00:00","categorical_col":"Category2"}'], ['{"integer_col":83,"string_col":"o"}', '{"float_col":21.233911067827616,"bool_col":true}', '{"datetime_col":"2021-01-07 00:00:00","categorical_col":"Category1"}'], ['{"integer_col":87,"string_col":"o"}', '{"float_col":18.182496720710063,"bool_col":true}', '{"datetime_col":"2021-01-08 00:00:00","categorical_col":"Category2"}'], ['{"integer_col":75,"string_col":"s"}', '{"float_col":18.34045098534338,"bool_col":true}', '{"datetime_col":"2021-01-09 00:00:00","categorical_col":"Category1"}'], ['{"integer_col":75,"string_col":"l"}', '{"float_col":30.42422429595377,"bool_col":true}', '{"datetime_col":"2021-01-10 00:00:00","categorical_col":"Category2"}']

I want to write the JSON column to a table named testing. I tried both pd.DataFrame.to_sql() and pl.DataFrame.write_database(); both fail with a similar error.

Error

The most important part is this: sqlalchemy.exc.StatementError: (builtins.TypeError) Object of type ndarray is not JSON serializable

File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
sqlalchemy.exc.StatementError: (builtins.TypeError) Object of type ndarray is not JSON serializable
[SQL: INSERT INTO some_schema.testing (json_concat) VALUES (%(json_concat)s)]
[parameters: [{'json_concat': array(['{"integer_col":52,"string_col":"v"}',
       '{"float_col":86.61761457749351,"bool_col":true}',
       '{"datetime_col":"2021-01-01 00:00:00","categorical_col":"Category3"}'],
      dtype=object)}, 
      # ... abbreviated
      dtype=object)}, {'json_concat': array(['{"integer_col":75,"string_col":"l"}',
       '{"float_col":30.42422429595377,"bool_col":true}',
       '{"datetime_col":"2021-01-10 00:00:00","categorical_col":"Category2"}'],
      dtype=object)}]]
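
The TypeError can be reproduced without a database, which helps narrow down the cause. The following is a minimal sketch, assuming NumPy is installed; the `cell` value is a shortened copy of one entry from the `[parameters: ...]` dump above, and the variable names are chosen only for illustration.

```python
import json

import numpy as np

# One cell of the json_concat column as it appears in the parameters dump:
# a NumPy object array of JSON strings (shortened to two elements).
cell = np.array(
    [
        '{"integer_col":52,"string_col":"v"}',
        '{"float_col":86.61761457749351,"bool_col":true}',
    ],
    dtype=object,
)

# The standard-library JSON encoder does not know how to handle ndarray.
try:
    json.dumps(cell)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# A plain Python list of the same strings serializes without an error.
as_list_json = json.dumps(cell.tolist())
```

This suggests the problem is the container type of each cell (ndarray rather than a built-in list or a string), not the JSON strings themselves.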

Code that produces the error

(using pandas as an example)

df_pandas.to_sql(
    "testing",
    con=engines.engine,
    schema=schema,
    index=False,
    if_exists="append",
    dtype=DTYPE,
)

Question

How do I need to prepare the concatenated JSON column so that it is JSON serializable?

MRE (creating example data)

from typing import Any
import numpy as np
import pandas as pd
import polars as pl
from myengines import engines
from sqlalchemy import dialects, text

schema = "some_schema"
# Seed for reproducibility
np.random.seed(42)

n = 10

# Generate random data
integer_col = np.random.randint(1, 100, n)
float_col = np.random.random(n) * 100
string_col = np.random.choice(list("abcdefghijklmnopqrstuvwxyz"), n)
bool_col = np.random.choice([True, False], n)
datetime_col = pd.date_range(start="2021-01-01", periods=n, freq="D")
categorical_col = np.random.choice(["Category1", "Category2", "Category3"], n)

# Creating the DataFrame
df = pl.DataFrame(
    {
        "integer_col": integer_col,
        "float_col": float_col,
        "string_col": string_col,
        "bool_col": bool_col,
        "datetime_col": datetime_col,
        "categorical_col": categorical_col,
    }
)



df = df.select(
    pl.struct(pl.col("integer_col", "string_col")).struct.json_encode().alias("json1"),
    pl.struct(pl.col("float_col", "bool_col")).struct.json_encode().alias("json2"),
    pl.struct(pl.col("datetime_col", "categorical_col"))
    .struct.json_encode()
    .alias("json3"),
).select(pl.concat_list(pl.col(["json1", "json2", "json3"])).alias("json_concat"))


DTYPE: dict[str, Any] = {"json_concat": dialects.postgresql.JSONB}

Answer 1

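Unfortunately, Polars has no function that serializes a list to a JSON array. Here is how to do it manually, by joining the JSON strings with commas and wrapping the result in square brackets so that each row holds a single JSON string: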

df = df.select(
    pl.struct(pl.col("integer_col", "string_col")).struct.json_encode().alias("json1"),
    pl.struct(pl.col("float_col", "bool_col")).struct.json_encode().alias("json2"),
    pl.struct(pl.col("datetime_col", "categorical_col")).struct.json_encode().alias("json3"),
).select(
    pl.format("[{}]", pl.concat_list(pl.col(["json1", "json2", "json3"])).list.join(",")).alias("json_concat")
)

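from sqlalchemy import create_engine

# Example connection string; adjust credentials, host, and database as needed.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/postgres", echo=True)
df.write_database(
    "testing",
    connection=engine,
    if_table_exists="append",
)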
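
To see why the `pl.format("[{}]", ...list.join(","))` expression yields valid JSON, here is a small sketch of the same string assembly in plain Python for a single row. The `parts` values are copied from the first row of the glimpse output in the question; the variable names are only illustrative.

```python
import json

# The three per-row JSON strings (first row of the question's data).
parts = [
    '{"integer_col":52,"string_col":"v"}',
    '{"float_col":86.61761457749351,"bool_col":true}',
    '{"datetime_col":"2021-01-01 00:00:00","categorical_col":"Category3"}',
]

# Same assembly as the Polars expression: join with commas, wrap in brackets.
json_array_text = "[{}]".format(",".join(parts))

# The result is one string that parses as a JSON array of three objects.
parsed = json.loads(json_array_text)
print(len(parsed))
print(parsed[0])
```

Because each row is now a single string rather than a list, there is no ndarray left for the JSON encoder to reject.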

Also, strings passed to expressions are interpreted as column names, so pl.col is not needed. Here is the cleaned-up code:

df = df.select(
    pl.struct("integer_col", "string_col").struct.json_encode().alias("json1"),
    pl.struct("float_col", "bool_col").struct.json_encode().alias("json2"),
    pl.struct("datetime_col", "categorical_col").struct.json_encode().alias("json3"),
).select(
    pl.format("[{}]", pl.concat_list("json1", "json2", "json3").list.join(",")).alias("json_concat")
)
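
For the pandas route, one possible alternative (an assumption on my part, not something stated in the answer above) is to convert each cell into a plain Python list of parsed objects before calling to_sql, so that the JSON serializer only sees built-in types. The helper name `to_json_ready` is hypothetical; the sketch uses a tuple as a stand-in for the ndarray cell, since the helper only needs to iterate over the strings.

```python
import json
from typing import Any, Iterable


def to_json_ready(cell: Iterable[str]) -> list[Any]:
    """Hypothetical helper: parse each JSON string in a cell into a Python object."""
    return [json.loads(item) for item in cell]


# Stand-in for one json_concat cell; any iterable of JSON strings works here.
row = (
    '{"integer_col":52,"string_col":"v"}',
    '{"float_col":86.61761457749351,"bool_col":true}',
)

ready = to_json_ready(row)

# A list of dicts is something json.dumps can serialize directly.
serialized = json.dumps(ready)
print(serialized)
```

With pandas this might be applied as df_pandas["json_concat"] = df_pandas["json_concat"].map(to_json_ready) before to_sql with the JSONB dtype; that last step is untested here and depends on the pandas and SQLAlchemy versions in use.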
