Can Polars import SQLite BLOB fields?


I am trying to import a SQLite table into a Polars DataFrame, like this:

import polars as pl

uri = "sqlite:///tmp/dbname.db"
query = "SELECT * FROM alib"
df = pl.read_database_uri(query=query, uri=uri)

The first column in the SQLite table is called __path, and its declared data type is BLOB.

When I try to import it into a Polars DataFrame, it raises:

Traceback (most recent call last):
  File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/_utils.py", line 54, in _read_sql_connectorx
    tbl = cx.read_sql(
          ^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/connectorx/__init__.py", line 386, in read_sql
    result = _read_sql(
             ^^^^^^^^^^
RuntimeError: Invalid column type Text at index: 0, name: __path
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/functions.py", line 396, in read_database_uri
    return _read_sql_connectorx(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/_utils.py", line 66, in _read_sql_connectorx
    raise type(err)(errmsg) from err
RuntimeError: Invalid column type Text at index: 0, name: __path

Does Polars not handle SQLite BLOB fields at all? Or is there something I need to do to make this work?

The schema, as requested:

CREATE TABLE alib (
    __path                            BLOB UNIQUE,
    __filename                        BLOB,
    __dirpath                         BLOB,
    __filename_no_ext                 BLOB,
    __ext                             BLOB,
    __accessed                        TEXT,
    __app                             TEXT,
    __bitrate                         TEXT,
    __bitspersample                   TEXT,
    __bitrate_num                     TEXT,
    __frequency_num                   TEXT,
    __frequency                       TEXT,
    __channels                        TEXT,
    __created                         TEXT,
    __dirname                         TEXT,
    __file_access_date                TEXT,
    __file_access_datetime            TEXT,
    __file_access_datetime_raw        TEXT,
    __file_create_date                TEXT,
    __file_create_datetime            TEXT,
    __file_create_datetime_raw        TEXT,
    __file_mod_date                   TEXT,
    __file_mod_datetime               TEXT,
    __file_mod_datetime_raw           TEXT,
    __file_size                       TEXT,
    __file_size_bytes                 TEXT,
    __file_size_kb                    TEXT,
    __file_size_mb                    TEXT,
    __filetype                        TEXT,
    __image_mimetype                  TEXT,
    __image_type                      TEXT,
    __layer                           TEXT,
    __length                          TEXT,
    __length_seconds                  TEXT,
    __mode                            TEXT,
    __modified                        TEXT,
    __num_images                      TEXT,
    __parent_dir                      TEXT,
    __size                            TEXT,
    __tag                             TEXT,
    __tag_read                        TEXT,
    __version                         TEXT,
    __vendorstring                    TEXT,
    __md5sig                          TEXT,
    tagminder_uuid                    BLOB,
    sqlmodded                         TEXT,
    reflac                            TEXT,
    discnumber                        TEXT,
    track                             TEXT,
    title                             TEXT,
    subtitle                          TEXT,
    work                              TEXT,
    part                              TEXT,
    live                              TEXT,
    composer                          TEXT,
    arranger                          TEXT,
    lyricist                          TEXT,
    writer                            TEXT,
    artist                            TEXT,
    performer                         TEXT,
    personnel                         TEXT,
    conductor                         TEXT,
    engineer                          TEXT,
    producer                          TEXT,
    mixer                             TEXT,
    remixer                           TEXT,
    albumartist                       TEXT,
    discsubtitle                      TEXT,
    album                             TEXT,
    version                           TEXT,
    releasetype                       TEXT,
    year                              TEXT,
    originaldate                      TEXT,
    originalreleasedate               TEXT,
    originalyear                      TEXT,
    genre                             TEXT,
    style                             TEXT,
    mood                              TEXT,
    theme                             TEXT,
    rating                            TEXT,
    compilation                       TEXT,
    bootleg                           TEXT,
    label                             TEXT,
    musicbrainz_albumartistid         TEXT,
    musicbrainz_albumid               TEXT,
    musicbrainz_artistid              TEXT,
    musicbrainz_composerid            TEXT,
    musicbrainz_discid                TEXT,
    musicbrainz_producerid            TEXT,
    musicbrainz_releasegroupid        TEXT,
    musicbrainz_releasetrackid        TEXT,
    musicbrainz_trackid               TEXT,
    musicbrainz_workid                TEXT,
    lyrics                            TEXT,
    unsyncedlyrics                    TEXT,
    performancedate                   TEXT,
    acoustid_fingerprint              TEXT,
    acoustid_id                       TEXT,
    analysis                          TEXT,
    asin                              TEXT,
    barcode                           TEXT,
    catalog                           TEXT,
    catalognumber                     TEXT,
    country                           TEXT,
    discogs_artist_url                TEXT,
    discogs_release_url               TEXT,
    fingerprint                       TEXT,
    isrc                              TEXT,
    recordinglocation                 TEXT,
    recordingstartdate                TEXT,
    replaygain_album_gain             TEXT,
    replaygain_album_peak             TEXT,
    replaygain_track_gain             TEXT,
    replaygain_track_peak             TEXT,
    review                            TEXT,
    upc                               TEXT,
    musicip_data                      TEXT,
    type                              TEXT,
    musicip_puid                      TEXT,
    movement                          TEXT
);
Tags: blob, python-polars

1 Answer

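Your BLOB column contains invalid (probably text) data. SQLite (unfortunately) allows values of mixed types within a single column.

In the example below, t1 contains only BLOB values, whereas t2 contains a mix of BLOB and TEXT values: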

sqlite> .open tmp.db
sqlite> CREATE TABLE t1 (b BLOB);
sqlite> CREATE TABLE t2 (b BLOB);
sqlite> INSERT INTO t1 VALUES (x'0001');
sqlite> INSERT INTO t2 VALUES (x'0001'), ('text');
sqlite> .exit
>>> import polars as pl
>>> pl.read_database_uri(query="SELECT * FROM t1", uri="sqlite://tmp.db")
shape: (1, 1)
┌─────────────┐
│ b           │
│ ---         │
│ binary      │
╞═════════════╡
│ b"\x00\x01" │
└─────────────┘
>>> pl.read_database_uri(query="SELECT * FROM t2", uri="sqlite://tmp.db")
Traceback (most recent call last):
  File "/Users/orlp/programming/rust/polars/py-polars/polars/io/database/_utils.py", line 54, in _read_sql_connectorx
    tbl = cx.read_sql(
          ^^^^^^^^^^^^
  File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/connectorx/__init__.py", line 386, in read_sql
    result = _read_sql(
             ^^^^^^^^^^
RuntimeError: Invalid column type Text at index: 0, name: b
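
To confirm that a column really holds mixed types, one option is to ask SQLite for the storage class of each value with its built-in typeof() function. The following is a minimal sketch using Python's standard sqlite3 module. The helper name storage_classes is hypothetical, and the in-memory table simply mirrors t2 from the example above:

```python
import sqlite3


def storage_classes(conn, table, column):
    """Count the rows of `table` by the storage class actually held in `column`."""
    # Identifiers cannot be bound as SQL parameters, so quote them by hand.
    tbl = '"' + table.replace('"', '""') + '"'
    col = '"' + column.replace('"', '""') + '"'
    rows = conn.execute(
        f"SELECT typeof({col}), COUNT(*) FROM {tbl} GROUP BY 1"
    ).fetchall()
    return dict(rows)


# Recreate t2 from the example: one BLOB value and one TEXT value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (b BLOB)")
conn.execute("INSERT INTO t2 VALUES (x'0001'), ('text')")

counts = storage_classes(conn, "t2", "b")
print(counts)
```

If the result lists anything besides blob (and possibly null) for a column declared as BLOB, that column is a likely candidate for the error shown above.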

You can fix this by casting the column to BLOB in the query, so that every value is read as binary:

>>> pl.read_database_uri(query="SELECT cast(b as BLOB) FROM t2", uri="sqlite://tmp.db")
shape: (2, 1)
┌─────────────────┐
│ CAST(b AS BLOB) │
│ ---             │
│ binary          │
╞═════════════════╡
│ b"\x00\x01"     │
│ b"text"         │
└─────────────────┘
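
For a wide table such as alib, writing the casts by hand for every BLOB column gets tedious. As a rough sketch, the query could be generated from the table's declared column types using PRAGMA table_info. The helper select_with_blob_casts is hypothetical, and the three-column table below is a shortened stand-in for the real schema:

```python
import sqlite3


def select_with_blob_casts(conn, table):
    """Build a SELECT that casts every column declared as BLOB to BLOB."""

    def quote(name):
        # Double-quote an identifier, escaping embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    parts = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({quote(table)})")
    for _cid, name, decl_type, *_rest in info:
        col = quote(name)
        if decl_type.upper() == "BLOB":
            # Keep the original column name via an alias.
            parts.append(f"CAST({col} AS BLOB) AS {col}")
        else:
            parts.append(col)
    return f"SELECT {', '.join(parts)} FROM {quote(table)}"


# Shortened stand-in for the alib schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alib (__path BLOB UNIQUE, __filename BLOB, title TEXT)")

query = select_with_blob_casts(conn, "alib")
print(query)
```

The resulting string can then be passed as the query argument to pl.read_database_uri in place of SELECT * FROM alib. The aliases keep the original column names, rather than the CAST(b AS BLOB) name seen in the output above. This assumes that only the columns declared as BLOB contain mixed values.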