我正在尝试将 SQLite 表导入到 Polars df 中,如下所示:
import polars as pl
uri = "sqlite:///tmp/dbname.db"
query = ("SELECT * from alib")
df = pl.read_database_uri(query=query, uri=uri)
SQLite 表中的第一列称为 __path,其数据类型为 BLOB。
当我尝试将其导入 Polars df 时,它会抛出:
Traceback (most recent call last):
File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/_utils.py", line 54, in _read_sql_connectorx
tbl = cx.read_sql(
^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/connectorx/__init__.py", line 386, in read_sql
result = _read_sql(
^^^^^^^^^^
RuntimeError: Invalid column type Text at index: 0, name: __path
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/functions.py", line 396, in read_database_uri
return _read_sql_connectorx(
^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/_utils.py", line 66, in _read_sql_connectorx
raise type(err)(errmsg) from err
RuntimeError: Invalid column type Text at index: 0, name: __path
query = ("SELECT * from alib")
df = pl.read_database_uri(query=query, uri=uri)
Traceback (most recent call last):
File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/_utils.py", line 54, in _read_sql_connectorx
tbl = cx.read_sql(
^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/connectorx/__init__.py", line 386, in read_sql
result = _read_sql(
^^^^^^^^^^
RuntimeError: Invalid column type Text at index: 0, name: __path
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/functions.py", line 396, in read_database_uri
return _read_sql_connectorx(
^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/polars/io/database/_utils.py", line 66, in _read_sql_connectorx
raise type(err)(errmsg) from err
RuntimeError: Invalid column type Text at index: 0, name: __path
Polars 根本不处理 SQLite BLOB 字段吗?或者我需要做些什么才能工作?
根据要求的架构:
CREATE TABLE alib (
__path BLOB UNIQUE,
__filename BLOB,
__dirpath BLOB,
__filename_no_ext BLOB,
__ext BLOB,
__accessed TEXT,
__app TEXT,
__bitrate TEXT,
__bitspersample TEXT,
__bitrate_num TEXT,
__frequency_num TEXT,
__frequency TEXT,
__channels TEXT,
__created TEXT,
__dirname TEXT,
__file_access_date TEXT,
__file_access_datetime TEXT,
__file_access_datetime_raw TEXT,
__file_create_date TEXT,
__file_create_datetime TEXT,
__file_create_datetime_raw TEXT,
__file_mod_date TEXT,
__file_mod_datetime TEXT,
__file_mod_datetime_raw TEXT,
__file_size TEXT,
__file_size_bytes TEXT,
__file_size_kb TEXT,
__file_size_mb TEXT,
__filetype TEXT,
__image_mimetype TEXT,
__image_type TEXT,
__layer TEXT,
__length TEXT,
__length_seconds TEXT,
__mode TEXT,
__modified TEXT,
__num_images TEXT,
__parent_dir TEXT,
__size TEXT,
__tag TEXT,
__tag_read TEXT,
__version TEXT,
__vendorstring TEXT,
__md5sig TEXT,
tagminder_uuid BLOB,
sqlmodded TEXT,
reflac TEXT,
discnumber TEXT,
track TEXT,
title TEXT,
subtitle TEXT,
work TEXT,
part TEXT,
live TEXT,
composer TEXT,
arranger TEXT,
lyricist TEXT,
writer TEXT,
artist TEXT,
performer TEXT,
personnel TEXT,
conductor TEXT,
engineer TEXT,
producer TEXT,
mixer TEXT,
remixer TEXT,
albumartist TEXT,
discsubtitle TEXT,
album TEXT,
version TEXT,
releasetype TEXT,
year TEXT,
originaldate TEXT,
originalreleasedate TEXT,
originalyear TEXT,
genre TEXT,
style TEXT,
mood TEXT,
theme TEXT,
rating TEXT,
compilation TEXT,
bootleg TEXT,
label TEXT,
musicbrainz_albumartistid TEXT,
musicbrainz_albumid TEXT,
musicbrainz_artistid TEXT,
musicbrainz_composerid TEXT,
musicbrainz_discid TEXT,
musicbrainz_producerid TEXT,
musicbrainz_releasegroupid TEXT,
musicbrainz_releasetrackid TEXT,
musicbrainz_trackid TEXT,
musicbrainz_workid TEXT,
lyrics TEXT,
unsyncedlyrics TEXT,
performancedate TEXT,
acoustid_fingerprint TEXT,
acoustid_id TEXT,
analysis TEXT,
asin TEXT,
barcode TEXT,
catalog TEXT,
catalognumber TEXT,
country TEXT,
discogs_artist_url TEXT,
discogs_release_url TEXT,
fingerprint TEXT,
isrc TEXT,
recordinglocation TEXT,
recordingstartdate TEXT,
replaygain_album_gain TEXT,
replaygain_album_peak TEXT,
replaygain_track_gain TEXT,
replaygain_track_peak TEXT,
review TEXT,
upc TEXT,
musicip_data TEXT,
type TEXT,
musicip_puid TEXT,
movement TEXT
);
您的
BLOB
列中有无效(可能是文本)数据。 Sqlite(不幸的是)允许列中混合类型。
看下面的示例,
t1
仅包含BLOB
值,但t2
包含BLOB
和TEXT
值的混合:
sqlite> .open tmp.db
sqlite> CREATE TABLE t1 (b BLOB);
sqlite> CREATE TABLE t2 (b BLOB);
sqlite> INSERT INTO t1 VALUES (x'0001');
sqlite> INSERT INTO t2 VALUES (x'0001'), ('text');
sqlite> .exit
>>> import polars as pl
>>> pl.read_database_uri(query="SELECT * FROM t1", uri="sqlite://tmp.db")
shape: (1, 1)
┌─────────────┐
│ b │
│ --- │
│ binary │
╞═════════════╡
│ b"\x00\x01" │
└─────────────┘
>>> pl.read_database_uri(query="SELECT * FROM t2", uri="sqlite://tmp.db")
Traceback (most recent call last):
File "/Users/orlp/programming/rust/polars/py-polars/polars/io/database/_utils.py", line 54, in _read_sql_connectorx
tbl = cx.read_sql(
^^^^^^^^^^^^
File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/connectorx/__init__.py", line 386, in read_sql
result = _read_sql(
^^^^^^^^^^
RuntimeError: Invalid column type Text at index: 0, name: b
您可以通过将列转换为 blob 来修复此问题,然后将其读取为二进制:
>>> pl.read_database_uri(query="SELECT cast(b as BLOB) FROM t2", uri="sqlite://tmp.db")
shape: (2, 1)
┌─────────────────┐
│ CAST(b AS BLOB) │
│ --- │
│ binary │
╞═════════════════╡
│ b"\x00\x01" │
│ b"text" │
└─────────────────┘