Using Python Polars, how can I modify the following script to stream the contents of a Parquet file to standard output as CSV text?
import polars as pl
import sys
pl.scan_parquet("BTCUSDT-trades-2022-01.parquet").sink_csv(sys.stdout)
Python complains that LazyFrame.sink_csv expects a path argument (a str, bytes or os.PathLike object) rather than a TextIOWrapper:
Traceback (most recent call last):
  File "/mnt/storage/Data/Binance/Market Data/Polars/select_btcusdt_aggtrades.py", line 4, in <module>
    pl.scan_parquet("../BTCUSDT/2022/BTCUSDT-trades-2022-01.parquet").sink_csv(sys.stdout)
  File "/home/derek/.cache/uv/archive-v0/bOaA51uU_dQEF2peOOxqI/lib/python3.12/site-packages/polars/_utils/unstable.py", line 58, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/derek/.cache/uv/archive-v0/bOaA51uU_dQEF2peOOxqI/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2717, in sink_csv
    path=normalize_filepath(path),
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/derek/.cache/uv/archive-v0/bOaA51uU_dQEF2peOOxqI/lib/python3.12/site-packages/polars/_utils/various.py", line 225, in normalize_filepath
    path = os.path.expanduser(path) # noqa: PTH111
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen posixpath>", line 259, in expanduser
TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper
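The traceback bottoms out in os.path.expanduser, which only accepts path-like values. The following is a minimal sketch, using only the standard library, that reproduces the same kind of TypeError without Polars. The helper name try_expanduser is hypothetical, and an in-memory io.TextIOWrapper stands in for sys.stdout.

```python
import io
import os


def try_expanduser(target):
    """Return the TypeError message raised by os.path.expanduser, or None if it succeeds."""
    try:
        os.path.expanduser(target)
    except TypeError as exc:
        return str(exc)
    return None


# Assumption: this in-memory text stream stands in for sys.stdout,
# which is normally an io.TextIOWrapper as well.
stream = io.TextIOWrapper(io.BytesIO())

error_message = try_expanduser(stream)        # a stream is not path-like
path_result = try_expanduser("output.csv")    # a plain string is accepted

print(error_message)
print(path_result)
```

A string such as "output.csv" passes through untouched, which is why the file-path form of the call is accepted while the stream form is rejected before any data is read.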
BTCUSDT-trades-2022-01.zip contains the trades between Bitcoin and USDT on the Binance cryptocurrency exchange for January 2022, in CSV format. BTCUSDT-trades-2022-01.parquet contains the same trade data in Parquet format.
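As a workaround, though not an ideal solution, I modified the script to stream the CSV text to a named pipe, output.csv, which I created in Linux with the command mkfifo output.csv. In another shell process, cat output.csv streams the output to standard output.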
import polars as pl
import sys
pl.scan_parquet("BTCUSDT-trades-2022-01.parquet").sink_csv("output.csv")
$ cat output.csv | head
trade_id,price,qty,quote_qty,time,is_buyer_maker,is_best_match
1207691977,46216.93,0.00709000,327.67803370,1640995200000,false,true
1207691978,46216.92,0.00041000,18.94893720,1640995200000,true,true
1207691979,46216.93,0.00056000,25.88148080,1640995200000,false,true
1207691980,46216.92,0.00066000,30.50316720,1640995200000,true,true
1207691981,46216.92,0.00523000,241.71449160,1640995200002,true,true
1207691982,46216.93,0.00631000,291.62882830,1640995200003,false,true
1207691983,46216.93,0.00443000,204.74099990,1640995200003,false,true
1207691984,46216.92,0.00573000,264.82295160,1640995200004,true,true
1207691985,46216.92,0.00718000,331.83748560,1640995200005,true,true
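The named-pipe workaround above needs two shell processes. As a rough sketch of the same mechanism inside a single Python process (Unix only, because it relies on os.mkfifo), a writer thread can write to the pipe by path while the main thread copies the pipe to standard output. The names write_csv and stream_via_fifo are hypothetical, and write_csv is only a stand-in for a call that writes CSV text to a path, such as the sink_csv call in the workaround; Polars itself is not used here.

```python
import os
import shutil
import sys
import tempfile
import threading


def write_csv(path):
    # Hypothetical stand-in for a writer that only accepts a path.
    # The header is shortened; the row values come from the sample output above.
    with open(path, "w") as sink:
        sink.write("trade_id,price\n")
        sink.write("1207691977,46216.93\n")


def stream_via_fifo(writer, out):
    """Run writer(path) against a temporary FIFO and copy what it writes to out."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        fifo_path = os.path.join(tmp_dir, "output.csv")
        os.mkfifo(fifo_path)  # Unix only
        thread = threading.Thread(target=writer, args=(fifo_path,))
        thread.start()
        # Opening a FIFO for reading blocks until the writer opens it for writing.
        with open(fifo_path) as source:
            shutil.copyfileobj(source, out)
        thread.join()


stream_via_fifo(write_csv, sys.stdout)
```

This sketch does not handle a writer that fails before opening the pipe, in which case the reader would block indefinitely. Whether sink_csv behaves the same way when run from a thread has not been verified here.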