Python: decompressing a gzip CSV in the pandas CSV reader

Question (votes: 1, answers: 1)

The following code works in Python 3 but fails in Python 2:

import gzip
import pandas as pd
import requests

def get_historical():
    r = requests.get("http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz", stream=True)
    decompressed_file = gzip.GzipFile(fileobj=r.raw)
    data = pd.read_csv(decompressed_file, sep=',')
    data.columns = ["timestamp", "price", "volume"]  # set df col headers
    return data

The error I get in Python 2 is:

TypeError: 'int' object has no attribute '__getitem__'

The error occurs on the line where I assign data = pd.read_csv(...).

It looks like a pandas bug to me.

Stack trace:

Traceback (most recent call last):
  File "fetch.py", line 51, in <module>
    print(f.get_historical())
  File "fetch.py", line 36, in get_historical
    data = pd.read_csv(f, sep=',')
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 760, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2197, in pandas._libs.parsers.raise_parser_error
io.UnsupportedOperation: seek
python python-2.7 pandas python-requests gzip
1 answer
3 votes

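The traceback you posted comes from the fact that the raw attribute of the Response object is a file-like object that does not support the .seek method that ordinary file objects support. However, when ingesting a file object with pd.read_csv, pandas (in Python 2) appears to use the seek method of the file object it is given.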
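To see this behaviour without making a network request, here is a minimal Python 3 sketch. ForwardOnlyStream and to_seekable are hypothetical names invented for this illustration; the class only imitates a readable, non-seekable stream such as r.raw and is not the actual object that requests returns.

```python
import gzip
import io


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for r.raw: readable, but not seekable."""

    def __init__(self, payload):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def readinto(self, buffer):
        # Copy the next chunk of bytes into the caller's buffer.
        chunk = self._inner.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


def to_seekable(stream):
    """Read the whole stream into memory and return a seekable buffer."""
    return io.BytesIO(stream.read())


compressed = gzip.compress(b"1314964052,2.60,0.4\n1316277154,3.75,0.5\n")
stream = ForwardOnlyStream(compressed)
print(stream.seekable())  # False

try:
    stream.seek(0)
except io.UnsupportedOperation:
    print("seek is not supported on the raw stream")

buffered = to_seekable(stream)
print(buffered.seekable())  # True
print(gzip.GzipFile(fileobj=buffered).read().decode("ascii"))
```

Buffering costs memory proportional to the size of the download, which is the trade-off for getting an object that supports seek.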

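You can confirm that the raw data of the returned response is not seekable by calling r.raw.seekable(), which should normally return False.

One way around this is to wrap the returned data in an io.BytesIO object, as follows:

import gzip
import io
import pandas as pd
import requests

# file_url = "http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz"
file_url = "http://api.bitcoincharts.com/v1/csv/aqoinEUR.csv.gz"
r = requests.get(file_url, stream=True)
# Read the whole body into memory so the gzip reader gets a seekable object
dfile = gzip.GzipFile(fileobj=io.BytesIO(r.raw.read()))
# header=None matches the output below, whose columns are numbered 0, 1, 2
data = pd.read_csv(dfile, sep=',', header=None)
print(data)

            0     1    2
0  1314964052  2.60  0.4
1  1316277154  3.75  0.5
2  1316300526  4.00  4.0
3  1316300612  3.80  1.0
4  1316300622  3.75  1.5

As you can see, I used a smaller file from the directory of available files. You can switch it to the file you need. In any case, io.BytesIO(r.raw.read()) should be seekable, and should therefore help you avoid the io.UnsupportedOperation exception you are running into.

As for the TypeError exception, it does not occur in this snippet of code.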
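If you also want the column names from the question, the buffering step can be folded into a small helper. This is only a sketch: load_prices is a hypothetical name, and it assumes the file has no header row, so the names are passed to read_csv instead of being assigned afterwards. The sample bytes are built in memory so the snippet runs without network access.

```python
import gzip
import io

import pandas as pd


def load_prices(compressed_bytes):
    """Parse gzip-compressed, headerless CSV bytes into a DataFrame."""
    buffer = io.BytesIO(compressed_bytes)  # seekable in-memory copy
    with gzip.GzipFile(fileobj=buffer) as decompressed:
        return pd.read_csv(
            decompressed,
            header=None,  # assumption: the file has no header row
            names=["timestamp", "price", "volume"],
        )


sample = gzip.compress(b"1314964052,2.60,0.4\n1316277154,3.75,0.5\n")
print(load_prices(sample))
```

With the request from the question, the call would be load_prices(r.raw.read()).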

I hope this helps.
