How to download a zip file in Python and parse a CSV file from it

Question

I wrote a script that requests a URL, downloads a zip file, and unpacks it. Now I am having trouble parsing the CSV file that comes out of the archive.

import csv
from requests import get
from io import BytesIO
from zipfile import ZipFile

request = get('https://example.com/some_file.zip')
zip_file = ZipFile(BytesIO(request.content))
files = zip_file.namelist()
with open(files[0], 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)

Answer 1

Modern answer for Python 3

See @joe-heffer's answer above: https://stackoverflow.com/a/53187751/223424

Old, incomplete answer

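When you run

files = zip_file.namelist()

you are only listing the names of the files inside the zip archive. Those files have not been extracted from the zip yet, so you cannot open() them as local files the way your script does.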

You can use ZipFile.open to read a data stream directly from the zip file.

So this should work:

zip_file = ZipFile(BytesIO(request.content))
files = zip_file.namelist()
with zip_file.open(files[0], 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    ...
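
On Python 3, ZipFile.open returns a binary stream, while csv.reader expects text, so the stream usually needs to be decoded first. Below is a minimal sketch of that step, not a definitive implementation. The helper names make_sample_zip and read_csv_rows, the member name data.csv, and the UTF-8 encoding are all assumptions made for illustration; the in-memory archive stands in for request.content so the example runs without a network call.

```python
import csv
import io
import zipfile


def make_sample_zip():
    """Build a small zip archive in memory (hypothetical stand-in for request.content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data.csv", "name,value\nalpha,1\nbeta,2\n")
    return buffer.getvalue()


def read_csv_rows(zip_bytes, encoding="utf-8"):
    """Return the rows of the first archive member, parsed as CSV."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first_name = archive.namelist()[0]
        # ZipFile.open yields bytes; TextIOWrapper decodes them for csv.reader.
        with archive.open(first_name) as raw:
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                return list(csv.reader(text))


rows = read_csv_rows(make_sample_zip())
print(rows)
```

With a real download, passing request.content to read_csv_rows should behave the same way, provided the first member of the archive is the CSV file and the assumed encoding matches.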

Answer 2

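import csv
import io
import logging
import zipfile

import requests

logger = logging.getLogger(__name__)


def read_first_csv(url):
    response = requests.get(url)
    with io.BytesIO(response.content) as buffer:
        with zipfile.ZipFile(buffer) as zip_file:
            # Get first file in the archive
            for zip_info in zip_file.infolist():
                logger.debug(zip_info)
                # Open file
                with zip_file.open(zip_info) as file:
                    # Load CSV file, decode binary to text
                    with io.TextIOWrapper(file) as text:
                        # Read every row now; the streams close on return
                        return list(csv.DictReader(text))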
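
A caveat worth noting for this pattern: csv.DictReader reads lazily, so the rows need to be consumed before the enclosing with blocks close the underlying stream. The sketch below illustrates the difference under simplified assumptions. io.StringIO stands in for the decoded zip member, and the function names lazy_reader and eager_rows are hypothetical.

```python
import csv
import io


def lazy_reader():
    """Return the reader itself; the stream is already closed when it is used."""
    with io.StringIO("name,value\nalpha,1\n") as text:
        return csv.DictReader(text)


def eager_rows():
    """Read all rows while the stream is still open."""
    with io.StringIO("name,value\nalpha,1\n") as text:
        return list(csv.DictReader(text))


try:
    next(iter(lazy_reader()))
    lazy_failed = False
except ValueError:
    # Reading from a closed stream raises ValueError.
    lazy_failed = True

rows = eager_rows()
print(lazy_failed, rows)
```

Returning a list costs memory for large files; yielding rows from a generator inside the with blocks is another option when the archive is big.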

Answer 3

It looks like you have not imported the csv module. Try putting import csv at the top of your imports.

Answer 4

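After a few hours of searching and experimenting, I finally got something that works. Here is my script.

My requirements were:

Download a ZIP file.
Find the text file in that zip whose name contains "anystring".
Extract from that text file the first URL that contains the string "csv".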

#!/bin/env python
from io import BytesIO
from zipfile import ZipFile
import requests
import re
import sys

# define url value
url = "https://whateverurlyouneed"

# Define string to be found in the file name to be extracted
filestr = "anystring"

# Define string to be found in URL
urlstr = "anystring"

# Define regex to extract URL
regularex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"

# download zip file
content = requests.get(url)

# Open stream
zipfile = ZipFile(BytesIO(content.content))

# Open first file from the ZIP archive containing 
# the filestr string in the name
data = [zipfile.open(file_name) for file_name in zipfile.namelist() if filestr in file_name][0]

# read lines from the file. If csv found, print URL and exit
# This will return the 1st URL containing CSV in the opened file
for line in data.readlines():
    if urlstr in line.decode("latin-1"):
        urls = re.findall(regularex,line.decode("latin-1"))
        print([url[0] for url in urls])
        break
sys.exit(0)
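
The same steps can also be sketched as a reusable function that can be tried without a network call. This is only an illustration under stated assumptions: first_matching_url, URL_PATTERN, and the sample member names are hypothetical, and the URL pattern is a deliberately simplified one, not the regex from the script above.

```python
import io
import re
import zipfile

# Simplified URL pattern, an assumption for this sketch (not the regex above).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def first_matching_url(zip_bytes, name_part, url_part, encoding="latin-1"):
    """Return the first URL containing url_part from the first member
    whose name contains name_part, or None if nothing matches."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist() if name_part in n]
        if not names:
            return None
        with archive.open(names[0]) as member:
            # Members opened from a zip yield bytes, one line at a time.
            for raw_line in member:
                line = raw_line.decode(encoding)
                for url in URL_PATTERN.findall(line):
                    if url_part in url:
                        return url
    return None


# Hypothetical sample archive standing in for the downloaded content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "nothing here\n")
    archive.writestr(
        "links_anystring.txt",
        "see https://example.com/page.html\n"
        "data at https://example.com/export.csv today\n",
    )

result = first_matching_url(buffer.getvalue(), "anystring", "csv")
print(result)
```

Returning None when no member or URL matches avoids the IndexError that indexing an empty list with [0] would raise.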
