I want to download a dataset (saved as a zip) from a public Google Drive folder.
url = https://drive.google.com/drive/folders/1TzwfNA5JRFTPO-kHMU___kILmOEodoBo
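Since I want others to be able to reproduce this, I don't want to copy it to my own Drive (and preferably not mount my Drive in the notebook either).
How can I do this?
What I've tried so far: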
import requests
import io
import zipfile
zip_url = 'https://drive.google.com/file/d/1fdFu5NGXe4rTLYKD5wOqk9dl-eJOefXo'
response = requests.get(zip_url)
file_contents = io.BytesIO(response.content)
print(file_contents)
with zipfile.ZipFile(file_contents, 'r') as zip_ref:
    zip_ref.extractall('/content/')  # Replace with your desired extraction path
But I get this output. The print of "file_contents" comes first, followed by the error:
<_io.BytesIO object at 0x7ad7efbf27f0>
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
<ipython-input-18-56d2c8f2bfe8> in <cell line: 14>()
12 print(file_contents)
13 # Extract the zip file (if needed)
---> 14 with zipfile.ZipFile(file_contents, 'r') as zip_ref:
15 zip_ref.extractall('/content/') # Replace with your desired extraction path
1 frames
/usr/lib/python3.10/zipfile.py in _RealGetContents(self)
1334 raise BadZipFile("File is not a zip file")
1335 if not endrec:
-> 1336 raise BadZipFile("File is not a zip file")
1337 if self.debug > 1:
1338 print(endrec)
BadZipFile: File is not a zip file
If I try the following instead, I get an empty zip file:
file_id = '1fdFu5NGXe4rTLYKD5wOqk9dl-eJOefXo'
download_url = f'https://drive.google.com/uc?export=download&id={file_id}'
!wget --no-check-certificate -O '/content/file.zip' 'https://drive.google.com/uc?export=download&id=1fdFu5NGXe4rTLYKD5wOqk9dl-eJOefXo'
Any help would be appreciated.
The content being returned is HTML, not a zip file. You can confirm this by checking the response headers:
import requests
file_id = '1fdFu5NGXe4rTLYKD5wOqk9dl-eJOefXo'
download_url = f'https://drive.google.com/uc?export=download&id={file_id}'
response = requests.get(download_url)
# Print the type the server says it sent; an HTML page shows up as text/html
print(response.headers.get("Content-Type"))
This shows what the server actually returned. In this case it is text/html, not a zip file.
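You can also check the payload before extracting it, so that zipfile does not fail with a bare BadZipFile. The sketch below assumes the whole file fits in memory, as it does in the question. extract_zip_bytes is a hypothetical helper name and not part of any library. The check itself uses the standard library's zipfile.is_zipfile, which accepts a file-like object.

```python
import io
import zipfile


def extract_zip_bytes(data: bytes, dest: str, content_type: str = "") -> list[str]:
    """Hypothetical helper: extract an in-memory zip into dest, return member names.

    Raises ValueError (with a preview of the payload) instead of BadZipFile
    when the bytes are not a zip archive, e.g. when an HTML page came back.
    """
    buffer = io.BytesIO(data)
    # is_zipfile() looks for the zip end-of-central-directory record, the same
    # structure whose absence makes ZipFile raise "File is not a zip file".
    if not zipfile.is_zipfile(buffer):
        preview = data[:60].decode("utf-8", errors="replace")
        raise ValueError(
            f"Not a zip archive (Content-Type: {content_type or 'unknown'}); "
            f"payload starts with: {preview!r}"
        )
    buffer.seek(0)  # is_zipfile() moved the read position
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

With the question's variables, you would call it as extract_zip_bytes(response.content, '/content/', response.headers.get('Content-Type', '')). An HTML response then fails with a message that shows the start of the page, which usually makes the cause obvious.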
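Check that the URL points to the zip file itself rather than to a web page. A link of the form https://drive.google.com/file/d/<id> opens Drive's file viewer, which is HTML.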
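The following sketch turns a Drive file link into the uc?export=download form used in the question's second attempt. drive_file_id and drive_download_url are hypothetical helper names. The regex assumes file IDs consist only of letters, digits, '-' and '_'. A folder link such as the one at the top of the question contains no file ID, so the sketch rejects it rather than guessing.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_file_id(url: str) -> str:
    """Hypothetical helper: pull the file ID out of a Google Drive file link.

    Handles https://drive.google.com/file/d/<id>[/view...] and
    https://drive.google.com/uc?export=download&id=<id>.
    """
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", url)
    if match:
        return match.group(1)
    ids = parse_qs(urlparse(url).query).get("id")
    if ids:
        return ids[0]
    # Folder links (/drive/folders/<id>) identify a folder, not a file
    raise ValueError(f"No file ID found in {url!r}")


def drive_download_url(url: str) -> str:
    """Hypothetical helper: build the uc?export=download URL for a file link."""
    return f"https://drive.google.com/uc?export=download&id={drive_file_id(url)}"
```

Fixing the URL shape does not guarantee a zip comes back. Drive may still serve an HTML page, for example a confirmation page for large files, so the Content-Type check is still worth running on the result.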