How to resume a download with MediaIoBaseDownload, Google Drive, and Python?

Question

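With large files I hit various errors that stop the download, so I would like to resume from where it stopped by correctly appending to the file already on disk.

I see that FileIO has to be opened in "ab" mode: fh = io.FileIO(fname, mode='ab')

But I can't find how to tell MediaIoBaseDownload where to continue from.

Any ideas on how to implement this?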

Answer 1

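I can't see your code, so I'll give you some general information about options that may help you solve the problem. You can use MediaIoBaseDownload to download a file in chunks; some documentation on it is available here.

Example: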

  request = farms.animals().get_media(id='cow')
  fh = io.FileIO('cow.png', mode='wb')
  downloader = MediaIoBaseDownload(fh, request, chunksize=1024*1024)

  done = False
  while done is False:
    status, done = downloader.next_chunk()
    if status:
      print("Download %d%%." % int(status.progress() * 100))
  print("Download Complete!")

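The documentation for next_chunk() reads:

Get the next chunk of the download.

Args: num_retries: Integer, number of times to retry with randomized exponential backoff. If all retries fail, the raised HttpError represents the last request. If zero (default), we attempt the request only once.

Returns: (status, done): (MediaDownloadProgress, boolean). The value of 'done' will be True when the media has been fully downloaded or the total size of the media is unknown.

Raises: googleapiclient.errors.HttpError if the response was not a 2xx. httplib2.HttpLib2Error if a transport error has occurred.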
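As a rough illustration of the num_retries argument described above, the loop below passes it to every next_chunk() call. The helper name download_all is hypothetical; it only assumes that the object it receives has a next_chunk(num_retries=...) method returning (status, done), as MediaIoBaseDownload does. It does not resume an interrupted download; it only lets the library retry a failed chunk request before giving up.

```python
def download_all(downloader, num_retries=5):
    """Drive a MediaIoBaseDownload-like object until the download is done.

    `downloader` is assumed to expose next_chunk(num_retries=...) that
    returns (status, done). Returns the number of chunks requested.
    """
    chunks = 0
    done = False
    while not done:
        # Per the docstring quoted above, a non-zero num_retries makes the
        # library retry with randomized exponential backoff before it
        # raises HttpError for the last failed request.
        status, done = downloader.next_chunk(num_retries=num_retries)
        chunks += 1
        if status:
            print("Download %d%%." % int(status.progress() * 100))
    return chunks
```

With a real client this would be called as download_all(MediaIoBaseDownload(fh, request)), where fh and request are built as in the example above.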

I also found this example in the Google documentation here.

from __future__ import print_function

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns : IO object with location.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        file = None

    # Return None after a failure instead of calling getvalue() on None.
    return file.getvalue() if file is not None else None


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')

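Finally, these 2 blogs show several examples of how to use MediaIoBaseDownload with chunks.

Update

Many client libraries offer partial download through a media download service. See the client library documentation here and here for details. However, the documentation is not very clear.

The API client library for Java has more information and states:

"The resumable media download protocol is similar to the resumable media upload protocol, which is described in the Google Drive API documentation."

In the Google Drive API documentation you will find some examples of resumable uploads in Python. You can use the documentation for the Python google-resumable-media library, Java resumable media download, and resumable upload as a basis for code that restarts the transfer after it fails.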

Answer 2

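When I saw your question, I thought this thread might be useful: Ref. I have already posted an answer on that topic.

To download only part of a file from Google Drive, the request header must include a property such as

Range: bytes=500-999

Unfortunately, at the current stage, MediaIoBaseDownload cannot be given this property: it has no option for setting the starting byte, so a download made with MediaIoBaseDownload always fetches all of the data from the beginning.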
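To make the Range header concrete, here is a minimal sketch of a helper that builds its value. The function name range_header is hypothetical and not part of any Google library. Byte positions are zero-based and inclusive, and omitting the end position asks for everything from the start position to the end of the file, which is the form a resumed download needs.

```python
def range_header(start, end=None):
    """Build an HTTP Range header value for bytes start..end (inclusive).

    When `end` is None the range is open-ended: from `start` to the end
    of the file.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if end is None:
        return f"bytes={start}-"
    if end < start:
        raise ValueError("end must not be less than start")
    return f"bytes={start}-{end}"


print(range_header(500, 999))  # bytes=500-999
print(range_header(1048576))   # bytes=1048576-
```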

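So a workaround is needed to achieve your goal. For this workaround, I propose the following flow.

1. Retrieve the filename and file size of the Google Drive file you want to download.
2. Check for an existing local file with that filename.
   - If there is no existing file, download the file as a new file.
   - If there is an existing file, download the file as a resumed download.
3. Download the file content with requests.

Reflected in a Python sample script, this flow looks as follows.

Sample script: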

import os
import sys

import requests

service = build("drive", "v3", credentials=creds) # Here, please use your client.
file_id = "###" # Please set the file ID of the file you want to download.

access_token = creds.token # The access token is retrieved from the creds of service = build("drive", "v3", credentials=creds)

# Get the filename and file size.
obj = service.files().get(fileId=file_id, fields="name,size").execute()
filename = obj.get("name", "sampleName")
size = obj.get("size", None)
if not size:
    sys.exit("No file size.")
else:
    size = int(size)

# Check existing file.
file_path = os.path.join("./", filename) # Please set your path.
o = {}
if os.path.exists(file_path):
    o["start_byte"] = os.path.getsize(file_path)
    o["mode"] = "ab"
    o["download"] = "As resume"
else:
    o["start_byte"] = 0
    o["mode"] = "wb"
    o["download"] = "As a new file"
if o["start_byte"] == size:
    sys.exit("The download of this file has already been finished.")

# Download process
print(o["download"])
headers = {
    "Authorization": f"Bearer {access_token}",
    "Range": f'bytes={o["start_byte"]}-',
}
url = f"https://www.googleapis.com/drive/v3/files/{file_id}?alt=media"
with requests.get(url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(file_path, o["mode"]) as f:
        for chunk in r.iter_content(chunk_size=10240):
            f.write(chunk)

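When this script is run, the file with the ID file_id is downloaded. If the download stops partway, running the script again resumes the download, and the remaining content is appended to the existing file. I think this may be the behavior you expect.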
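One defensive check that could be added to a script like the one above: appending is only safe if the server actually honored the Range header. In HTTP, a status of 206 (Partial Content) means the body starts at the requested offset, while 200 means the whole file is being sent from byte 0. The sketch below picks the file mode from the status code. The helper name write_mode_for_response is hypothetical, and whether Drive ever answers a ranged request with 200 is not confirmed here; this is only a guard against corrupting the local file.

```python
def write_mode_for_response(status_code, start_byte):
    """Choose the file mode for a ranged GET from its HTTP status code.

    206: the server honored the Range header, so the body begins at
         start_byte; append when resuming, write fresh when starting at 0.
    200: the server ignored the Range header and is sending the whole
         file, so the local file must be overwritten, not appended to.
    """
    if status_code == 206:
        return "ab" if start_byte > 0 else "wb"
    if status_code == 200:
        return "wb"
    raise ValueError(
        f"unexpected status {status_code} for a request starting at byte {start_byte}"
    )
```

In the sample script, the result of write_mode_for_response(r.status_code, o["start_byte"]) could be passed to open() in place of o["mode"].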

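Note:

- This assumes that the file to download is not a Google Docs file (Document, Spreadsheet, Slides, and so on). Please be careful about this.
- This script assumes that your client service = build("drive", "v3", credentials=creds) can be used to download files from Google Drive. Please be careful about this.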
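The first note can be checked in code before a download starts. Drive reports Google Docs editor files (and other Drive-native items such as folders) with a MIME type that begins with application/vnd.google-apps., for example application/vnd.google-apps.document. The sketch below assumes the metadata request also asks for mimeType, for instance fields="name,size,mimeType"; the helper name is hypothetical.

```python
GOOGLE_APPS_PREFIX = "application/vnd.google-apps."


def is_google_apps_file(mime_type):
    """Return True for Drive-native items (Docs, Sheets, Slides, folders, ...).

    Such items have no binary content to fetch with alt=media, so the
    resumable-download flow above does not apply to them.
    """
    return bool(mime_type) and mime_type.startswith(GOOGLE_APPS_PREFIX)


print(is_google_apps_file("application/vnd.google-apps.spreadsheet"))  # True
print(is_google_apps_file("image/png"))                                # False
```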

References:

- Related thread: How to do partial download in Google Drive Api v3?
- Partial download
