Downloading multiple files concurrently

Question

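I use fsspec to interact with remote filesystems. In my case it is GCS, but I believe the solution is generic.

For a single file I use the following code (if you need the code for the helper functions, it is here)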

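import typing as t
from contextlib import contextmanager
from pathlib import PurePosixPath

import fsspec

# get_protocol_and_path and get_filepath_str are the helper functions mentioned above.


@contextmanager  # needed so the generator below can be used in a `with` statement
def open_any_file(filepath: str, mode: str = "r", **kwargs) -> t.Generator[t.IO, None, None]: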
    """
    Open file and close it after use. Works for local, remote, http, https, s3, gcs, etc.

    :param filepath: Filepath.
    :param mode: Mode.
    :param kwargs: Keyword arguments.
    :return: File object.
    """

    protocol, path = get_protocol_and_path(filepath)
    filepath = PurePosixPath(path)
    filesystem = fsspec.filesystem(protocol)

    load_path = get_filepath_str(filepath, protocol)

    # Figure out content type
    if "content_type" not in kwargs and filepath.suffix == ".json":
        kwargs["content_type"] = "application/json"

    with filesystem.open(load_path, mode=mode, **kwargs) as f:
        yield f

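Suppose I have a thousand JSON files to download. What is the most efficient way to do it? Should I parallelize? Use threads? Use async?

What is the best option in terms of execution time, and how would it be implemented?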

Answer 1

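The functionality you want is here: https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.generic.rsync

You pass it two directories, a source and a destination. fsspec works out which filesystem implementation to use for each one and performs the copy concurrently where the backend supports it. fsspec is asynchronous internally for s3, gcs, abfs and http.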
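As a rough illustration of the rsync approach, here is a minimal sketch. The wrapper name mirror_directory and the bucket and directory names in the usage comment are hypothetical; the only fsspec call used is fsspec.generic.rsync, called with a source URL and a destination URL.

```python
from fsspec.generic import rsync


def mirror_directory(source_url: str, destination_url: str) -> None:
    """Copy every file under source_url to destination_url.

    Both arguments are full URLs including the protocol
    (for example "gs://..." or "file:///...").
    """
    # rsync picks a filesystem implementation for each URL from its
    # protocol and copies the files that are missing or different.
    rsync(source_url, destination_url)


# Hypothetical usage (bucket and directory names are placeholders):
# mirror_directory("gs://my-bucket/json-data", "file:///tmp/json-data")
```

Note that this copies the whole directory tree, not only the JSON files.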

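To copy a bunch of files matching a pattern ("*.json") from a specific backend, you will want the implementation-specific get() method (copy to local files) or cat() (fetch the bytes into memory). This is because rsync does not support patterns (yet?).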
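The get() and cat() route could look roughly like the sketch below. The function names download_matching and read_matching, and the paths in the usage comment, are made up for illustration. The pattern is expanded with glob() first and explicit lists are passed on, so that the async backends can handle the whole batch in one call.

```python
import os
import posixpath
from typing import Dict, List

import fsspec


def download_matching(protocol: str, pattern: str, local_dir: str) -> List[str]:
    """Copy all remote files matching pattern into local_dir; return the local paths."""
    fs = fsspec.filesystem(protocol)
    remote_paths = fs.glob(pattern)
    if not remote_paths:
        return []
    os.makedirs(local_dir, exist_ok=True)
    # Files are stored flat under their base name, so two remote files with
    # the same name in different directories would overwrite each other.
    local_paths = [os.path.join(local_dir, posixpath.basename(p)) for p in remote_paths]
    # One call with two lists of equal length: remote_paths[i] -> local_paths[i].
    fs.get(remote_paths, local_paths)
    return local_paths


def read_matching(protocol: str, pattern: str) -> Dict[str, bytes]:
    """Fetch all remote files matching pattern into memory as {path: bytes}."""
    fs = fsspec.filesystem(protocol)
    remote_paths = fs.glob(pattern)
    if not remote_paths:
        return {}
    # cat() called with a list returns a dict mapping each path to its contents.
    return fs.cat(remote_paths)


# Hypothetical usage (bucket name is a placeholder):
# download_matching("gs", "my-bucket/json-data/*.json", "/tmp/json-data")
# payloads = read_matching("gs", "my-bucket/json-data/*.json")
```

For a thousand small JSON files, read_matching avoids touching the local disk, while download_matching is the better fit if the files are needed again later.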
