DownloadTo in the Azure SDK for C++ is very slow

Question (votes: 0, answers: 1)

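I have integrated the Azure SDK for C++ into my application, and it is significantly slower than the old Azure SDK. After raising the upload parallelism in azure-sdk-for-cpp, uploads work better, but downloads are still very slow.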

This can be reproduced with a simple example: just try to download a 1 GB file from Azure Storage to the local file system.

  • Old SDK: ~1 minute
  • New SDK: ~5 minutes

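The old SDK is built on the C++ REST SDK, which uses concurrency::streams::istream m_stream. The new SDK has nothing comparable except TransferOptions.Concurrency, which has almost no effect. Is there any way to speed up DownloadTo, or should parallelism be implemented on top of the library?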

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

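#include <azure/storage/blobs.hpp>

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

std::string GetConnectionString()
{
  const static std::string ConnectionString = "";

  if (!ConnectionString.empty())
  {
    return ConnectionString;
  }
  // std::getenv returns nullptr when the variable is unset; constructing a
  // std::string from nullptr is undefined behavior, so check the pointer first.
  const char* envConnectionString = std::getenv("AZURE_STORAGE_CONNECTION_STRING");
  if (envConnectionString != nullptr && envConnectionString[0] != '\0')
  {
    return envConnectionString;
  }
  throw std::runtime_error("Cannot find connection string.");
}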

int main()
{
  using namespace Azure::Storage::Blobs;

  const std::string containerName = "sample-container";
  const std::string blobName = "sample-blob";
  const std::string blobContent = "Hello Azure!";

  auto containerClient
      = BlobContainerClient::CreateFromConnectionString(GetConnectionString(), containerName);

  containerClient.CreateIfNotExists();

  BlockBlobClient blobClient = containerClient.GetBlockBlobClient(blobName);

  std::vector<uint8_t> buffer(blobContent.begin(), blobContent.end());
  blobClient.UploadFrom(buffer.data(), buffer.size());

  Azure::Storage::Metadata blobMetadata = {{"key1", "value1"}, {"key2", "value2"}};
  blobClient.SetMetadata(blobMetadata);

  auto properties = blobClient.GetProperties().Value;
  for (const auto& metadata : properties.Metadata)
  {
    std::cout << metadata.first << ":" << metadata.second << std::endl;
  }
  // We know blob size is small, so it's safe to cast here.
  buffer.resize(static_cast<size_t>(properties.BlobSize));

  blobClient.DownloadTo(buffer.data(), buffer.size());

  std::cout << std::string(buffer.begin(), buffer.end()) << std::endl;

  return 0;
}
c++ azure multithreading azure-blob-storage azure-sdk
1 Answer
0 votes

Is there any way to speed up DownloadTo, or should parallelism be implemented on top of the library?

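I suggest splitting your download into chunks and parallelizing them manually. This is similar to how some HTTP clients download a file over several parallel range requests.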
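To make the splitting step concrete, here is a minimal sketch of the range arithmetic on its own, with no Azure calls. ChunkRange and SplitIntoChunks are hypothetical names introduced only for illustration; they are not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one contiguous byte range of the blob.
struct ChunkRange
{
  int64_t Offset;
  int64_t Length;
};

// Splits [0, blobSize) into consecutive ranges of at most chunkSize bytes.
// The last range is shorter when blobSize is not a multiple of chunkSize.
std::vector<ChunkRange> SplitIntoChunks(int64_t blobSize, int64_t chunkSize)
{
  assert(blobSize >= 0 && chunkSize > 0);
  std::vector<ChunkRange> ranges;
  for (int64_t offset = 0; offset < blobSize; offset += chunkSize)
  {
    ranges.push_back({offset, std::min(chunkSize, blobSize - offset)});
  }
  return ranges;
}
```

Each ChunkRange corresponds to the offset and length passed to Azure::Core::Http::HttpRange in the code further down, so every range can be downloaded by an independent DownloadTo call.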

You can use the code below to download faster with the C++ SDK.

Code:

#include <azure/storage/blobs.hpp>
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <future>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

int main()
{
    using namespace Azure::Storage::Blobs;

    const std::string containerName = "result";
    const std::string blobName = "test.mp4";
    const std::string outputFileName = "demo1.mp4"; // Output file

    auto containerClient = BlobContainerClient::CreateFromConnectionString("DefaultEndpointsProtocol=https;AccountName=venkat326123;AccountKey=redacted;EndpointSuffix=core.windows.net", containerName);
    containerClient.CreateIfNotExists();

    BlockBlobClient blobClient = containerClient.GetBlockBlobClient(blobName);

    auto properties = blobClient.GetProperties().Value;
    size_t blobSize = static_cast<size_t>(properties.BlobSize);

    const size_t chunkSize = 4 * 1024 * 1024;
    size_t totalChunks = (blobSize + chunkSize - 1) / chunkSize; 

    std::ofstream outputFile(outputFileName, std::ios::binary);
    if (!outputFile.is_open())
    {
        std::cerr << "Failed to open output file: " << outputFileName << std::endl;
        return 1;
    }

    std::mutex fileMutex;

    std::vector<std::future<void>> futures;

    auto start = std::chrono::high_resolution_clock::now();

    // Start downloading each chunk in parallel
    for (size_t i = 0; i < totalChunks; ++i)
    {
        futures.push_back(std::async(std::launch::async, [&, i]()
            {
                try
                {
                    // Calculate offset and length for each chunk
                    // (named `offset` so it does not shadow the outer timer variable `start`)
                    const size_t offset = i * chunkSize;
                    const size_t length = std::min(chunkSize, blobSize - offset);

                    // Define download options with range for each chunk
                    Azure::Storage::Blobs::DownloadBlobToOptions rangeOptions;
                    rangeOptions.Range = Azure::Core::Http::HttpRange{ static_cast<int64_t>(offset), static_cast<int64_t>(length) };

                    // Temporary buffer for chunk
                    std::vector<uint8_t> buffer(length);

                    // Download chunk data into the temporary buffer
                    blobClient.DownloadTo(buffer.data(), length, rangeOptions);

                    // Lock and write the buffer to the file at the correct position
                    std::lock_guard<std::mutex> lock(fileMutex);
                    outputFile.seekp(static_cast<std::streamoff>(offset));
                    outputFile.write(reinterpret_cast<const char*>(buffer.data()), static_cast<std::streamsize>(length));
                }
                catch (const std::exception& e)
                {
                    std::cerr << "Error downloading chunk " << i << ": " << e.what() << std::endl;
                }
            }));
    }

    // Wait for all chunks to finish downloading
    for (auto& f : futures)
    {
        f.get();
    }

    // Stop timing the download
    auto end = std::chrono::high_resolution_clock::now();

    // Close the file stream
    outputFile.close();

    // Calculate time taken in seconds
    std::chrono::duration<double> elapsedSeconds = end - start;

    // Calculate download speed in MBps
    double downloadSpeed = (blobSize / (1024.0 * 1024.0)) / elapsedSeconds.count();

    std::cout << "Downloaded blob '" << blobName << "' to file '" << outputFileName << "' of size " << blobSize << " bytes." << std::endl;
    std::cout << "Time taken: " << elapsedSeconds.count() << " seconds" << std::endl;
    std::cout << "Download speed: " << downloadSpeed << " MBps" << std::endl;

    return 0;
}

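The code above splits the file into 4 MB chunks and downloads them concurrently with std::async, one task per chunk. A std::mutex serializes the seek-and-write step so that writes to the output file are thread-safe.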
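Because the code starts one std::async task per chunk, a 1 GB blob with 4 MB chunks means 256 tasks launched at once. If that is too many threads or connections, a fixed number of workers can pull chunk indices from a shared counter instead. The following is only a sketch of that idea using the standard library: RunChunksBounded is a hypothetical helper, and the task callback stands in for "download chunk i".

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs task(i) once for every i in [0, totalChunks) on at most workerCount threads.
// Returns the number of chunks whose task threw, so the caller can detect a
// partial download instead of silently reporting success.
std::size_t RunChunksBounded(
    std::size_t totalChunks,
    std::size_t workerCount,
    const std::function<void(std::size_t)>& task)
{
  assert(workerCount > 0);
  std::atomic<std::size_t> nextChunk{0};
  std::atomic<std::size_t> failedChunks{0};

  auto worker = [&]() {
    for (;;)
    {
      // Claim the next unprocessed chunk index; stop when none are left.
      const std::size_t i = nextChunk.fetch_add(1);
      if (i >= totalChunks)
      {
        return;
      }
      try
      {
        task(i);
      }
      catch (...)
      {
        ++failedChunks;
      }
    }
  };

  std::vector<std::thread> workers;
  const std::size_t threadCount = std::min(workerCount, totalChunks);
  for (std::size_t w = 0; w < threadCount; ++w)
  {
    workers.emplace_back(worker);
  }
  for (auto& t : workers)
  {
    t.join();
  }
  return failedChunks.load();
}
```

In the program above, the body of the per-chunk lambda (the ranged DownloadTo call plus the locked file write) would become the task, and a non-zero return value would be a reason to exit with an error. The worker count is a tuning knob; any particular value is an assumption to be measured against your own network.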

Output:

Downloaded blob 'test.mp4' to file 'demo1.mp4' of size 69632912 bytes.
Time taken: 15.3713 seconds
Download speed: 4.32019 MBps
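To relate this output to the timings in the question, the arithmetic behind the speed figure can be reproduced and extrapolated. This is a small sketch; ThroughputMiBps and SecondsAtRate are hypothetical helper names, and the extrapolation assumes the same rate would hold for a larger blob, which may not be true on a different network.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in MiB/s, the unit the program above prints as "MBps".
double ThroughputMiBps(std::uint64_t bytes, double seconds)
{
  assert(seconds > 0.0);
  return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / seconds;
}

// Seconds needed to transfer `bytes` at a given MiB/s rate.
double SecondsAtRate(std::uint64_t bytes, double mibPerSecond)
{
  assert(mibPerSecond > 0.0);
  return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / mibPerSecond;
}
```

ThroughputMiBps(69632912, 15.3713) gives about 4.32, matching the printed speed. At that rate, SecondsAtRate for a 1 GiB blob comes out to roughly 237 seconds, so it is worth timing the chunked version against the 1 GB file from the question before drawing conclusions from the smaller test file.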


Also keep an eye on the GitHub link you created; the people there may have further good suggestions for using the C++ SDK.
