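I have integrated the Azure SDK for C++ into my application, and it is noticeably slower than the old Azure SDK. After increasing the upload concurrency in azure-sdk-for-cpp, uploads perform better, but downloads are still very slow.
It can be reproduced with a simple sample: just try to download a 1 GB file from Azure Storage to the local file system.
The old SDK used the C++ REST SDK, which exposed a concurrency::streams::istream m_stream. The new SDK has nothing comparable except TransferOptions.Concurrency, which makes almost no difference. Is there any way to speed up DownloadTo? Or should parallelism be implemented on top of the library?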
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

#include <azure/storage/blobs.hpp>

#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

std::string GetConnectionString()
{
  const static std::string ConnectionString = "";

  if (!ConnectionString.empty())
  {
    return ConnectionString;
  }
  // std::getenv returns nullptr when the variable is unset, so check it
  // before constructing a std::string from it.
  const char* envValue = std::getenv("AZURE_STORAGE_CONNECTION_STRING");
  const static std::string envConnectionString = envValue ? envValue : "";
  if (!envConnectionString.empty())
  {
    return envConnectionString;
  }
  throw std::runtime_error("Cannot find connection string.");
}

int main()
{
  using namespace Azure::Storage::Blobs;

  const std::string containerName = "sample-container";
  const std::string blobName = "sample-blob";
  const std::string blobContent = "Hello Azure!";

  auto containerClient
      = BlobContainerClient::CreateFromConnectionString(GetConnectionString(), containerName);
  containerClient.CreateIfNotExists();

  BlockBlobClient blobClient = containerClient.GetBlockBlobClient(blobName);

  // Upload a small in-memory buffer as the blob content.
  std::vector<uint8_t> buffer(blobContent.begin(), blobContent.end());
  blobClient.UploadFrom(buffer.data(), buffer.size());

  Azure::Storage::Metadata blobMetadata = {{"key1", "value1"}, {"key2", "value2"}};
  blobClient.SetMetadata(blobMetadata);

  auto properties = blobClient.GetProperties().Value;
  for (const auto& metadata : properties.Metadata)
  {
    std::cout << metadata.first << ":" << metadata.second << std::endl;
  }
  // We know blob size is small, so it's safe to cast here.
  buffer.resize(static_cast<size_t>(properties.BlobSize));

  // Download the blob back into the same buffer and print it.
  blobClient.DownloadTo(buffer.data(), buffer.size());
  std::cout << std::string(buffer.begin(), buffer.end()) << std::endl;

  return 0;
}
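"Is there any way to speed up DownloadTo? Or should parallelism be implemented on top of the library?"
I suggest splitting your download into chunks and parallelizing them manually. This is similar to how some HTTP clients download a file in parallel.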
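The splitting step on its own can be sketched without any SDK calls. This is a minimal illustration, assuming a hypothetical `ChunkRange` type and `SplitIntoChunks` helper (neither is part of the Azure SDK); each resulting offset/length pair is what would be passed as the range of one partial download.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one contiguous byte range of the blob.
struct ChunkRange
{
  std::int64_t offset; // index of the first byte in the chunk
  std::int64_t length; // number of bytes in the chunk
};

// Split [0, totalSize) into consecutive ranges of at most chunkSize bytes.
// The last range is shorter when totalSize is not a multiple of chunkSize.
std::vector<ChunkRange> SplitIntoChunks(std::int64_t totalSize, std::int64_t chunkSize)
{
  assert(totalSize >= 0 && chunkSize > 0);
  std::vector<ChunkRange> ranges;
  for (std::int64_t offset = 0; offset < totalSize; offset += chunkSize)
  {
    ranges.push_back({offset, std::min(chunkSize, totalSize - offset)});
  }
  return ranges;
}
```

For example, a 1 GiB blob with a 4 MiB chunk size yields 256 ranges, each of which can be fetched independently.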
You can use the code below to download faster with the C++ SDK.
Code:
#include <azure/storage/blobs.hpp>

#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <future>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

int main()
{
    using namespace Azure::Storage::Blobs;

    const std::string containerName = "result";
    const std::string blobName = "test.mp4";
    const std::string outputFileName = "demo1.mp4"; // Output file

    auto containerClient = BlobContainerClient::CreateFromConnectionString("DefaultEndpointsProtocol=https;AccountName=venkat326123;AccountKey=redacted;EndpointSuffix=core.windows.net", containerName);
    containerClient.CreateIfNotExists();
    BlockBlobClient blobClient = containerClient.GetBlockBlobClient(blobName);

    auto properties = blobClient.GetProperties().Value;
    size_t blobSize = static_cast<size_t>(properties.BlobSize);
    const size_t chunkSize = 4 * 1024 * 1024; // 4 MiB per chunk
    size_t totalChunks = (blobSize + chunkSize - 1) / chunkSize; // round up

    std::ofstream outputFile(outputFileName, std::ios::binary);
    if (!outputFile.is_open())
    {
        std::cerr << "Failed to open output file: " << outputFileName << std::endl;
        return 1;
    }

    std::mutex fileMutex;            // serializes seek + write on the shared stream
    std::atomic<bool> failed{false}; // set when any chunk fails
    std::vector<std::future<void>> futures;

    auto startTime = std::chrono::high_resolution_clock::now();

    // Start downloading each chunk in parallel.
    // Note: this launches one task per chunk, so a large blob creates many threads.
    for (size_t i = 0; i < totalChunks; ++i)
    {
        futures.push_back(std::async(std::launch::async, [&, i]()
        {
            try
            {
                // Calculate offset and length for this chunk
                size_t offset = i * chunkSize;
                size_t length = std::min(chunkSize, blobSize - offset);

                // Define download options with the byte range for this chunk
                Azure::Core::Http::HttpRange range;
                range.Offset = static_cast<int64_t>(offset);
                range.Length = static_cast<int64_t>(length);
                Azure::Storage::Blobs::DownloadBlobToOptions rangeOptions;
                rangeOptions.Range = range;

                // Download chunk data into a temporary buffer
                std::vector<uint8_t> buffer(length);
                blobClient.DownloadTo(buffer.data(), length, rangeOptions);

                // Lock and write the buffer to the file at the correct position
                std::lock_guard<std::mutex> lock(fileMutex);
                outputFile.seekp(static_cast<std::streamoff>(offset));
                outputFile.write(reinterpret_cast<const char*>(buffer.data()), static_cast<std::streamsize>(length));
            }
            catch (const std::exception& e)
            {
                failed = true;
                std::lock_guard<std::mutex> lock(fileMutex); // keep error output from interleaving
                std::cerr << "Error downloading chunk " << i << ": " << e.what() << std::endl;
            }
        }));
    }

    // Wait for all chunks to finish downloading
    for (auto& f : futures)
    {
        f.get();
    }

    // Stop timing the download
    auto endTime = std::chrono::high_resolution_clock::now();

    // Close the file stream
    outputFile.close();

    // Do not report success if any chunk failed or the file could not be written
    if (failed || !outputFile)
    {
        std::cerr << "Download of blob '" << blobName << "' did not complete." << std::endl;
        return 1;
    }

    // Calculate time taken in seconds
    std::chrono::duration<double> elapsedSeconds = endTime - startTime;

    // Calculate download speed (1 MB is counted as 1024 * 1024 bytes here)
    double downloadSpeed = (blobSize / (1024.0 * 1024.0)) / elapsedSeconds.count();

    std::cout << "Downloaded blob '" << blobName << "' to file '" << outputFileName << "' of size " << blobSize << " bytes." << std::endl;
    std::cout << "Time taken: " << elapsedSeconds.count() << " seconds" << std::endl;
    std::cout << "Download speed: " << downloadSpeed << " MBps" << std::endl;

    return 0;
}
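The code above splits the blob into 4 MiB (4 * 1024 * 1024 bytes) chunks and downloads them concurrently with std::async, one task per chunk. A std::mutex serializes the writes to the output file so that they are thread-safe.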
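Because the code above starts one std::async task per chunk, a 1 GB blob would start roughly 256 threads at once. One way to cap that is a fixed number of workers that pull chunk indices from a shared counter. The sketch below is an assumption-laden illustration, not part of the Azure SDK: `RunBounded` is a hypothetical helper, and the `task` callback stands in for the per-chunk download and is expected to catch its own exceptions and return false on failure.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: run task(0) ... task(count - 1) on at most maxWorkers threads.
// Returns true only if every task returned true. The task must not throw.
bool RunBounded(std::size_t count, std::size_t maxWorkers,
                const std::function<bool(std::size_t)>& task)
{
  assert(maxWorkers > 0);
  std::atomic<std::size_t> next{0}; // next unclaimed chunk index
  std::atomic<bool> ok{true};

  auto worker = [&]() {
    // Each worker keeps claiming the next index until none remain.
    for (std::size_t i = next.fetch_add(1); i < count; i = next.fetch_add(1))
    {
      if (!task(i))
      {
        ok = false;
      }
    }
  };

  std::vector<std::thread> threads;
  const std::size_t workers = std::min(count, maxWorkers);
  for (std::size_t w = 0; w < workers; ++w)
  {
    threads.emplace_back(worker);
  }
  for (auto& t : threads)
  {
    t.join();
  }
  return ok.load();
}
```

In the downloader, the task would wrap the body of the per-chunk lambda (ranged DownloadTo into a buffer, then the locked seek and write). The worker count is a value to tune against your network and storage account; no particular number is implied by the SDK.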
Output:
Downloaded blob 'test.mp4' to file 'demo1.mp4' of size 69632912 bytes.
Time taken: 15.3713 seconds
Download speed: 4.32019 MBps
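The reported speed can be checked by hand from the other two figures. The small sketch below uses a hypothetical helper name, `ThroughputMiBps`, and assumes the same convention as the program, where one "MB" is 1024 * 1024 bytes.

```cpp
#include <cassert>
#include <cmath>

// Convert a byte count and a duration into MiB per second
// (1 MiB = 1024 * 1024 bytes, matching the calculation in the program).
double ThroughputMiBps(double bytes, double seconds)
{
  assert(seconds > 0.0);
  return (bytes / (1024.0 * 1024.0)) / seconds;
}
```

Calling `ThroughputMiBps(69632912.0, 15.3713)` gives approximately 4.32, in line with the printed value. In decimal megabytes (10^6 bytes) the same transfer is roughly 4.53 MB/s.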
Also check the GitHub issue you opened; the people there can also offer good advice on using the C++ SDK.