Max retries exceeded when using DeltaTable with Azure Blob Storage


I am running into a problem when using the deltalake library to save data to and load data from Azure Blob Storage. Intermittently, I get the following error:

DatasetError: Failed while saving data to data set CustomDeltaTableDataset(file_example).
Failed to parse parquet: Parquet error: AsyncChunkReader::get_bytes error:
Generic MicrosoftAzure error: Error after 10 retries in 2.196683949s, max_retries:10, 
retry_timeout:180s, source:error sending request for url 
(https://<address>/file.parquet):
 error trying to connect: dns error: failed to lookup address information: Name or service not known

Here is an example of the parameters I am using:

from deltalake import DeltaTable


# Placeholders are quoted so the snippet is valid Python; substitute real values
datalake_vale = {
    'account_name': '<account>',
    'client_id': '<cli_id>',
    'tenant_id': '<tenant_id>',
    'client_secret': '<secret>',
    'timeout': '100000s'
}

# Load data from the delta table
dt = DeltaTable("abfs://<azure_address>", storage_options=datalake_vale)

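I was looking for a parameter like max_retries, but could not find anything relevant. Does anyone know a solution or workaround for this problem?

Thanks in advance for your help!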

python-3.x azure-blob-storage databricks delta-lake
1 Answer

You can control the number of retries and the timeout through storage_options, as follows:

datalake_vale = {
    'account_name': account_name,
    'client_id': client_id,
    'tenant_id': tenant_id,
    'client_secret': client_secret,
    'timeout': '100000s',
    'retries': '20', 
    'retry_delay': '2',
}

Here is the complete code for reference:

from deltalake import DeltaTable

# Azure Blob Storage configuration
account_name = '<accountName>'
client_id = '<clientId>'
tenant_id = '<tenantId>'
client_secret = '<ClientSecret>'
container_name = '<containerName>'

# Construct storage_options dictionary with retry settings
datalake_vale = {
    'account_name': account_name,
    'client_id': client_id,
    'tenant_id': tenant_id,
    'client_secret': client_secret,
    'timeout': '100000s',
    'retries': '20', 
    'retry_delay': '2',
}

# Azure Blob Storage path for Delta Table
delta_table_path = f"abfss://{container_name}@{account_name}.dfs.core.windows.net/<deltaTablePath>"

# Load DeltaTable with storage_options
dt = DeltaTable(delta_table_path, storage_options=datalake_vale)

# Example: Retrieve and print schema
print(dt.schema())

Running this prints the schema of the Delta table.

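The Delta table loads successfully, and you can then update it. Make sure you are using an ADLS account; this does not work with plain Blob Storage.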
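If the storage-level settings above do not help in your deltalake version, one possible workaround is to retry the load on the client side. The following is only a minimal sketch of exponential backoff around an arbitrary callable. The helper name call_with_retries and all of its parameters are hypothetical and are not part of the deltalake API. Note that the error in the question is a DNS lookup failure, so retrying only helps if that failure is transient.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func() and retry with exponential backoff if it raises.

    Hypothetical helper for illustration; not provided by deltalake.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            # Out of attempts: re-raise the last error unchanged.
            if attempt == max_attempts:
                raise
            # Wait, then grow the delay for the next attempt.
            sleep(delay)
            delay *= backoff_factor

    # The loop always returns or raises before reaching this point.
    raise AssertionError("unreachable")
```

With the variables from the code above, the load could then be wrapped as call_with_retries(lambda: DeltaTable(delta_table_path, storage_options=datalake_vale), max_attempts=5). Narrowing retry_on to the specific exception types you actually see is usually safer than retrying on every Exception.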
