I have some data stored in Delta format on a MinIO server, which I start like this:
minio server --address :9010 --console-address :19010 ./data
When I try to read it with Polars, I see a delay of about 1 second, then a warning, and after that everything works fine.
Warning:
[2024-04-08T07:17:19Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))
[2024-04-08T07:17:19Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Host is down (os error 64): tcp connect error: Host is down (os error 64): Host is down (os error 64) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 64, kind: Uncategorized, message: "Host is down" })), connection: Unknown } }) }))
[2024-04-08T07:17:19Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Host is down (os error 64): tcp connect error: Host is down (os error 64): Host is down (os error 64) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 64, kind: Uncategorized, message: "Host is down" })), connection: Unknown } }) }))
Code:
import polars as pl
minio_storage_options = {
    "AWS_ENDPOINT_URL": "http://localhost:9010",
    "AWS_ACCESS_KEY_ID": "minioadmin",
    "AWS_SECRET_ACCESS_KEY": "minioadmin",
    "AWS_REGION": "<localhost>",  # Unused??
    "AWS_ALLOW_HTTP": "true",  # Required
}
df = pl.read_delta("s3://reddit-submissions/submissions-raw", storage_options=minio_storage_options)
print(df.head())
What am I doing wrong here?
❯ uv pip freeze | grep "delta\|polars"
deltalake==0.16.4
polars==0.20.18
❯ python
Python 3.11.5 (main, Aug 24 2023, 15:09:45) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ^D
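There are two ways to "fix" this. Both get rid of the warning and also improve performance, because that authentication method (the IMDS lookup named in the warning) is no longer attempted.
1. Set the AWS_EC2_METADATA_DISABLED environment variable. Code:
import os
os.environ["AWS_EC2_METADATA_DISABLED"] = "true"
2. Set the parameters in storage_options. This did not seem to work on the first execution, which still had a delay of about 3 seconds, but for some reason it worked on later ones. Code: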
df.write_delta(
    target=s3_path,
    overwrite_schema=True,
    mode="overwrite",
    storage_options={
        "AWS_S3_ALLOW_UNSAFE_RENAME": "true",  # Required if we don't use a LockClient
        "AWS_REGION": "x",
        "AWS_ACCESS_KEY_ID": "x",
        "AWS_SECRET_ACCESS_KEY": "x",
        "AWS_SESSION_TOKEN": "x",
    },
)
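For the read from the question, the environment-variable approach could be combined with a simple timing check, roughly as sketched below. This is only a sketch under assumptions: make_minio_options and timed_read are hypothetical helper names, the region value is an arbitrary placeholder, and I am assuming the variable has to be in place before the first S3 access for it to take effect.

```python
import os
import time

# Assumption: this should be set before the first S3 access so that the
# credential/region lookup never tries the EC2 instance metadata service.
os.environ["AWS_EC2_METADATA_DISABLED"] = "true"


def make_minio_options(endpoint, access_key, secret_key, region="us-east-1"):
    """Hypothetical helper: build storage_options for a local MinIO server."""
    return {
        "AWS_ENDPOINT_URL": endpoint,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_REGION": region,  # arbitrary placeholder value
        "AWS_ALLOW_HTTP": "true",  # the endpoint in the question is plain HTTP
    }


def timed_read(uri, storage_options):
    """Hypothetical helper: read a Delta table and return (DataFrame, seconds)."""
    import polars as pl  # imported lazily so the helpers above work without polars

    start = time.perf_counter()
    df = pl.read_delta(uri, storage_options=storage_options)
    return df, time.perf_counter() - start
```

With the MinIO server from the question running, something like timed_read("s3://reddit-submissions/submissions-raw", make_minio_options("http://localhost:9010", "minioadmin", "minioadmin")) should show whether the roughly 1 second delay is gone. Comparing the timing with and without the os.environ line is one way to confirm that the IMDS lookup was the cause.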