Unresolvable Build Service error caused by OpenSSL version


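The PySpark @Functions.pandas_udf function (decorator) causes an unresolvable dependency error in the Foundry Build Service environment. It relies on pyarrow, which needs an OpenSSL version that the build system environment does not have. Even installing that version in the user/project environment by adding it to meta.yaml does not fix the problem. Stdout: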

ImportError: PyArrow >= 4.0.0 must be installed; however, it was not found.

Traceback (most recent call last):
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/pyspark/sql/pandas/utils.py", line 53, in require_minimum_pyarrow_version
    import pyarrow
  File "/app/work-dir/__environment__/__SYMLINKS__/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: /app/work-dir/__python_runtime_environment__/__SYMLINKS__/lib-dynload/../../libcrypto.so.3: version `OPENSSL_3.4.0' not found (required by /app/work-dir/__environment__/__SYMLINKS__/site-packages/pyarrow/../../../././libssl.so.3)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms_spark_module/delegate.py", line 100, in _execute_job
    result = job.run(
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms/_build.py", line 329, in run
    self._transform.compute(**kwargs, **parameters)
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms/api/_transform.py", line 334, in compute
    output_df: Union[DataFrame, Any] = self(**kwargs)
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms/api/_transform.py", line 183, in __call__
    return self._compute_func(*args, **kwargs)
  File "/app/work-dir/__user_code_environment__/__SYMLINKS__/site-packages/myproject/datasets/data_anon.py", line 15, in compute
    "case_weight": redist(df, "case_weight"),
  File "/app/work-dir/__user_code_environment__/__SYMLINKS__/site-packages/myproject/datasets/utils.py", line 32, in redist
    df = df.withColumn(column_name, add_noise(column_name, dist))
  File "/app/work-dir/__user_code_environment__/__SYMLINKS__/site-packages/myproject/datasets/utils.py", line 14, in add_noise
    @F.pandas_udf(DoubleType())
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/pyspark/sql/pandas/functions.py", line 338, in pandas_udf
    require_minimum_pyarrow_version()
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/pyspark/sql/pandas/utils.py", line 60, in require_minimum_pyarrow_version
    raise ImportError(
ImportError: PyArrow >= 4.0.0 must be installed; however, it was not found.
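In the traceback, the final ImportError about PyArrow is raised by PySpark's require_minimum_pyarrow_version, and the real loader error is attached as its cause ("The above exception was the direct cause of the following exception"). A minimal sketch for surfacing that underlying error when you only catch the outer exception, assuming you can run a small Python snippet in the same environment; root_cause is a hypothetical helper that follows the __cause__ chain set by `raise ... from ...`.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow the explicit __cause__ chain (set by `raise ... from ...`) to the original error."""
    seen = set()
    # Stop at the first exception without a cause; `seen` guards against a cyclic chain.
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))
        exc = exc.__cause__
    return exc


if __name__ == "__main__":
    try:
        # Run the same check that pandas_udf performs before building the UDF.
        from pyspark.sql.pandas.utils import require_minimum_pyarrow_version

        require_minimum_pyarrow_version()
        print("pyarrow import OK")
    except ImportError as err:
        # For an error like the one above, this would show the libcrypto.so.3
        # message instead of the generic "PyArrow >= 4.0.0" one.
        print(f"root cause: {root_cause(err)!r}")
```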

Answer

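The PySpark error message is misleading. As you correctly identified, the error is caused by different OpenSSL versions being present in the Conda environment and in the Foundry Build environment.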
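To see which OpenSSL the interpreter actually loads, Python's standard ssl module reports it through ssl.OPENSSL_VERSION and ssl.OPENSSL_VERSION_INFO. The sketch below assumes that the interpreter's ssl module binds the same libcrypto.so.3 that the traceback shows being loaded from the __python_runtime_environment__ directory; openssl_at_least is a hypothetical helper. It compares only the major and minor numbers, since the missing OPENSSL_3.4.0 symbol version corresponds to the 3.4 release series (and for OpenSSL 3.x the patch number appears in the fourth tuple element, not the third).

```python
import ssl
from typing import Tuple


def openssl_at_least(major: int, minor: int,
                     actual: Tuple[int, ...] = ssl.OPENSSL_VERSION_INFO) -> bool:
    """Return True if the OpenSSL version tuple is at least major.minor."""
    # Only (major, minor) is compared: enough for a release-series check such as 3.4.
    return tuple(actual[:2]) >= (major, minor)


if __name__ == "__main__":
    # Version string of the OpenSSL library loaded by this interpreter.
    print(ssl.OPENSSL_VERSION)
    print("OpenSSL 3.4 or newer:", openssl_at_least(3, 4))
```

Running this both locally and inside a build makes the version mismatch between the two environments visible directly.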

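Newer versions of Python Transforms ship with OpenSSL 3.4.0, so you can fix this by making sure the openssl version is not pinned in your meta.yaml file and by upgrading your repository to the latest template version.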
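The first of those steps, not pinning openssl in meta.yaml, can be checked mechanically. A rough sketch: find_openssl_pins is a hypothetical helper that scans the recipe text line by line for an openssl requirement carrying a version spec (for example "- openssl 3.0.*" or "- openssl=3.0.2"). It deliberately avoids a YAML parser because conda recipes often contain Jinja templating, and it will miss quoted or otherwise unusual entries.

```python
import re
from typing import List, Tuple

# Matches list entries like "- openssl 3.0.*" or "- openssl=3.0.2"; the lookahead
# rejects other packages such as "openssl-dev", and anything after "#" is ignored.
_OPENSSL_REQ = re.compile(r"^\s*-\s*openssl(?![\w.-])(?P<spec>[^#]*)")


def find_openssl_pins(meta_yaml_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every openssl requirement that carries a version spec."""
    pins = []
    for lineno, line in enumerate(meta_yaml_text.splitlines(), start=1):
        match = _OPENSSL_REQ.match(line)
        # A bare "- openssl" has an empty spec and is left alone.
        if match and match.group("spec").strip():
            pins.append((lineno, line.strip()))
    return pins


if __name__ == "__main__":
    sample = """requirements:
  run:
    - python
    - openssl 3.0.*
    - pyarrow
"""
    print(find_openssl_pins(sample))
```

Point it at the text of your repository's meta.yaml and drop the version spec from any line it reports; the template upgrade itself still has to be done in Foundry.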
