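I have a dotnet application that depends on the librdkafka.redist NuGet package. I want to run this application in a Docker container based on an Alpine image (built for the linux-arm64 platform).
At runtime I see: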
Unhandled exception. System.DllNotFoundException: Failed to load the librdkafka native library.
at Confluent.Kafka.Impl.Librdkafka.TrySetDelegates(List`1 nativeMethodCandidateTypes)
at Confluent.Kafka.Impl.Librdkafka.LoadLinuxDelegates(String userSpecifiedPath)
at Confluent.Kafka.Impl.Librdkafka.Initialize(String userSpecifiedPath)
at Confluent.Kafka.Producer`2..ctor(ProducerBuilder`2 builder)
at Confluent.Kafka.ProducerBuilder`2.Build()
The library appears to be packaged correctly; if I open a shell in the container, I see this:
$ ls /app/runtimes/linux-arm64/native
librdkafka.so
But I can also tell that I am going to have problems, because Alpine does not ship with glibc support:
$ ldd /app/runtimes/linux-arm64/native/librdkafka.so
/lib/ld-musl-aarch64.so.1 (0xffffb4158000)
libm.so.6 => /lib/ld-musl-aarch64.so.1 (0xffffb4158000)
libdl.so.2 => /lib/ld-musl-aarch64.so.1 (0xffffb4158000)
libpthread.so.0 => /lib/ld-musl-aarch64.so.1 (0xffffb4158000)
libc.so.6 => /lib/ld-musl-aarch64.so.1 (0xffffb4158000)
Error loading shared library ld-linux-aarch64.so.1: No such file or directory (needed by /app/runtimes/linux-arm64/native/librdkafka.so)
Error relocating /app/runtimes/linux-arm64/native/librdkafka.so: __vsnprintf_chk: symbol not found
...
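As a complementary check, the direct dependencies recorded in the library itself can be listed with readelf (from the binutils package). The sketch below is only illustrative: `parse_needed` and `needed_libs` are hypothetical helper names, and it assumes the usual `readelf -d` line layout for NEEDED entries.

```shell
# Extract library names from the NEEDED lines of `readelf -d` output.
# Assumed line layout:
#   0x0000000000000001 (NEEDED)   Shared library: [libm.so.6]
parse_needed() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Hypothetical wrapper: list the direct dependencies of one shared object.
# Requires readelf (on Alpine: apk add binutils).
needed_libs() {
  readelf -d "$1" | parse_needed
}
```

Running `needed_libs /app/runtimes/linux-arm64/native/librdkafka.so` should print one library name per line; a glibc loader such as ld-linux-aarch64.so.1 in that list would explain the ldd error above.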
I can use the Alpine gcompat package and patchelf to apparently fix some of these problems (slightly anonymised Dockerfile):
FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine
RUN apk add patchelf
RUN apk add binutils
RUN apk add gcompat
RUN adduser --disabled-password \
    --gecos "" \
    --no-create-home \
    --uid 10028 \
    myuser
USER myuser
COPY --chown=myuser:myuser . app/
RUN patchelf --remove-needed ld-linux-aarch64.so.1 /app/runtimes/linux-arm64/native/librdkafka.so && \
    patchelf --add-needed libgcompat.so.0 /app/runtimes/linux-arm64/native/librdkafka.so
ENV COREHOST_TRACE=1
ENTRYPOINT [ "dotnet", "app/MyApp.dll" ]
At this point I think I have resolved my native dependency problems:
$ ldd /app/runtimes/linux-arm64/native/librdkafka.so
/lib/ld-musl-aarch64.so.1 (0xffffb5290000)
libgcompat.so.0 => /lib/libgcompat.so.0 (0xffffb4d5b000)
libm.so.6 => /lib/ld-musl-aarch64.so.1 (0xffffb5290000)
libdl.so.2 => /lib/ld-musl-aarch64.so.1 (0xffffb5290000)
libpthread.so.0 => /lib/ld-musl-aarch64.so.1 (0xffffb5290000)
libc.so.6 => /lib/ld-musl-aarch64.so.1 (0xffffb5290000)
libucontext.so.1 => /lib/libucontext.so.1 (0xffffb4d49000)
libobstack.so.1 => /usr/lib/libobstack.so.1 (0xffffb4d36000)
But I still get a DllNotFoundException at runtime. When I set COREHOST_TRACE=1, I can see
Adding runtimeTargets native asset runtimes/linux-arm64/native/librdkafka.so rid=linux-arm64 assemblyVersion= fileVersion=0.0.0.0 from librdkafka.redist/1.9.2
...
Chose linux-arm64, so removing rid (win-x86) specific assets for package librdkafka.redist/1.9.2 and asset type native
Chose linux-arm64, so removing rid (win-x64) specific assets for package librdkafka.redist/1.9.2 and asset type native
Chose linux-arm64, so removing rid (osx-x64) specific assets for package librdkafka.redist/1.9.2 and asset type native
Chose linux-arm64, so removing rid (osx-arm64) specific assets for package librdkafka.redist/1.9.2 and asset type native
Chose linux-arm64, so removing rid (linux-x64) specific assets for package librdkafka.redist/1.9.2 and asset type native
...
Reconciling library librdkafka.redist/1.9.2
Parsed native deps entry 0 for asset name: librdkafka from package: librdkafka.redist, library version: 1.9.2, relpath: runtimes/linux-arm64/native/librdkafka.so, assemblyVersion , fileVersion 0.0.0.0
...
Processing native/culture for deps entry [librdkafka.redist, 1.9.2, runtimes/linux-arm64/native/librdkafka.so]
Considering entry [librdkafka.redist/1.9.2/runtimes/linux-arm64/native/librdkafka.so], probe dir [], probe fx level:0, entry fx level:0
Relative path query /app/runtimes/linux-arm64/native/librdkafka.so (skipped file existence check)
Probed deps dir and matched '/app/runtimes/linux-arm64/native/librdkafka.so'
Adding to native path: /app/runtimes/linux-arm64/native/
...
Property NATIVE_DLL_SEARCH_DIRECTORIES = /app/runtimes/linux-arm64/native/:/usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.11/:
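before the startup fails.
So I think I have a) established that the framework knows about (and will try to load) what I believe is the right native library, and b) established that there are no missing dependencies that would prevent the library from loading.
Are there any further processes or steps I can take to diagnose what is happening? I suppose there could be a set of dependencies that ldd is not showing me, for example on openssl, zlib, and so on, which might be the problem?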
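I did eventually solve this. I used strace to examine what the process was actually looking for, and found that it was looking for a library named alpine-librdkafka.so rather than the librdkafka.so indicated in the exception.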
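A minimal sketch of that kind of strace check follows. It assumes strace is installed in the container (on Alpine: apk add strace) and that the container is allowed to use ptrace; `failed_opens` and `trace_missing` are hypothetical helper names, not part of any tool.

```shell
# Keep only failed file lookups (ENOENT) whose line mentions the pattern.
failed_opens() {
  grep 'ENOENT' | grep -- "$1"
}

# Hypothetical wrapper: run a command under strace, following child
# processes and threads (-f), tracing file-related syscalls (%file),
# and show only the lookups that failed for paths matching the pattern.
trace_missing() {
  pattern="$1"
  shift
  strace -f -e trace=%file "$@" 2>&1 | failed_opens "$pattern"
}
```

For example, `trace_missing rdkafka dotnet app/MyApp.dll` should list every path containing "rdkafka" that the process tried and failed to find, which shows the file names actually being probed.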
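After some more digging, I found that Confluent.Kafka actually switches internally on the distro and looks for a file named after it. Unfortunately, the version we are using has no arm64 build of alpine-librdkafka.so.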
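To check up front whether the distro-specific file is present, something like the following could be used. This is a sketch under assumptions: it guesses that the distro is identified by the ID field of /etc/os-release (the detection logic in Confluent.Kafka was not verified here), only the alpine name comes from the strace finding, and both function names are hypothetical.

```shell
# Read the ID field from an os-release style file (default: /etc/os-release).
distro_id() {
  sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d '"'
}

# Hypothetical mapping from distro ID to the native file name being probed.
# Only the alpine case is taken from the strace result; every other distro
# is simplified to the plain name in this sketch.
expected_native_name() {
  case "$1" in
    alpine) echo "alpine-librdkafka.so" ;;
    *)      echo "librdkafka.so" ;;
  esac
}
```

Inside the container, `ls /app/runtimes/linux-arm64/native/"$(expected_native_name "$(distro_id)")"` would then fail if the expected file was not packaged.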
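We have added a build from source to our pipeline, and everything now works. Incidentally, I believe the required build was added around 1.9.0, but for us upgrading is more complicated than simply building and packaging the build we need from source.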
I ran into the same issue when publishing an image for the rhel runtime:
dotnet publish -r rhel.8-x64
The issue was resolved once I changed the runtime to linux:
dotnet publish -r linux-x64