malloc(): unaligned tcache chunk detected. Has anyone run into this in an MPI Fortran program?


I have an MPI program. When I run it on a single processor, instead of on 8, I get a "malloc(): unaligned tcache chunk detected" error. The memory allocation looks like this:


  ALLOCATE(XPOINTS((Npx+1)))
  IF(MY_RANK .eq. 0) WRITE(*,*)  "TESTING"
  ALLOCATE(YPOINTS((Npy+1)))
  ALLOCATE(ZPOINTS((Npz+1)))
  ALLOCATE(x_GLBL((1-Ngl):(Nx_glbl+Ngl)))
  ALLOCATE(y_GLBL((1-Ngl):(Ny_glbl+Ngl)))
  ALLOCATE(z_GLBL((1-Ngl):(Nz_glbl+Ngl)))

Note that I have verified that all the sizes used in the allocations are integers. This is the error I see:

 TESTING
malloc(): unaligned tcache chunk detected
malloc(): unaligned tcache chunk detected

Program received signal SIGABRT: Process abort signal.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Backtrace for this error:
#0  0x7f2145348960 in ???
#1  0x7f2145347ac5 in ???
#2  0x7f214513e51f in ???
        at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x7f21451929fc in __pthread_kill_implementation
        at ./nptl/pthread_kill.c:44
#4  0x7f21451929fc in __pthread_kill_internal
        at ./nptl/pthread_kill.c:78
#5  0x7f21451929fc in __GI___pthread_kill
        at ./nptl/pthread_kill.c:89
#6  0x7f214513e475 in __GI_raise
        at ../sysdeps/posix/raise.c:26
#7  0x7f21451247f2 in __GI_abort
        at ./stdlib/abort.c:79
#8  0x7f2145185675 in __libc_message
        at ../sysdeps/posix/libc_fatal.c:155
#9  0x7f214519ccfb in malloc_printerr
        at ./malloc/malloc.c:5664
#10  0x7f21451a13db in tcache_get
        at ./malloc/malloc.c:3195
#11  0x7f21451a13db in __GI___libc_malloc
        at ./malloc/malloc.c:3313
#12  0x55ecaeda5ab3 in ???
#13  0x55ecaed90452 in ???
#14  0x55ecaed902ee in ???
#15  0x7f2145125d8f in __libc_start_call_main
        at ../sysdeps/nptl/libc_start_call_main.h:58
#16  0x7f2145125e3f in __libc_start_main_impl
        at ../csu/libc-start.c:392
#17  0x55ecaed90324 in ???
#18  0xffffffffffffffff in ???
#0  0x7efe26f48960 in ???
#1  0x7efe26f47ac5 in ???
#2  0x7efe26d3e51f in ???
        at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x7efe26d929fc in __pthread_kill_implementation
        at ./nptl/pthread_kill.c:44
#4  0x7efe26d929fc in __pthread_kill_internal
        at ./nptl/pthread_kill.c:78
#5  0x7efe26d929fc in __GI___pthread_kill
        at ./nptl/pthread_kill.c:89
#6  0x7efe26d3e475 in __GI_raise
        at ../sysdeps/posix/raise.c:26
#7  0x7efe26d247f2 in __GI_abort
        at ./stdlib/abort.c:79
#8  0x7efe26d85675 in __libc_message
        at ../sysdeps/posix/libc_fatal.c:155
#9  0x7efe26d9ccfb in malloc_printerr
        at ./malloc/malloc.c:5664
#10  0x7efe26da13db in tcache_get
        at ./malloc/malloc.c:3195
#11  0x7efe26da13db in __GI___libc_malloc
        at ./malloc/malloc.c:3313
#12  0x55fa223ddab3 in ???
#13  0x55fa223c8452 in ???
#14  0x55fa223c82ee in ???
#15  0x7efe26d25d8f in __libc_start_call_main
        at ../sysdeps/nptl/libc_start_call_main.h:58
#16  0x7efe26d25e3f in __libc_start_main_impl
        at ../csu/libc-start.c:392
#17  0x55fa223c8324 in ???
#18  0xffffffffffffffff in ???

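Has anyone run into this before? I have tried everything I can think of, but I do not understand why it does not run on fewer than 8 processors. I tried both the Intel and GNU Fortran compilers: it works with 8 processors but not with 1. Is this a problem specific to my laptop?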

Edit: I could not reproduce this error in a simpler program, so I am attaching the GitHub repository: https://github.com/SahajSJain/MyPoisonX.git

memory fortran mpi gfortran intel-fortran
1 Answer

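The message "malloc(): unaligned tcache chunk detected" is an error from the malloc implementation underlying allocate. In your case, malloc appears to store additional metadata about each allocated chunk next to the heap allocation itself. During an allocation, malloc detected that this metadata had been corrupted, which is usually caused by an out-of-bounds write to another allocation.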
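As an illustration of how such an out-of-bounds write can be caught before it happens, here is a minimal sketch. It assumes an array with ghost cells shaped like x_GLBL((1-Ngl):(Nx_glbl+Ngl)) from the question; the module, function, and variable names, as well as the sizes, are hypothetical and do not come from the repository. It compares a loop range against the declared bounds instead of writing blindly.

```fortran
! Hypothetical sketch: names and sizes are illustrative, not taken from the repository.
module bounds_guard
  implicit none
contains
  ! Number of indices in first..last that lie outside the valid range lo..hi.
  pure integer function count_out_of_bounds(lo, hi, first, last) result(n_bad)
    integer, intent(in) :: lo, hi, first, last
    integer :: i
    n_bad = 0
    do i = first, last
      if (i < lo .or. i > hi) n_bad = n_bad + 1
    end do
  end function count_out_of_bounds
end module bounds_guard

program demo_bounds
  use bounds_guard
  implicit none
  integer, parameter :: n = 8, ngl = 2   ! assumed interior size and ghost-layer width
  real, allocatable :: x(:)
  integer :: i, n_bad

  allocate(x((1-ngl):(n+ngl)))           ! valid indices: -1 .. 10
  x = 0.0

  ! A loop range that runs one element too far, as an off-by-one error would.
  n_bad = count_out_of_bounds(lbound(x, 1), ubound(x, 1), 1-ngl, n+ngl+1)

  if (n_bad > 0) then
    ! Report the problem instead of writing past the end of the allocation.
    write(*,*) 'loop range exceeds array bounds by', n_bad, 'element(s)'
  else
    do i = 1-ngl, n+ngl+1
      x(i) = real(i)
    end do
  end if

  deallocate(x)
end program demo_bounds
```

In practice, the same check can be left to the compiler: gfortran's -fcheck=bounds option inserts it for every array access, at some runtime cost.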

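AddressSanitizer and valgrind are tools that detect this kind of out-of-bounds access at run time. I tried compiling your code with gfortran and OpenMPI. The compiler complained that the calls to MPI_Cart_create and MPI_Cart_coords do not match their declarations. PeriodicArr must be declared as LOGICAL, and the call to MPI_Cart_coords is missing the ierror argument.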
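For reference, here is a minimal sketch of the two calls with matching argument types. It assumes a 3-D, fully periodic process grid and the "use mpi" module; only the name PeriodicArr comes from the code discussed here, and every other name is hypothetical. With "use mpi" or mpif.h, ierror is a required last argument (it is optional only with the mpi_f08 module). If it is omitted, the library still writes a return code through an argument the caller never passed, which can overwrite unrelated memory.

```fortran
! Sketch only: a 3-D, fully periodic grid is an assumption; adapt to the real code.
program cart_demo
  use mpi                          ! with "use mpi" or mpif.h, ierror is mandatory
  implicit none
  integer, parameter :: ndims = 3
  integer :: dims(ndims), coords(ndims)
  logical :: PeriodicArr(ndims)    ! must be LOGICAL, not INTEGER
  logical :: reorder
  integer :: comm_cart, my_rank, nprocs, ierror

  call MPI_Init(ierror)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)

  dims = 0                         ! zeros let MPI choose the process grid
  call MPI_Dims_create(nprocs, ndims, dims, ierror)

  PeriodicArr = .true.
  reorder = .false.
  call MPI_Cart_create(MPI_COMM_WORLD, ndims, dims, PeriodicArr, reorder, &
                       comm_cart, ierror)

  call MPI_Comm_rank(comm_cart, my_rank, ierror)
  ! ierror is the last argument; leaving it out is the mismatch noted above.
  call MPI_Cart_coords(comm_cart, my_rank, ndims, coords, ierror)

  write(*,*) 'rank', my_rank, 'coords', coords

  call MPI_Comm_free(comm_cart, ierror)
  call MPI_Finalize(ierror)
end program cart_demo
```

Using the module rather than mpif.h is a deliberate choice in this sketch: it gives the compiler explicit interfaces, so type and argument-count mismatches like these are more likely to be reported at compile time.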

To use AddressSanitizer, add -fsanitize=address to CFLAGS and LFLAGS.

Running the program with

mpirun -np 2 env ASAN_OPTIONS="detect_leaks=0" ./MyPoisonX

then reports:

At line 199 of file CODE.SETUP_FIELD_VARIABLES.F90
Fortran runtime error: Index '27' of dimension 1 of array 'dxinv' above upper bound of 26

Disabling leak detection is necessary to keep a flood of MPI-related memory leak reports from filling the screen.

I could not reproduce the error on my system, but the missing ierror argument may already explain the problem.
