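I am running an OpenMP program (gcc and libgomp on CentOS 8.5). Checking it with strace, I found that the clone syscall is called over and over again (part of the log is provided below). I think this means the OpenMP threads keep being recreated, because all the other, non-OpenMP threads are fixed in number and are all initialized at the very start of main.
However, I also tried writing a simple OpenMP program, and there OpenMP appeared to create a thread pool during initialization and reuse it afterwards.
So my question is: under what circumstances do libgomp threads terminate and get recreated?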
clone(child_stack=0x7f16bff89ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265184], tls=0x7f16bff8f700, child_tidptr=0x7f16bff8f9d0) = 3265184
sched_setaffinity(3265184, 16, [8]) = 0
futex(0x7f16bff8fd18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4012f184, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4012f184, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clone(child_stack=0x7f16c1f8def0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265185], tls=0x7f16c1f93700, child_tidptr=0x7f16c1f939d0) = 3265185
sched_setaffinity(3265185, 16, [2]) = 0
futex(0x7f16c1f93d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c078aef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265186], tls=0x7f16c0790700, child_tidptr=0x7f16c07909d0) = 3265186
sched_setaffinity(3265186, 16, [4]) = 0
futex(0x7f16c0790d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16bff89ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265187], tls=0x7f16bff8f700, child_tidptr=0x7f16bff8f9d0) = 3265187
sched_setaffinity(3265187, 16, [6]) = 0
futex(0x7f16bff8fd18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c178cef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265188], tls=0x7f16c1792700, child_tidptr=0x7f16c17929d0) = 3265188
sched_setaffinity(3265188, 16, [8]) = 0
futex(0x7f16c1792d18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clone(child_stack=0x7f16c1f8def0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265189], tls=0x7f16c1f93700, child_tidptr=0x7f16c1f939d0) = 3265189
sched_setaffinity(3265189, 16, [2]) = 0
futex(0x7f16c1f93d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c178cef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265190], tls=0x7f16c1792700, child_tidptr=0x7f16c17929d0) = 3265190
sched_setaffinity(3265190, 16, [4]) = 0
Environment variables:
export PARALLEL_ENSEMBLE_THREADS=5
export GOMP_CPU_AFFINITY=7,2,4,6,8
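This is more of a "sysadmin" answer, but you can have strace print a stack trace showing where each system call was made from. To do so, use the -k command-line option. For example, if you run the following: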
$ strace -etrace=clone -k -y -f -f bash -c "ls /dev/null;command echo"
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7e1b55ccaa10) = 141358
> /usr/lib/x86_64-linux-gnu/libc.so.6(_Fork+0x27) [0xee1a7]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_fork+0x52) [0xf3fb2]
> /usr/bin/bash(make_child+0x19e) [0x6909e]
> /usr/bin/bash(adjust_shell_level+0x315) [0x56935]
> /usr/bin/bash(adjust_shell_level+0x1bda) [0x581fa]
> /usr/bin/bash(execute_command_internal+0xb88) [0x4ae08]
> /usr/bin/bash(execute_command+0xce) [0x4dc6e]
> /usr/bin/bash(execute_command_internal+0x2ff2) [0x4d272]
> /usr/bin/bash(parse_and_execute+0x7ab) [0xb5ffb]
> /usr/bin/bash(_rl_enable_paren_matching+0xb0ce) [0x11e2de]
> /usr/bin/bash(main+0xf78) [0x33568]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x2a1ca]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x2a28b]
> /usr/bin/bash(_start+0x25) [0x34385]
strace: Process 141358 attached
/dev/null
...
(You will see SIGCHLD here because strace shows signals by default, but that part is not relevant to the answer.)
For your application (tracing only clone() and following children with -f), this should tell you where in libgomp's code the threads are being created.
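Applied to an OpenMP binary, the invocation might look like the sketch below. The helper name trace_clones, the output file clone_trace.txt, and ./my_openmp_app are placeholders I made up, not anything from your setup. It also assumes your strace was built with stack-unwinding support (otherwise -k is rejected), and that thread creation shows up as clone, as it does in your log; newer glibc versions may create threads with clone3 instead, in which case that is the name to trace.

```shell
# Hypothetical helper: trace only clone(), follow all threads and children
# (-f), print a stack trace for every traced call (-k), and write everything
# to the file given as the first argument (-o).
trace_clones() {
    out="$1"
    shift
    strace -f -k -e trace=clone -o "$out" "$@"
}

# Usage (./my_openmp_app is a placeholder for your own binary):
#   trace_clones clone_trace.txt ./my_openmp_app
#
# Count how many clone() calls were recorded:
#   grep -c 'clone(' clone_trace.txt
#
# Summarize the stack frames that lie inside libgomp:
#   grep libgomp clone_trace.txt | sort | uniq -c | sort -rn | head
```

In the resulting file, the frames inside libgomp show which libgomp entry point led to each clone, and the frames beneath them show which part of your program called into it.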