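I am trying to embed Julia's MPI (MPI.jl) in C code, as shown below. MPI works fine in C itself, but the program crashes whenever I try to get the rank in Julia, complaining that the communicator is invalid. Can anyone help? I am using Open MPI 4.1.3.
Below is a minimal example that reproduces the problem on my machine. It cannot even get the size or rank of MPI.COMM_WORLD in Julia.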
#include <mpi.h>
#include <stdio.h>
#include <julia.h>
int main(int argc, char *argv[]) {
MPI_Init(&argc, &argv);
jl_init();
(void) jl_eval_string("println(\"Loading MPI...\")");
(void) jl_eval_string("using MPI");
(void) jl_eval_string("println(\"Done.\")");
(void) jl_eval_string("if MPI.Initialized() ; println(\"MPI is initialized.\") ; else ; println(\"Warning: MPI is not initialized.\") ; end ");
(void) jl_eval_string("comm = MPI.COMM_WORLD");
(void) jl_eval_string("println(comm)");
(void) jl_eval_string("println(MPI.Comm_size(comm))");
jl_atexit_hook(0);
MPI_Finalize();
return 0;
}
Compile with:
mpicc main.c -I$JULIA_INC -L$JULIA_LIB -ljulia -o run.exe
and run with:
mpirun -np 2 ./run.exe
My output:
Loading MPI...
Done.
MPI is initialized.
MPI.Comm(1140850688)
[4188745] signal (11.1): Segmentation fault
in expression starting at none:1
PMPI_Comm_size at /home/t2hsu/miniconda3/envs/mpi/lib/libmpi.so.40 (unknown line)
MPI_Comm_size at /home/t2hsu/.julia/packages/MPI/TKXAj/src/api/generated_api.jl:999 [inlined]
Comm_size at /home/t2hsu/.julia/packages/MPI/TKXAj/src/comm.jl:78
jfptr_Comm_size_591 at /home/t2hsu/.julia/compiled/v1.9/MPI/nO0XF_FB87d.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
ijl_eval_string at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:113
main at ./run_c.exe (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
_start at ./run_c.exe (unknown line)
Allocations: 2997 (Pool: 2985; Big: 12); GC: 0
Segmentation fault (core dumped)
main.c
#include <mpi.h>
#include <stdio.h>
#include <julia.h>
int main(int argc, char *argv[]) {
int rank, size;
char cmd[1024];
// Initialize the MPI environment
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("Hello from process %d of %d\n", rank, size);
int comm_id = MPI_Comm_c2f(MPI_COMM_WORLD);
jl_init();
(void) jl_eval_string("using MPI");
(void) jl_eval_string("if MPI.Initialized() ; println(\"MPI is initialized.\") ; else ; println(\"Warning: MPI is not initialized.\") ; end ");
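// Build the Julia command; snprintf guards against overflowing cmd
snprintf(cmd, sizeof(cmd), "comm = MPI.Comm(%d)", comm_id);
printf("Goinig to evaluate:\n");
// Print via "%s" so cmd is never interpreted as a format string
printf("%s\n", cmd);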
(void) jl_eval_string(cmd);
(void) jl_eval_string("println(comm)");
(void) jl_eval_string("println(MPI.Comm_rank(comm))");
jl_atexit_hook(0);
MPI_Finalize();
return 0;
}
Compile with:
mpicc main.c -I$JULIA_INC -L$JULIA_LIB -ljulia -o run.exe
and run with:
mpirun -np 2 ./run.exe
However, I get this error output:
Hello from process 0 of 2
Hello from process 1 of 2
MPI is initialized.
Goinig to evaluate:
comm = MPI.Comm(0)
MPI is initialized.
Goinig to evaluate:
comm = MPI.Comm(0)
MPI.Comm(0)
[exp-18-53:1727695] *** An error occurred in MPI_Comm_rank
[exp-18-53:1727695] *** reported by process [1988952065,1]
[exp-18-53:1727695] *** on communicator MPI_COMM_WORLD
[exp-18-53:1727695] *** MPI_ERR_COMM: invalid communicator
[exp-18-53:1727695] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[exp-18-53:1727695] *** and potentially your MPI job)
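I found the problem myself. My Julia MPI was not using the same MPI library as the C code.
I solved it by using MPIPreferences to reconfigure the target MPI library, as suggested in the MPI.jl documentation: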
(void) jl_eval_string("using MPIPreferences");
(void) jl_eval_string("MPIPreferences.use_system_binary(; library_names=[\"/home/t2hsu/miniconda3/envs/mpi/lib/libmpi\"]);");