Consider the following program:
#include <iostream>
#include <mpi.h>
int main() {
  int provided = -1;
  MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
  if (provided != MPI_THREAD_MULTIPLE) {
    return -1;
  }
  int this_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &this_rank);
  double aze[36864]{};
  MPI_Request req = MPI_REQUEST_NULL;
  std::cout << this_rank << " starting bcast" << std::endl;
  MPI_Ibcast(aze, 36864, MPI_DOUBLE, 1, MPI_COMM_WORLD, &req);
  std::cout << this_rank << " req0 " << req << std::endl;
#pragma omp parallel
  {
    MPI_Status stat{};
    // do {
    MPI_Wait(&req, &stat);
    // } while (req != MPI_REQUEST_NULL);
    if (req != MPI_REQUEST_NULL) {
      std::cout << this_rank << " wait returned non null request: " << req
                << " vs " << MPI_REQUEST_NULL << std::endl;
      std::cout << this_rank << " MPI_SOURCE: " << stat.MPI_SOURCE << std::endl;
      std::cout << this_rank << " MPI_TAG: " << stat.MPI_TAG << std::endl;
      std::cout << this_rank << " MPI_ERROR: " << stat.MPI_ERROR << std::endl;
    }
  }
  {
    volatile int dummy = 0;
    while (dummy != 1'000'000'000) {
      dummy++;
    }
    std::cout << this_rank << " sleep done" << std::endl;
  }
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}
I am using Open MPI 5.0.2, with recent versions of both clang and gcc. I build and run the reproducer above like this:
$ g++ -fopenmp ~/Downloads/trash/repro.mpi.cc -isystem /usr/include/openmpi-x86_64 -L /usr/lib64/openmpi/lib/ -lmpi
$ export OMP_NUM_THREADS=2
$ mpirun -n 4 ./a.out
The expected standard output is (sorted by rank id):
0 starting bcast
0 req0 0x3e573f28
0 sleep done
1 starting bcast
1 req0 0x79d9298
1 sleep done
2 starting bcast
2 req0 0xc4841b8
2 sleep done
3 starting bcast
3 req0 0x2fdf2f18
3 sleep done
Note that the addresses will obviously vary from run to run.
The observed behavior looks like this (again sorted by rank id):
0 starting bcast
0 req0 0x25aa6f28
0 wait returned non null request: 0x25aa6f28 vs 0x4045e0
0 MPI_SOURCE: 0
0 MPI_TAG: 0
0 MPI_ERROR: 0
0 sleep done
1 starting bcast
1 req0 0xb169298
1 sleep done
2 starting bcast
2 req0 0xc4f81b8
2 wait returned non null request: 0xc4f81b8 vs 0x4045e0
2 MPI_SOURCE: 0
2 MPI_TAG: 0
2 MPI_ERROR: 0
2 sleep done
3 starting bcast
3 req0 0x10ccbf18
3 wait returned non null request: 0x10ccbf18 vs 0x4045e0
3 MPI_SOURCE: 0
3 MPI_TAG: 0
3 MPI_ERROR: 0
3 sleep done
We observe that when MPI_Wait returns, no error is reported in the MPI_Status or in the logs, and the MPI_Request is neither deallocated nor set to MPI_REQUEST_NULL.
According to the Open MPI documentation and the standard:
A call to MPI_Wait returns when the operation identified by request is complete. If the communication object associated with this request was created by a nonblocking send or receive call, then the object is deallocated by the call to MPI_Wait and the request handle is set to MPI_REQUEST_NULL.
(https://docs.open-mpi.org/en/v5.0.x/man-openmpi/man3/MPI_Wait.3.html#description)。
Is the snippet above invalid? Note that if the do/while loop around MPI_Wait is uncommented, the code produces the output I expect. But then it effectively takes on the (polling) semantics of MPI_Test. The OpenMP part is key to triggering the issue.
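For clarity, this is the parallel region with the loop uncommented, i.e. the variant that does produce the expected output. Each thread keeps re-waiting until its own view of the handle is MPI_REQUEST_NULL; a wait on MPI_REQUEST_NULL returns immediately with an empty status, so the loop terminates:

```
#pragma omp parallel
  {
    MPI_Status stat{};
    do {
      MPI_Wait(&req, &stat);
    } while (req != MPI_REQUEST_NULL);  // retry until this thread sees a null handle
  }
```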
The relevant text is (copied from MPI 4.0, but the same text exists in other versions):
Multiple threads completing the same request. A program in which two threads block waiting on the same request is erroneous. Similarly, the same request cannot appear in the array of requests of two concurrent MPI_{WAIT|TEST}{ANY|SOME|ALL} calls. In MPI, a request can only be completed once. Any combination of wait or test which violates this rule is erroneous.
The only completion function that can be called concurrently with pointers to the same request handle is MPI_Test. The important point is that all of the threads really do refer to the same storage for the common request handle.
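For reference, a sketch of a variant that stays within that rule by letting exactly one thread complete the request. The `#pragma omp single` placement is my assumption about an acceptable fix, not taken from the reproducer:

```cpp
#include <iostream>
#include <mpi.h>

int main() {
  int provided = -1;
  MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
  if (provided != MPI_THREAD_MULTIPLE) {
    MPI_Finalize();
    return -1;
  }
  int this_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &this_rank);
  double aze[36864]{};
  MPI_Request req = MPI_REQUEST_NULL;
  MPI_Ibcast(aze, 36864, MPI_DOUBLE, 1, MPI_COMM_WORLD, &req);
#pragma omp parallel
  {
    // Exactly one thread completes the request; the other threads block
    // on the implicit barrier at the end of the single construct.
#pragma omp single
    MPI_Wait(&req, MPI_STATUS_IGNORE);
  }
  // After the parallel region, req is MPI_REQUEST_NULL on every rank.
  std::cout << this_rank << " req is null: " << (req == MPI_REQUEST_NULL)
            << std::endl;
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}
```

Built and launched the same way as the reproducer (g++ -fopenmp, then mpirun -n 4).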