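I am new to OpenMP. I wrote a program that computes the value of Pi by calculating the area under the function
f(x) = 4/(1+x^2)
on the interval [0, 1]. I wanted to check how efficiently it runs with different numbers of threads. Here is my code: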
#include <stdio.h>
#include <omp.h>

#define NUM_THREAD 4  /* assumed value; the runs below use 1 to 4 threads */

int main() {
    long long num_iteration = 100000000;
    double pi = 0;
    double x_margin = 1.0 / num_iteration;  /* width of each rectangle */
    double start_time, end_time;
    int num_threads_main = 0;
    int j = NUM_THREAD;
    int a;

    for (a = 1; a <= j; a++) {
        pi = 0.0;
        omp_set_num_threads(a);
        start_time = omp_get_wtime();
        #pragma omp parallel
        {
            int i;
            int ID = omp_get_thread_num();
            double sum = 0.0;  /* per-thread partial sum */
            int num_threads_local = omp_get_num_threads();
            double x_value = 0;
            if (ID == 0)
                num_threads_main = num_threads_local;
            /* cyclic distribution: thread ID handles i = ID, ID + n, ID + 2n, ... */
            for (i = ID; i < num_iteration; i = i + num_threads_local) {
                x_value = x_margin * (i + 0.5);
                sum = sum + (4.0 / (1.0 + x_value * x_value));
            }
            /* serialize the update of the shared total */
            #pragma omp critical
            pi = pi + sum * x_margin;
        }
        end_time = omp_get_wtime() - start_time;
        printf("pi with 1000000000 steps is %f in %f seconds with threads %d\n", pi, end_time, num_threads_main);
    }
    return 0;
}
However, I get a strange result:
pi with 1000000000 steps is 3.141593 in 0.333257 seconds with threads 1
pi with 1000000000 steps is 3.141593 in 0.175574 seconds with threads 2
pi with 1000000000 steps is 3.141593 in 0.177884 seconds with threads 3
pi with 1000000000 steps is 3.141593 in 0.170591 seconds with threads 4
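Why doesn't my program get faster with more threads? I know OpenMP can introduce some overhead. However, I compared my code with the code from Tim Mattson's OpenMP tutorial and found the two to be very similar. Here is his code: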
#include <stdio.h>
#include <omp.h>

#define MAX_THREADS 4  /* assumed value; the results below go up to 4 threads */

static long num_steps = 100000000;
double step;
int main ()
{
    int i,j;
    double pi, full_sum = 0.0;
    double start_time, run_time;
    double sum[MAX_THREADS];

    step = 1.0/(double) num_steps;

    for(j=1;j<=MAX_THREADS ;j++){
        omp_set_num_threads(j);
        full_sum = 0.0;
        start_time = omp_get_wtime();
        #pragma omp parallel private(i)
        {
            int id = omp_get_thread_num();
            int numthreads = omp_get_num_threads();
            double x;
            double partial_sum = 0;

            #pragma omp single
            printf(" num_threads = %d",numthreads);

            for (i=id;i< num_steps; i+=numthreads){
                x = (i+0.5)*step;
                partial_sum += + 4.0/(1.0+x*x);
            }
            #pragma omp critical
            full_sum += partial_sum;
        }
        pi = step * full_sum;
        run_time = omp_get_wtime() - start_time;
        printf("\n pi is %f in %f seconds %d threds \n ",pi,run_time,j);
    }
}
Here are his results:
num_threads = 1
pi is 3.141593 in 0.483153 seconds 1 threds
num_threads = 2
pi is 3.141593 in 0.305407 seconds 2 threds
num_threads = 3
pi is 3.141593 in 0.246802 seconds 3 threds
num_threads = 4
pi is 3.141593 in 0.204905 seconds 4 threds
I ran the experiment on macOS 10.15.3 with a 2.3 GHz dual-core Intel Core i5. Am I missing something? Thank you.
You may not have enabled OpenMP in your VS settings. If it is not enabled, you are using only one thread.