Most efficient way to spawn n pthreads with the same parameters in C

Question

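I have 32 threads whose input parameters I know ahead of time, and nothing changes inside the function (other than the memory buffer each thread interacts with).

In pseudo-C code, this is my design pattern: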

// declare 32 pthreads as global variables

void dispatch_32_threads() {
   for(int i=0; i < 32; i++) {
      pthread_create( &thread_id[i], NULL, thread_function, (void*) thread_params[i] );
   }
   // wait until all 32 threads are finished
   for(int j=0; j < 32; j++) {
      pthread_join( thread_id[j], NULL); 
   }
}

int main(void) {

    // init 32 pthreads here

    for (int n = 0; n < 4000; n++) {
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 100; y++) {
                dispatch_32_threads();
                // modify buffers here
            }
        }
    }
}

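I call dispatch_32_threads 100*100*4000 = 40,000,000 times. thread_function and (void*) thread_params[i] never change. I think pthread_create keeps creating and destroying threads. I have 32 cores and none of them is at 100% utilization; it hovers around 12%. Moreover, when I reduce the number of threads to 10, utilization on all 32 cores stays at 5-7% and I see no slowdown in runtime. Running fewer than 10 threads does slow things down.

However, running 1 thread is extremely slow, so multithreading is helping. I profiled my code, and I know it is thread_func that is slow, and that thread_func is parallelizable. This leads me to believe that pthread_create keeps spawning and destroying threads on different cores, and that beyond 10 threads I lose efficiency: thread_func is in essence "less complex" than spawning more than 10 threads.

Is this assessment correct? What is the best way to utilize 100% of all cores?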

Answer 1

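Thread creation is expensive. The cost depends on several parameters, but it is rarely below 1000 cycles, and thread synchronization and destruction cost about the same. If the amount of work in your thread_function is not very high, this overhead will largely dominate the computation time.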
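To get a feel for this overhead on a given machine, one option is to time a loop that creates and joins a thread that does nothing. The sketch below is only illustrative: avg_create_join_ns and empty_thread are hypothetical names, and the number it reports varies with the OS, the C library, and the system load.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that does no work, so only the create/join overhead is timed. */
static void *empty_thread(void *arg) {
    return arg;
}

/* Returns the average wall-clock cost, in nanoseconds, of one
 * pthread_create + pthread_join pair, or -1.0 if a thread cannot be created. */
double avg_create_join_ns(int iterations) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, empty_thread, NULL) != 0) {
            return -1.0;
        }
        pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iterations;
}
```

Calling avg_create_join_ns(10000) from main and comparing the result with the time one call to thread_function takes should show whether creation overhead dominates. Build with -pthread.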

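Creating threads inside an inner loop is rarely a good idea. It is probably best to create threads that each process a share of the outer loop's iterations. Depending on your program, and on whether thread_function has dependencies between iterations, this may require some rewriting, but a solution could be: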

int outer = 4000;
int nthreads = 32;
int perthread = outer / nthreads;

// add an integer thread_id field to the thread_params struct
void *thread_func(void *arg) {
    whatisrequired *thread_param = arg;
    // runs perthread iterations of the outer loop, beginning at start;
    // scaling by perthread keeps the ranges of different threads disjoint
    int start = thread_param->thread_id * perthread;
    for (int n = start; n < start + perthread; n++) {
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 100; y++) {
                // do the work
            }
        }
    }
    return NULL;
}

int main(){
   for(int i=0; i < 32; i++) {
      thread_params[i]->thread_id=i;
      pthread_create( &thread_id[i], NULL, thread_func, 
              (void*) thread_params[i]);
   }
   // wait until all 32 threads are finished
   for(int j=0; j < 32; j++) {
      pthread_join( thread_id[j], NULL); 
   }
}

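With this kind of parallelization, you could consider using OpenMP. The parallel for construct makes it easy to experiment with different parallelization schemes and find the best one.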
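As a rough illustration of the OpenMP route, the sketch below splits the outer loop across the thread team with a parallel for and combines per-thread sums with a reduction. do_work and run_outer_loop_parallel are hypothetical stand-ins for the real computation, and the sketch assumes the outer iterations are independent of one another.

```c
#define OUTER 4000
#define INNER 100

/* Hypothetical stand-in for the real per-iteration work. */
static long long do_work(int n, int x, int y) {
    return (long long)n + x + y;
}

/* Distributes the outer loop over the OpenMP thread team; each thread
 * accumulates a private total, and the reduction adds them at the end. */
long long run_outer_loop_parallel(void) {
    long long total = 0;
    #pragma omp parallel for reduction(+:total) schedule(static)
    for (int n = 0; n < OUTER; n++) {
        for (int x = 0; x < INNER; x++) {
            for (int y = 0; y < INNER; y++) {
                total += do_work(n, x, y);
            }
        }
    }
    return total;
}
```

With GCC or Clang this needs -fopenmp; without that flag the pragma is ignored and the loop runs serially with the same result. Swapping schedule(static) for schedule(dynamic) or adding a collapse clause is how different schemes can be tried.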

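If there are dependencies and such an obvious parallelization is not possible, you can create the threads once at program start and feed them work through a thread pool. Managing a queue is cheaper than creating threads (though atomic accesses do have a cost).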
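A very reduced form of this idea is sketched below: the workers are created once and repeatedly claim the next task index from a shared, mutex-protected counter until none remain. The names task_queue, do_task and run_pool are hypothetical, and a real pool would usually add a condition variable so that workers can sleep while waiting for new tasks to arrive.

```c
#include <pthread.h>

#define NUM_WORKERS 4
#define NUM_TASKS   1000

/* Shared "queue": here just the index of the next unclaimed task. */
typedef struct {
    pthread_mutex_t lock;
    int next_task;
    long results[NUM_TASKS];
} task_queue;

/* Hypothetical stand-in for the real work done per task. */
static long do_task(int id) {
    return (long)id * 2;
}

/* Each worker keeps claiming tasks until the queue is exhausted. */
static void *worker(void *arg) {
    task_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        int id = (q->next_task < NUM_TASKS) ? q->next_task++ : -1;
        pthread_mutex_unlock(&q->lock);
        if (id < 0) {
            break;
        }
        q->results[id] = do_task(id);  /* each slot is written by one thread only */
    }
    return NULL;
}

/* Creates the workers once, lets them drain the queue, and sums the results. */
long run_pool(void) {
    task_queue q;
    pthread_t workers[NUM_WORKERS];
    long sum = 0;

    pthread_mutex_init(&q.lock, NULL);
    q.next_task = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_create(&workers[i], NULL, worker, &q);
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(workers[i], NULL);
    }
    pthread_mutex_destroy(&q.lock);

    for (int i = 0; i < NUM_TASKS; i++) {
        sum += q.results[i];
    }
    return sum;
}
```

Here the only per-task synchronization cost is one lock/unlock pair, which is far cheaper than a pthread_create/pthread_join pair per task. Build with -pthread.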

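Edit: Alternatively, you can:
1. Put all the loops inside the thread function.
2. Add a barrier at the start (or end) of the inner loop to synchronize your threads. This ensures that all threads have finished their work.
3. Create all the threads in main and wait for them to complete.

Barriers are cheaper than thread creation, and the result is the same.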
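The barrier approach could look roughly like the sketch below. The threads are created once, each runs every round itself, and pthread_barrier_wait takes the place of the join. The serial step that the question calls "modify buffers" is done by whichever thread receives PTHREAD_BARRIER_SERIAL_THREAD. The names worker_arg, worker and run_rounds are hypothetical, the loop counts are reduced for brevity, and POSIX barriers are an optional feature that some platforms (macOS, for example) do not provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define NUM_THREADS 4
#define NUM_ROUNDS  1000

typedef struct {
    int thread_id;
    pthread_barrier_t *barrier;
    long *partial;  /* one slot per thread, written during the parallel step */
    long *total;    /* shared buffer, modified by one thread per round */
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    for (int round = 0; round < NUM_ROUNDS; round++) {
        /* Parallel step: stand-in for the real work of thread_function. */
        a->partial[a->thread_id] = a->thread_id + round;

        /* Wait until every thread has finished this round. */
        int rc = pthread_barrier_wait(a->barrier);
        if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
            /* Exactly one thread performs the serial "modify buffers" step. */
            for (int i = 0; i < NUM_THREADS; i++) {
                *a->total += a->partial[i];
            }
        }
        /* Second barrier: nobody starts the next round before the serial step is done. */
        pthread_barrier_wait(a->barrier);
    }
    return NULL;
}

/* Creates the threads once, runs all rounds, and returns the accumulated total. */
long run_rounds(void) {
    pthread_t tid[NUM_THREADS];
    worker_arg args[NUM_THREADS];
    long partial[NUM_THREADS] = {0};
    long total = 0;
    pthread_barrier_t barrier;

    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++) {
        args[i] = (worker_arg){ i, &barrier, partial, &total };
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tid[i], NULL);
    }
    pthread_barrier_destroy(&barrier);
    return total;
}
```

Each round costs two barrier waits per thread instead of one create/join pair per thread, and pthread_create is called only NUM_THREADS times in total. Build with -pthread.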
