This is pseudocode that I want to parallelize, but I don't know where to start:
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client['myDB']
collection = db.myCollection
test_list = ['foo', 'bar']
result_list = list()
for el in test_list:
    result_list.append(collection.distinct('attrib', {'version': el}))
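I know how to create a parallel loop with joblib, but I don't know how to query MongoDB in parallel. Should I create multiple clients or collections? Would the code above work if I simply rewrote it with joblib and paid no attention to MongoDB?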
You can run the requests in separate threads:
from multiprocessing.dummy import Pool as ThreadPool
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client['myDB']
collection = db.myCollection
thread_pool_size = 4
pool = ThreadPool(thread_pool_size)
def my_function(el):
    # Run one distinct query for the given version value
    return collection.distinct('attrib', {'version': el})
test_list = ['foo', 'bar']
result_list = pool.map(my_function, test_list)
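
If you want to try the thread-pool pattern without a running MongoDB server, here is a minimal sketch. `fake_distinct` is a hypothetical stand-in for `collection.distinct(...)` (it only sleeps and returns made-up values), and `run_queries` and `pool_size` are names invented for illustration. PyMongo documents `MongoClient` as thread-safe with built-in connection pooling, so a single client shared across the threads should be enough in the real version.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_distinct(version):
    # Hypothetical stand-in for:
    #   collection.distinct('attrib', {'version': version})
    # The sleep imitates waiting on the database (I/O-bound work).
    time.sleep(0.1)
    return [f"{version}-a", f"{version}-b"]


def run_queries(versions, pool_size=4):
    # The pool is closed automatically when the with-block exits.
    with ThreadPool(pool_size) as pool:
        # map() blocks until every query has finished and
        # returns the results in the same order as the inputs.
        return pool.map(fake_distinct, versions)


if __name__ == "__main__":
    test_list = ['foo', 'bar']
    result_list = run_queries(test_list)
    for version, values in zip(test_list, result_list):
        print(version, values)
```

Because `pool.map` preserves input order, `result_list[i]` corresponds to `test_list[i]`. If you move to process-based workers instead (joblib's default backend, for example), the PyMongo FAQ notes that `MongoClient` is not fork-safe, so it is probably safer to create the client inside each worker rather than share one.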