Multithreaded list comprehension in Python is no faster than the non-threaded version

Question

from concurrent.futures import ThreadPoolExecutor

words = [...]  # List of ~100 words
dictionary = { # ~20K items
    'word1': 0.12,
    'word2': 0.32,
    'word3': 0.24,
    # more words...
}

def get_word_frequency(word):
    return (word, dictionary[word]) if word in dictionary else None

def search_words(words):
    with ThreadPoolExecutor() as executor:
        results = list(executor.map(get_word_frequency, words))
    # Drop the None entries before unpacking; unpacking None raises TypeError.
    return {pair[0]: pair[1] for pair in results if pair is not None}

result = search_words(words)

The code above takes almost the same time as a plain list comprehension:

result = [w for w in words if w in dictionary.keys()]

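I show only one dictionary here as an example; in the real case I load nearly 200 dictionaries from a single JSON file. Sets perform much better than lists, but I'm not sure how to implement a set-based comprehension in the case above.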

Answer 1

result = [w for w in words if w in dictionary.keys()]

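Here you are evaluating dictionary.keys() on every iteration, which may cause problems. With few iterations it won't matter, but if words is very long the repeated calls can become a real cost.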

I think the best solution is to assign it to a variable:

keys = dictionary.keys()
result = [w for w in words if w in keys]

Here the variable is assigned once, so the call is not repeated on every iteration.
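As a rough sketch of the idea, the variants can be compared side by side. The sample data is made up for illustration, and the set-based variant assumes the many dictionaries are available as a list of dicts (the name dictionaries is hypothetical), which may not match how your JSON file is actually structured.

```python
# Hypothetical sample data standing in for the question's words and dictionary.
dictionary = {"word1": 0.12, "word2": 0.32, "word3": 0.24}
words = ["word1", "missing", "word3", "word4"]

# Variant from the question: dictionary.keys() is called on every iteration.
per_iteration = [w for w in words if w in dictionary.keys()]

# Variant from this answer: the keys view is bound once, outside the comprehension.
keys = dictionary.keys()
hoisted = [w for w in words if w in keys]

# Membership on the dict itself also tests its keys, with no method call at all.
direct = [w for w in words if w in dictionary]

# Set-based variant (assumption: several dictionaries held in a list).
# Iterating a dict yields its keys, so union() merges all keys into one set.
dictionaries = [dictionary, {"word4": 0.5}]
all_keys = set().union(*dictionaries)
from_set = [w for w in words if w in all_keys]
```

The first three variants produce the same list and differ only in how often keys() is called. Whether that difference is measurable depends on the size of words, so timing each variant on the real data (for example with the timeit module) is the reliable way to check.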

This should be the problem; please let me know whether that is correct.
