对于字典使用“not in”比使用“in”更快吗？

Question

假设我们正在解决一个简单的字数统计问题。有一个列表，我们尝试使用字典来统计列表中每个单词出现的频率。

这里哪种模式更快？

book_title =  ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_count_dict = {}

图案1

for word in book_title:
    if word in word_count_dict:
        word_count_dict[word] += 1
    else:
        word_count_dict[word] = 1

模式2

for word in book_title:
    if word not in word_count_dict:
        word_count_dict[word] = 1
    else:
        word_count_dict[word] += 1

Answer 1

一个是六个，另一个是六个。它们应该大致彼此等效 - 从计算角度来看，

not

操作几乎可以忽略不计（实际上是最便宜的可能操作），并且哈希表（如字典）中的

in

操作以恒定时间运行（哈希是在那里，或者不在那里）。如果我们处理一个列表，它将以线性时间运行，但仍然介于

in

和

not in

之间。另请参阅 Python 数据结构的计算复杂性。

所以基本上，使用使你的代码更容易理解的那个。

也就是说，您是否考虑过使用

collections.Counter

，一种专门为此目的而设计的数据结构？

import collections
book_title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_counts = collections.Counter(book_title)
print(word_counts)
# Counter({'great': 2, 'the': 2, 'adventures': 2, 'of': 2, 'expectations': 1, 'sherlock': 1, 'holmes': 1, 'gasby': 1, 'hamlet': 1, 'huckleberry': 1, 'fin': 1})

如果需要，您可以将

collections.Counter

转换为

dict

，事实上

collections.Counter

是

dict

的子类。它甚至有一个专门设计用于与其他计数器配合使用的

.update()

方法 - 如果您添加另一个书名，只需将其输入到

Counter

中，然后将其输入到

.update()

中即可。

Answer 2

使用

dis

，您可以查看为每个方法生成的字节码。

我可以看到通过反汇编程序运行代码的唯一区别就在

in

和

not in

，其中字节码的区别是：

COMPARE_OP 7 (not in)

或者

COMPARE_OP 6 (in)

然后

POP_JUMP_IF_FALSE

（即针对这种情况继续执行下一条指令）

总而言之，无论比较返回 true 或 false，这两种方法似乎都具有相同数量的指令，因此很可能执行速度相同。

但是，可能存在一些更接近 CPU 指令的底层优化，这会导致一种或另一种方法更快，但我认为该领域超出了这个问题的范围。如果是这种情况，那么我相信对较大列表执行时间的简单测量将证明哪个更快。

Python 字节码下两条指令的执行速度可能因 Python 版本、构建、操作系统或体系结构而异。您可能能够对 Python 源代码进行一些小的更改，以使一条或另一条指令执行得更快。

Answer 3

它们的成本大致相同。将

not in

运算符视为首先应用

in

运算符，然后将逻辑

not

应用于该结果（几乎可以忽略不计）。

为了确认，您可以执行以下一个小实验来测量执行时间

from time import time
book_title =  ['hi']*100000+['there']*10000
word_count_dict1 = {}
word_count_dict2 = {}
start = time()
for word in book_title:
    if word in word_count_dict1:
        word_count_dict1[word] += 1
    else:
        word_count_dict1[word] = 1
print(time()-start)
start = time()
for word in book_title:
    if word not in word_count_dict2:
        word_count_dict2[word] = 1
    else:
        word_count_dict2[word] += 1
print(time()-start)

输出（可能因您而异）

0.021044015884399414
0.02713179588317871

Answer 4

基本上，它们的成本是相同的。来自Python 3参考

运算符
not in
被定义为具有
in
的逆真值。

Answer 5

您可以检查两种模式的操作时间并自行比较。

import timeit

def pattern1():
  title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
  counts = {}
  for word in title:
    if word in counts:
      counts[word] += 1
    else:
      counts[word] = 1

def pattern2():
  title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
  counts = {}
  for word in title:
    if word not in counts:
      counts[word] = 1
    else:
      counts[word] += 1

sample1 = [timeit.timeit(pattern1, number=10000) for _ in range(10)]
sample2 = [timeit.timeit(pattern2, number=10000) for _ in range(10)]

print(sum(sample1) / len(sample1))
# 0.01713230140048836
print(sum(sample2) / len(sample2))
# 0.017954919600015273

正如我们所见，差异可以忽略不计。

对于字典使用“not in”比使用“in”更快吗？

问题描述投票：0回答：5

5个回答

最新问题

对于字典使用“not in”比使用“in”更快吗？

问题描述 投票：0回答：5

5个回答

最新问题

问题描述投票：0回答：5