What is the fastest way in Python to search a list of strings for a set of characters? [closed]

Question

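I'm looking for a Python performance comparison of different ways to pick out strings that contain any of a specific set of characters. My particular problem is distinguishing (but not actually parsing) symbols extracted from a binary, to decide whether they are demangled C++ identifiers, for example

method_wrapper<Copter,&Copter::ten_hz_logging_loop>

Since none of the characters in the set

{'<', '>', ':', ' '}

is legal in a plain C identifier, checking for them seems like a good heuristic.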
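A minimal sketch of that heuristic, assuming the symbols are already available as Python strings. The names `NON_C_CHARS` and `looks_demangled`, and the sample symbol `memcpy`, are illustrative and not part of the original post.

```python
# Characters that cannot appear in a plain C identifier (taken from the question).
NON_C_CHARS = ('<', '>', ':', ' ')

def looks_demangled(symbol: str) -> bool:
    """Return True if the symbol contains any character that is illegal in a C identifier."""
    return any(ch in symbol for ch in NON_C_CHARS)

symbols = [
    "method_wrapper<Copter,&Copter::ten_hz_logging_loop>",  # example from the question
    "memcpy",  # hypothetical plain C symbol, for contrast
]

# Keep only the symbols the heuristic flags as demangled C++.
flagged = [s for s in symbols if looks_demangled(s)]
```

This is only a classification heuristic; it does not parse or validate the identifier.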

I couldn't find a question that addresses this directly (most are about searching for a string within a string), although it is similar to "Which is fast for searching strings regex or find() of Python?"

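Having collected some numbers, I thought it would be useful to share my test results and the method I used to generate them, so that anyone trying to speed up a similar operation can benefit from the comparison.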

Answer 1

Here is the code I used for testing:

import timeit
import random
import string
import re

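# 1000 random strings of printable characters, each 10 to 9999 characters long
strs = [
    ''.join(random.choices(string.printable, k=random.randrange(10, 10000)))
    for _ in range(1000)
]
# Precompiled character class matching any one of the four characters
mmatch = re.compile(r'[<> :]')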

def pmatch(x):
    return ('<' in x or '>' in x or ' ' in x or ':' in x)

print("in: ")
print(timeit.timeit(lambda: [('<' in x or '>' in x or ' ' in x or ':' in x) for x in strs], number=1000, globals=globals()))

print("in fn(): ")
print(timeit.timeit(lambda: [pmatch(x) for x in strs], number=1000, globals=globals()))

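# Note: match() only tests at the start of each string; search() would scan
# the whole string, which is what the `in` checks above do.
print("compiled re: ")
print(timeit.timeit(lambda: [mmatch.match(x) for x in strs], number=1000, globals=globals()))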

print("any iterator: ")
print(timeit.timeit(lambda: [any(e in x for e in list("<> :")) for x in strs], number=1000, globals=globals()))

# Same caveat as the compiled case: re.match() is anchored at the start of the string.
print("non-compiled re: ")
print(timeit.timeit(lambda: [re.match(r'[<> :]', x) for x in strs], number=1000, globals=globals()))

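The results, in seconds and sorted from fastest to slowest (YMMV depending on the length of the strings and the number of iterations), are: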

in:
0.04299997794441879
in fn():
0.08772613597102463
compiled re:
0.12534234998747706
any iterator:
0.40786699019372463
non-compiled re:
0.43575337901711464
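One caveat when reading these numbers: `re.match` is anchored at the start of the string, so the two regex variants as benchmarked only examine the first character, while the `in` variants look at the whole string. The sketch below shows the difference using the symbol from the question; the helper name `contains_in` is illustrative. Timings with `search` were not measured here and may differ from the figures above.

```python
import re

pattern = re.compile(r'[<> :]')
symbol = "method_wrapper<Copter,&Copter::ten_hz_logging_loop>"

def contains_in(x):
    # Same test as the "in" variants of the benchmark.
    return '<' in x or '>' in x or ' ' in x or ':' in x

at_start = pattern.match(symbol)   # tests only at position 0
anywhere = pattern.search(symbol)  # scans the whole string

print(at_start is None)      # True: the symbol starts with 'm'
print(anywhere is not None)  # True: '<' is found later in the string
print(contains_in(symbol))   # True
```

For a like-for-like comparison with the `in` checks, `pattern.search` is the equivalent regex call.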
