What is the difference between re.search and re.match?

Question

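What is the difference between the search() and match() functions in the Python re module?

I've read the Python 2 documentation (Python 3 documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return to with my question, and re-learning it will take less time.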

Answer 1

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

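As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.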

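re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.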

So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise use search.

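The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The "match" operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.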

Now, enough talk. Time to see some example code:

# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The flags were already given to re.compile; on a compiled pattern the
# second positional argument is pos, so re.MULTILINE is not passed here.
print(m.search(string_with_newlines))  # also matches

Answer 2

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.
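The two one-liners above can be checked with a minimal sketch. The sample string and variable names are made up for illustration; span() reports where the match was found.

```python
import re

text = "say hello"  # sample string, chosen only for illustration

# search scans forward and reports the first position where the pattern fits.
found = re.search("hello", text)
print(found.span())  # (4, 9)

# match only tries position 0, so the same pattern fails here.
print(re.match("hello", text))  # None

# match succeeds when the pattern fits at the very start of the string.
print(re.match("say", text).span())  # (0, 3)
```

Both functions return None when nothing is found, so the result can be used directly in an if statement.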

Answer 3

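match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me thinking about whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite: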

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

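I made 10 measurements (1M, 2M, ..., 10M words), with the following results:

[Figure: timings of search vs. match for 1M to 10M words]

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)'.

Python is smart. Avoid trying to be smarter.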
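For anyone who wants to repeat the comparison quickly, here is a smaller sketch using timeit with precompiled patterns. The sample size, the seed, and the extra word that guarantees one hit are assumptions made for this sketch, not part of the original benchmark, and timings will differ by machine and Python version.

```python
import random
import re
import string
import timeit

random.seed(0)        # fixed seed so the word list is repeatable
SAMPLE_SIZE = 10_000  # assumed size, far smaller than the original 1M-10M

words = ["".join(random.choices(string.ascii_lowercase, k=10))
         for _ in range(SAMPLE_SIZE)]
words.append("xxpythonxx")  # guarantee at least one word containing 'python'

search_re = re.compile("python")
match_re = re.compile("(.*?)python(.*?)")

# Both approaches should flag exactly the same words.
search_hits = [w for w in words if search_re.search(w)]
match_hits = [w for w in words if match_re.match(w)]
print(search_hits == match_hits)

# Time five passes over the list with each approach.
print("search:", timeit.timeit(lambda: [search_re.search(w) for w in words], number=5))
print("match: ", timeit.timeit(lambda: [match_re.match(w) for w in words], number=5))
```

The equality check is there to confirm that the two patterns answer the same question before their speed is compared.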

Answer 4

re.search searches for the pattern throughout the string, whereas re.match does not search at all: it can only try to match the pattern at the start of the string.

Answer 5

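You can refer to the example below to understand how re.match and re.search work:

import re

a = "123abc"
t = re.match("[a-z]+", a)   # t is None: the string starts with digits
t = re.search("[a-z]+", a)  # t is a match object; t.group() is 'abc'

re.match returns None, but re.search returns a match object for abc.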

Answer 6

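The difference is that re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern', s) is equivalent to re.search('^pattern', s), as long as the MULTILINE flag is not set. So it anchors the left side of a pattern. But it does not anchor the right side: that still requires a terminating $.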
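The anchoring behaviour described above can be sketched as follows. The sample strings are made up; re.fullmatch (available since Python 3.4) is shown as one way to anchor both sides, and \A is used as the start-of-string anchor that is not affected by MULTILINE.

```python
import re

# Left side: match behaves like search with a start-of-string anchor.
print(re.match("abc", "xabc"))      # None
print(re.search(r"\Aabc", "xabc"))  # None

# Right side is not anchored: trailing text is allowed.
print(re.match("abc", "abcdef").group())  # abc

# Add $ (or use re.fullmatch) to require that the whole string matches.
print(re.match("abc$", "abcdef"))          # None
print(re.fullmatch("abc", "abcdef"))       # None
print(re.fullmatch("abc", "abc").group())  # abc

# With re.MULTILINE, ^ also matches after a newline, so match and
# search('^...') stop being equivalent.
print(re.match("b", "a\nb", re.MULTILINE))            # None
print(re.search("^b", "a\nb", re.MULTILINE).group())  # b
```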

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know the reasons it should be retained.

Answer 7

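Much shorter:

search scans through the whole string.

match scans only the beginning of the string.

The following example shows it:

>>> import re
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'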

Answer 8

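re.match attempts to match the pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.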

Answer 9

Quick answer

import re

re.search('test', ' test')      # returns a truthy match object (the search can start at any index)

re.match('test', ' test')       # returns None (matching is attempted only at index 0)
re.match('test', 'test')        # returns a truthy match object (match at index 0)
