Using re and requests to extract specific URLs from a web page

Question

import requests, re

r = requests.get('https://example.com')
p = re.compile(r'\d')

print(p.match(str(r.text)))

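This always prints None, even though r.text definitely contains digits, while print(p.match('12345')) works. What do I need to do to r.text so that the compiled pattern's match() can read it? Converting it to str is clearly not enough.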

Answer 1

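This is because re.match only checks for a match at the beginning of the string, and r.text does not start with a digit. r.text is already a str, so the str() conversion changes nothing.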

If you want to find the first match anywhere in the string, use re.search instead:

import requests, re

r = requests.get('https://example.com')
p = re.compile(r'\d')

print(p.search(r.text))

Output (the exact span and matched digit depend on the page content):

<re.Match object; span=(88, 89), match='8'>
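
To see the difference without a network request, here is a minimal sketch that runs the same pattern against a fixed string. The sample_html value is a made-up stand-in for r.text, and the href pattern is only an illustrative assumption for pulling URLs out of simple markup, not something from the original question.

```python
import re

# Hypothetical stand-in for r.text, so the example runs offline.
sample_html = (
    '<p>Order 8 shipped</p> '
    '<a href="https://example.com/a">A</a> '
    '<a href="https://example.com/b">B</a>'
)

digit = re.compile(r'\d')

# match() is anchored at position 0; the string starts with '<', so this is None.
anchored = digit.match(sample_html)

# search() scans the whole string and returns the first match it finds.
first = digit.search(sample_html)

# findall() returns every non-overlapping match; with one capture group,
# it returns the captured text (here, the URL inside each href attribute).
href_pattern = re.compile(r'href="(https?://[^"]+)"')
urls = href_pattern.findall(sample_html)

print(anchored)
print(first)
print(urls)
```

For real pages, a regex like this can miss single-quoted or relative links, so an HTML parser is usually the more robust choice when the goal is collecting URLs.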