Extract a substring between two different characters with a Python regex [duplicate]

Question

This question already has answers here:

My regex is matching too much. How do I make it stop? (5 answers)

I want to use a Python regular expression to extract the substring between two different characters, > and <.

Here are my example strings:

<h4 id="Foobar:">Foobar:</h4>
<h1 id="Monty">Python<a href="https://..."></a></h1>

My current regex is \>(.*)\< and it matches:

Foobar:
Python<a href="https://..."></a>

My regex matches the first example correctly, but not the second. I want it to return "Python". What am I missing?
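To see where the extra text comes from, the pattern from the question can be run on both sample strings with `re.findall`. This is a minimal sketch; the variable name `greedy` is just for illustration. Note that the first string actually yields `Foobar:`, with the trailing colon, because the greedy `.*` stops only at the last `<` it can find.

```python
import re

first_string = '<h4 id="Foobar:">Foobar:</h4>'
second_string = '<h1 id="Monty">Python<a href="https://..."></a></h1>'

# Greedy: .* runs from the first '>' up to the LAST '<' in the string
greedy = r'\>(.*)\<'

print(re.findall(greedy, first_string))   # ['Foobar:']
print(re.findall(greedy, second_string))  # ['Python<a href="https://..."></a>']
```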

Answer 1

Use the expression:

(?<=>)[^<:]+(?=:?<)

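(?<=>) Positive lookbehind for >.
[^<:]+ Matches one or more characters other than < or :.
(?=:?<) Positive lookahead for an optional colon : followed by <.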

You can try the expression live here.

In Python:

import re

first_string = '<h4 id="Foobar:">Foobar:</h4>'
second_string = '<h1 id="Monty">Python<a href="https://..."></a></h1>'

# findall returns a list of all matches; [0] takes the first one
print(re.findall(r'(?<=>)[^<:]+(?=:?<)', first_string)[0])
print(re.findall(r'(?<=>)[^<:]+(?=:?<)', second_string)[0])

Prints:

Foobar
Python

Alternatively, you can use the expression:

(?<=>)[a-zA-Z]+(?=\W*<)

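(?<=>) Positive lookbehind for >.
[a-zA-Z]+ One or more uppercase or lowercase ASCII letters.
(?=\W*<) Positive lookahead for zero or more non-word characters followed by <.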

You can test this expression here.

print(re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)',first_string)[0])
print(re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)',second_string)[0])

Prints:

Foobar
Python
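The two expressions are not interchangeable. As a sketch of where they differ, here they are applied to two extra inputs that are hypothetical (not from the question): text containing a space, and text containing a digit.

```python
import re

# The two lookaround patterns from this answer
anything_but = r'(?<=>)[^<:]+(?=:?<)'      # any run of characters except '<' and ':'
letters_only = r'(?<=>)[a-zA-Z]+(?=\W*<)'  # ASCII letters only

# Hypothetical extra inputs, chosen to show the difference
samples = ['<b>Monty Python</b>', '<p>Python3</p>']

for s in samples:
    print(s, re.findall(anything_but, s), re.findall(letters_only, s))
# <b>Monty Python</b> ['Monty Python'] []
# <p>Python3</p> ['Python3'] []
```

The letters-only version is stricter, so it returns nothing when the text contains a space or a digit; which one fits depends on what the text between the tags may contain.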

Answer 2

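What you are missing is that the * quantifier is greedy: combined with ., it matches as many characters as possible. To switch the quantifier to non-greedy mode, append ?: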

\>(.*?)\<

You can read more about *?, +? and ?? in the documentation.
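A minimal sketch of the non-greedy pattern applied to the question's strings. It uses `re.search`, which returns only the first match. Unlike the lookaround expressions in the other answer, this pattern keeps the trailing colon in `Foobar:`, and `re.findall` also picks up the empty text between adjacent tags.

```python
import re

first_string = '<h4 id="Foobar:">Foobar:</h4>'
second_string = '<h1 id="Monty">Python<a href="https://..."></a></h1>'

# Non-greedy: .*? stops at the FIRST '<' after a '>'
lazy = r'\>(.*?)\<'

for s in (first_string, second_string):
    m = re.search(lazy, s)
    # group(1) is the text captured between '>' and the next '<'
    print(m.group(1) if m else None)
# Foobar:
# Python

# findall also returns the empty strings found in '></a>' and '></h1>'
print(re.findall(lazy, second_string))  # ['Python', '', '']
```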
