Modifying a regular expression in Python

Question

I am using the following regular expression pattern to identify abbreviations.

import re
mytext = "This is AVGs and (NMN) and most importantly GFD"
mytext = re.sub(r"\b[A-Z\.]{2,}s?\b", "_ABB", mytext)
print(mytext)

I get the following output.

This is _ABB and (_ABB) and most importantly _ABB

However, I want the output to be:

This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB

Please let me know what I am doing wrong.

Answer 1

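Use a capturing group to capture the pattern you match between the word boundaries, then reference it in the replacement. The first capturing group is available as \1, written "\\1" in a regular (non-raw) string literal.

import re
mytext = "This is AVGs and (NMN) and most importantly GFD"
mytext = re.sub(r"\b([A-Z\.]{2,}s?)\b", "\\1_ABB", mytext)
print(mytext)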

Demo of code snippet
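
As a small sketch of the same idea, the group reference can be spelled in a few equivalent ways. The variable names `pattern`, `escaped`, `raw`, and `named_style` are only illustrative and are not part of the original answer.

```python
import re

mytext = "This is AVGs and (NMN) and most importantly GFD"
pattern = r"\b([A-Z.]{2,}s?)\b"

# Three spellings of "group 1 followed by _ABB" in the replacement
escaped = re.sub(pattern, "\\1_ABB", mytext)         # escaped backslash in a normal string
raw = re.sub(pattern, r"\1_ABB", mytext)             # raw string, no double backslash needed
named_style = re.sub(pattern, r"\g<1>_ABB", mytext)  # explicit \g<1> form

print(escaped)
# This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB
print(escaped == raw == named_style)
# True
```

The \g<1> form is handy when the group reference is immediately followed by a digit, since \10 would otherwise be read as group 10.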

Answer 2

Try this:

In [1]: text = "This is AVGs and (NMN) and most importantly GFD"
In [2]: regex = "[A-Z]{2,}"
In [3]: import re
In [4]: result = re.sub(regex, "_ABB", text)
In [5]: result
Out[5]: 'This is _ABBs and (_ABB) and most importantly _ABB'

Answer 3

Use a capturing group in the replacement, as follows:

import re
mytext = "This is AVGs and (NMN) and most importantly GFD"
# The optional s? keeps a plural suffix such as the "s" in "AVGs" inside the group
mytext = re.sub(r"([A-Z]{2,}s?)", "\\1_ABB", mytext)
print(mytext)

Output:

This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB
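
To illustrate why the optional plural `s` matters when no word boundaries are used, here is a minimal comparison. The names `without_s` and `with_s` are assumptions made for this sketch.

```python
import re

mytext = "This is AVGs and (NMN) and most importantly GFD"

# Without s? the group stops after the capitals, so the suffix lands before the "s"
without_s = re.sub(r"([A-Z]{2,})", r"\1_ABB", mytext)
# With s? the trailing plural "s" is part of the captured abbreviation
with_s = re.sub(r"([A-Z]{2,}s?)", r"\1_ABB", mytext)

print(without_s)
# This is AVG_ABBs and (NMN_ABB) and most importantly GFD_ABB
print(with_s)
# This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB
```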

Answer 4

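You do not need any capturing group here, because you want to reuse the whole match, which is itself group 0. Just use \g<0> in the replacement pattern; see the Python re docs:

The backreference \g<0> substitutes in the entire substring matched by the RE.

See the online Python demo: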

import re
mytext = "This is AVGs and (NMN) and most importantly GFD"
mytext= re.sub(r"\b[A-Z.]{2,}s?\b", r"\g<0>_ABB", mytext)
print(mytext)
# => This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB

The replacement is now r"\g<0>_ABB", which replaces each non-overlapping match with the matched text itself followed by _ABB.
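
Another way to reuse the whole match, shown here only as a sketch, is to pass a function as the replacement; `re.sub` calls it with each match object. The names `with_backref` and `with_callable` are illustrative.

```python
import re

mytext = "This is AVGs and (NMN) and most importantly GFD"
pattern = r"\b[A-Z.]{2,}s?\b"

# \g<0> refers to the entire match
with_backref = re.sub(pattern, r"\g<0>_ABB", mytext)
# A callable receives the match object; group(0) is the entire match
with_callable = re.sub(pattern, lambda m: m.group(0) + "_ABB", mytext)

print(with_backref == with_callable)
# True
```

The callable form is usually worth it only when the replacement needs logic beyond simple text substitution.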

See the regex demo.

Also note that inside a character class, . is parsed as a literal dot, not as the "wildcard" that matches any character except a newline.
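
A quick sketch of that difference, using a made-up sample string:

```python
import re

sample = "a.b"  # hypothetical input for illustration

outside_class = re.findall(r".", sample)     # wildcard: matches every character
inside_class = re.findall(r"[.]", sample)    # literal dot only
escaped_inside = re.findall(r"[\.]", sample) # the backslash is allowed but redundant

print(outside_class)   # ['a', '.', 'b']
print(inside_class)    # ['.']
print(escaped_inside)  # ['.']
```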
