Modifying a regular expression in Python

Question  Votes: 0  Answers: 4

I am using the following regular expression pattern to identify abbreviations.

import re

mytext = "This is AVGs and (NMN) and most importantly GFD"
mytext = re.sub(r"\b[A-Z\.]{2,}s?\b", "_ABB", mytext)
print(mytext)

I get the following output.

This is _ABB and (_ABB) and most importantly _ABB

However, I want this output:

This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB

Please let me know where I went wrong.

python regex python-3.x
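4 Answers
0
votes

Wrap the pattern between the word boundaries in a capture group, then reuse that group in the replacement. The first capture group is available as \\1 in an ordinary string literal (or \1 in a raw string).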

import re

mytext = "This is AVGs and (NMN) and most importantly GFD"
# Group 1 holds the matched abbreviation; \\1 puts it back before the suffix.
mytext = re.sub(r"\b([A-Z\.]{2,}s?)\b", "\\1_ABB", mytext)
print(mytext)
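As a small sketch of the same idea, the snippet below compiles the pattern once and writes the replacement as a raw string, since "\\1_ABB" and r"\1_ABB" are the same template. The helper name `tag_abbreviations` is hypothetical and only used for illustration.

```python
import re

# Same pattern as above, compiled once; group 1 captures the abbreviation.
ABBR = re.compile(r"\b([A-Z.]{2,}s?)\b")


def tag_abbreviations(text):
    """Append _ABB to every abbreviation found in text (illustrative helper)."""
    # r"\1" refers to capture group 1, i.e. the abbreviation itself.
    return ABBR.sub(r"\1_ABB", text)


sample = "This is AVGs and (NMN) and most importantly GFD"
print(tag_abbreviations(sample))
# This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB
```

Writing the replacement as a raw string avoids having to double the backslash.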



0
votes

Try this (note that it replaces each run of capitals with _ABB rather than appending to it):

In [1]: text = "This is AVGs and (NMN) and most importantly GFD"
In [2]: regex = "[A-Z]{2,}"
In [3]: import re
In [4]: result = re.sub(regex, "_ABB", text)
In [5]: result
Out[5]: 'This is _ABBs and (_ABB) and most importantly _ABB'

0
votes

Capture the match in a group and reuse it in the replacement, as follows:

import re

mytext = "This is AVGs and (NMN) and most importantly GFD"
# The optional s? keeps a plural "s" inside the group, so the suffix lands after it.
mytext = re.sub(r"([A-Z]{2,}s?)", "\\1_ABB", mytext)
print(mytext)

Output:

This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB
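A quick check of how the optional trailing s affects where the suffix lands. This is only a comparison sketch; the variable names `without_s` and `with_s` are made up for illustration.

```python
import re

text = "This is AVGs and (NMN) and most importantly GFD"

# Without s? the group stops at the last capital, so the plural "s" is left outside.
without_s = re.sub(r"([A-Z]{2,})", r"\1_ABB", text)

# With s? the plural "s" is part of the group, so the suffix follows it.
with_s = re.sub(r"([A-Z]{2,}s?)", r"\1_ABB", text)

print(without_s)  # This is AVG_ABBs and (NMN_ABB) and most importantly GFD_ABB
print(with_s)     # This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB
```

Without word boundaries, s? would also absorb an "s" that begins a longer lowercase run, so adding \b around the pattern is the safer choice for general text.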


0
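votes

You don't need a capture group here, because you want to reuse the whole match, which is itself group 0. Just use \g<0> in the replacement pattern; see the Python re docs:

The backreference \g<0> substitutes in the entire substring matched by the RE.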


import re
mytext = "This is AVGs and (NMN) and most importantly GFD"
mytext= re.sub(r"\b[A-Z.]{2,}s?\b", r"\g<0>_ABB", mytext)
print(mytext)
# => This is AVGs_ABB and (NMN_ABB) and most importantly GFD_ABB

The replacement is now r"\g<0>_ABB": each non-overlapping match is replaced with the match itself followed by _ABB.
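As a hedged illustration of why \g<0> is the form to use, the sketch below compares it with a replacement function that returns `m.group(0) + "_ABB"`, and shows that a bare \0 in a replacement template is read as an octal escape (a NUL character) rather than as group 0. The variable names are chosen for this example only.

```python
import re

text = "This is AVGs and (NMN) and most importantly GFD"
pattern = re.compile(r"\b[A-Z.]{2,}s?\b")

# Template form: \g<0> stands for the whole match.
via_template = pattern.sub(r"\g<0>_ABB", text)

# Callable form: the function receives each match object and returns the new text.
via_callable = pattern.sub(lambda m: m.group(0) + "_ABB", text)

print(via_template == via_callable)  # True

# A bare \0 is not a backreference in a replacement; it is an octal escape.
print(repr(re.sub(r"a", r"\0", "a")))  # '\x00'
```

The callable form is handy when the replacement depends on the matched text in ways a template cannot express.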


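Also note that inside a character class, . is parsed as a literal dot, not as the "wildcard" that matches any character except a newline, so it does not need to be escaped there.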
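A minimal check of that behaviour, using a made-up sample string: inside brackets the dot matches only a literal dot, while outside brackets it matches any character except a newline.

```python
import re

sample = "U.S. and AxB and A-B"

# Inside a character class the dot is literal, escaped or not.
in_class = re.findall(r"[A-Z.]{2,}", sample)
in_class_escaped = re.findall(r"[A-Z\.]{2,}", sample)

# Outside a character class the dot is a wildcard.
wildcard = re.findall(r"A.B", sample)

# [.] matches only a literal dot, so neither AxB nor A-B qualifies.
literal_only = re.findall(r"A[.]B", sample)

print(in_class)          # ['U.S.']
print(in_class_escaped)  # ['U.S.']
print(wildcard)          # ['AxB', 'A-B']
print(literal_only)      # []
```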
