I have a long text; here is part of it:
C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow,
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].
I need to find all substrings like these:
[03_SNYuLOOO IC "Story Group".]
[04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow,
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B]
I tried
re.findall(r'^\[\d{2}_[\s\S]+\]$', text)
but it returns an empty list. What am I doing wrong?
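The ^ and $ anchors require the whole string to match the pattern, and your text neither starts with [ nor ends with ], so nothing matches. Even without the anchors, [\s\S]+ is greedy: it matches one or more of any character, as many as possible, consuming every [ and ] on its way to the end of the string, so the final ] in the pattern would match the rightmost ] in the string.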
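To see both effects, here is a small sketch using the sample text from the question (the variable names text, anchored and greedy are only illustrative). It runs the original anchored pattern and then the same pattern with the anchors removed:

```python
import re

text = '''C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow,
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].'''

# Original pattern: ^ only matches at the very start of the string (no re.M flag),
# and the text starts with "C", so there is nothing to find.
anchored = re.findall(r'^\[\d{2}_[\s\S]+\]$', text)
print(anchored)

# Anchors removed: the greedy [\s\S]+ runs to the end and backtracks to the
# LAST "]", so both bracketed items are merged into a single match.
greedy = re.findall(r'\[\d{2}_[\s\S]+\]', text)
print(len(greedy))
print(greedy[0][:4], '...', greedy[0][-9:])
```

The first call finds nothing, and the second returns one long match that spans from the first [03_ to the last ], which is why simply dropping the anchors is not enough.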
You can use the following regex:
r'\[\d{2}_[^]]+]'
Details
\[ - a literal [
\d{2} - two digits
_ - an underscore
[^]]+ - one or more characters other than ]
] - a literal ]

import re
s='''C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow,
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].'''
print(re.findall(r'\[\d{2}_[^]]+]', s))
# => ['[03_SNYuLOOO IC "Story Group".]', '[04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow,\nul. Krasnobogatyrskaya, 2, is built.\n2, floor 3. com. 11. Office B]']
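As a possible alternative, a lazy quantifier can be used instead of the negated character class. The sketch below keeps [\s\S] from the original attempt but makes it non-greedy with +?, so each match stops at the first ] it meets. The names sample, negated and lazy are only illustrative, and the sample is a shortened stand-in for the real text:

```python
import re

sample = 'a [03_first item.] b [04_second,\nmultiline item] c [xx_skipped]'

# Negated character class, as in the answer above.
negated = re.findall(r'\[\d{2}_[^]]+]', sample)

# Lazy variant: +? expands one character at a time until a "]" can match.
lazy = re.findall(r'\[\d{2}_[\s\S]+?\]', sample)

print(negated)
print(lazy == negated)
```

Both patterns return the same two substrings here. The negated character class is usually the better choice, since it does not need to try the rest of the pattern after every character.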