For example, I want to check whether the person named JEMIMA has Division 4.
I am using this pattern:
JEMIMA((?:(?:(?!JEMIMA|DIVISION 4).|[\r\n])*?)DIVISION 4)
34
DIVISION 0
CIV-'F' HIST-'F' GEO-'F' KISW-'D' ENGL-'F' PHY-'F' CHEM-'F' BIO-'F' B/MATH-'F'
S1147/0173
20150987402
JEMIMA SILVESTER SANGAWE
F
28
DIVISION 4
CIV-'D' HIST-'F' GEO-'D' KISW-'C' ENGL-'C' PHY-'F' CHEM-'F' BIO-'D' B/MATH-'F'
S1148/0173
20150987403
But it only selects
JEMIMA SILVESTER SANGAWE
F
28
DIVISION 4
**I want to select the whole block**
S1147/0173
20150987402
JEMIMA SILVESTER SANGAWE
F
28
DIVISION 4
CIV-'D' HIST-'F' GEO-'D' KISW-'C' ENGL-'C' PHY-'F' CHEM-'F' BIO-'D' B/MATH-'F'
Once the text is found, please help me extract the whole paragraph.
Try the following regex.
^(?=(?:(?!\n\n)[\S\s])*?\bJEMIMA\b)(?=(?:(?!\n\n)[\S\s])*?\bDIVISION\ 4\b).+(?:\n.+)*
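This regex checks that both JEMIMA and DIVISION 4 occur in the same paragraph (a run of lines ending at a blank line), in either order, and then matches that whole paragraph. It breaks down as follows.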
^ the beginning of a line (the pattern is compiled with re.MULTILINE)
---------------------------------------------------------
(?= look ahead to see if there is:
---------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
---------------------------------------------------------
(?! look ahead to see if there is not:
---------------------------------------------------------
\n '\n' (newline)
---------------------------------------------------------
\n '\n' (newline)
---------------------------------------------------------
) end of look-ahead
---------------------------------------------------------
[\S\s] any character of: non-whitespace (all
but \n, \r, \t, \f, and " "),
whitespace (\n, \r, \t, \f, and " ")
---------------------------------------------------------
)*? end of grouping
---------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
---------------------------------------------------------
JEMIMA 'JEMIMA'
---------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
---------------------------------------------------------
) end of look-ahead
---------------------------------------------------------
(?= look ahead to see if there is:
---------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
---------------------------------------------------------
(?! look ahead to see if there is not:
---------------------------------------------------------
\n '\n' (newline)
---------------------------------------------------------
\n '\n' (newline)
---------------------------------------------------------
) end of look-ahead
---------------------------------------------------------
[\S\s] any character of: non-whitespace (all
but \n, \r, \t, \f, and " "),
whitespace (\n, \r, \t, \f, and " ")
---------------------------------------------------------
)*? end of grouping
---------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
---------------------------------------------------------
DIVISION\ 4 'DIVISION 4' (the space is escaped)
---------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
---------------------------------------------------------
) end of look-ahead
---------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
---------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
---------------------------------------------------------
\n '\n' (newline)
---------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
---------------------------------------------------------
)* end of grouping
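The breakdown above can also be written directly into the pattern with Python's re.VERBOSE flag, which ignores unescaped whitespace and allows comments. This is only a sketch of the same regex in a different layout; the names PARAGRAPH_WITH_BOTH and sample are made up for illustration.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each piece carries a comment.
# PARAGRAPH_WITH_BOTH is a hypothetical name chosen for this sketch.
PARAGRAPH_WITH_BOTH = re.compile(
    r"""
    ^                                              # start of a line (re.MULTILINE)
    (?= (?: (?!\n\n) [\S\s] )*? \bJEMIMA\b )       # JEMIMA occurs before the next blank line
    (?= (?: (?!\n\n) [\S\s] )*? \bDIVISION\ 4\b )  # DIVISION 4 occurs before the next blank line
    .+ (?: \n .+ )*                                # consume consecutive non-empty lines
    """,
    re.MULTILINE | re.VERBOSE,
)

# Two paragraphs: only the first one contains both JEMIMA and DIVISION 4.
sample = "A\nDIVISION 4\nJEMIMA X\n\nJEMIMA Y\nDIVISION 0\n"
blocks = [m.group() for m in PARAGRAPH_WITH_BOTH.finditer(sample)]
print(blocks)
```

The escaped space in DIVISION\ 4 matters here: in verbose mode an unescaped space would be dropped from the pattern.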
A sample Python program using this regex:
import re

# Paragraphs are separated by blank lines; the pattern relies on "\n\n" as the boundary.
test_str = """
34
DIVISION 0
CIV-'F' HIST-'F' GEO-'F' KISW-'D' ENGL-'F' PHY-'F' CHEM-'F' BIO-'F' B/MATH-'F'

34
DIVISION 4
CIV-'F' HIST-'F' GEO-'F' KISW-'D' ENGL-'F' PHY-'F' CHEM-'F' BIO-'F' B/MATH-'F'
JEMIMA

S1147/0173
20150987402
JEMIMA SILVESTER SANGAWE
F
28
DIVISION 4
CIV-'D' HIST-'F' GEO-'D' KISW-'C' ENGL-'C' PHY-'F' CHEM-'F' BIO-'D' B/MATH-'F'

S1148/0173
20150987403

S1148/
0173/
JEMIMA/
20150987403 (DIVISION 4)
"""

# Each lookahead scans only up to the next blank line, so every paragraph is tested on its own.
pattern = r"^(?=(?:(?!\n\n)[\S\s])*?\bJEMIMA\b)(?=(?:(?!\n\n)[\S\s])*?\bDIVISION\ 4\b).+(?:\n.+)*"
rx = re.compile(pattern, re.MULTILINE)

# Print every matching paragraph, separated by a blank line.
print('\n\n'.join(m.group() for m in rx.finditer(test_str)))
Output:
34
DIVISION 4
CIV-'F' HIST-'F' GEO-'F' KISW-'D' ENGL-'F' PHY-'F' CHEM-'F' BIO-'F' B/MATH-'F'
JEMIMA

S1147/0173
20150987402
JEMIMA SILVESTER SANGAWE
F
28
DIVISION 4
CIV-'D' HIST-'F' GEO-'D' KISW-'C' ENGL-'C' PHY-'F' CHEM-'F' BIO-'D' B/MATH-'F'

S1148/
0173/
JEMIMA/
20150987403 (DIVISION 4)
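For comparison, the same selection can be sketched without lookaheads by splitting the text on blank lines and testing each paragraph separately. This is an alternative technique, not the regex from the answer; the helper name paragraphs_with_all and the shortened demo text are assumptions made for illustration.

```python
import re

def paragraphs_with_all(text, patterns):
    """Return blank-line-separated paragraphs in which every pattern is found."""
    compiled = [re.compile(p) for p in patterns]
    # Split on runs of two or more newlines, then drop stray leading/trailing newlines.
    paragraphs = [p.strip("\n") for p in re.split(r"\n{2,}", text)]
    return [p for p in paragraphs if p and all(rx.search(p) for rx in compiled)]

# Shortened, made-up sample in the same layout as the data above.
demo = """
34
DIVISION 0

S1147/0173
JEMIMA SILVESTER SANGAWE
DIVISION 4

S1148/
JEMIMA/
(DIVISION 40)
"""

found = paragraphs_with_all(demo, [r"\bJEMIMA\b", r"\bDIVISION 4\b"])
print("\n\n".join(found))
```

The word boundaries keep DIVISION 40 from being counted as DIVISION 4, just as in the single-regex version.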
Does DIVISION 4 have to come after JEMIMA?
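If the order does matter, one possible variant is to merge the two lookaheads into a single one, so that DIVISION 4 must appear after JEMIMA within the same paragraph. This is a sketch under that assumption only; the names ordered_rx and sample are hypothetical.

```python
import re

# Single lookahead: JEMIMA first, then DIVISION 4, both before the next blank line.
ordered_rx = re.compile(
    r"^(?=(?:(?!\n\n)[\S\s])*?\bJEMIMA\b(?:(?!\n\n)[\S\s])*?\bDIVISION\ 4\b).+(?:\n.+)*",
    re.MULTILINE,
)

# First paragraph has DIVISION 4 before JEMIMA; second has it after.
sample = "34\nDIVISION 4\nJEMIMA\n\nJEMIMA SILVESTER SANGAWE\nF\n28\nDIVISION 4\n"
ordered = [m.group() for m in ordered_rx.finditer(sample)]
print(ordered)
```

With this variant the first paragraph is skipped, because its only DIVISION 4 precedes JEMIMA.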