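I have 400 files, each containing many lines. I want to find one specific line in each file and extract/print only part of it.
The line I want to reach is:
Full seesion name: T27I5E8_S1_N005_V004
and I want to print only:
S1_V004
I tried: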
import os
import re

for filename in os.listdir(data_directory):
    with open(data_directory + "/" + filename) as file:
        for line in file:
            if re.search(r'([S][\d])|([V][\d]{3})', line):
                print(line)
But that prints the whole line. I also tried:
subjID = re.compile(r'([S][\d])|([V][\d]{3})')
for filename in os.listdir(data_directory):
    with open(data_directory + "/" + filename) as file:
        for line in file:
            print(subjID.findall(line))
But the output looks like:
[]
[]
[]
[]
[('S1', ''), ('', 'V094')]
[]
[]
[]
[]
[]
[]
[]
[('S1', ''), ('', 'V094')]
[]
[]
[]
[]
[]
[]
[]
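You can use
import os
import re

for filename in os.listdir(data_directory):
    with open(data_directory + "/" + filename) as file:
        for line in file:
            # strip() drops the trailing newline so the end-of-string lookahead can succeed
            m = re.findall(r'(?<![^_])[SV]\d+(?![^_])', line.strip())
            if m:
                print("_".join(m))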
See the Python demo and the regex demo. re.findall returns all matches in the line; if any are found, the matched texts are joined with _ into a single string.
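As a side note on why the second attempt in the question printed tuples such as ('S1', ''): when a pattern contains more than one capture group, re.findall returns one tuple per match, with an empty string for each group that did not take part. The short sketch below compares the two patterns on the sample line from the question (the variable names are only illustrative):

```python
import re

line = "Full seesion name: T27I5E8_S1_N005_V004"

# Two capture groups: findall yields one tuple per match, and the group
# that did not participate in that match is an empty string.
grouped = re.findall(r'([S][\d])|([V][\d]{3})', line)
print(grouped)

# No capture groups: findall yields the matched text itself.
ungrouped = re.findall(r'(?<![^_])[SV]\d+(?![^_])', line)
print(ungrouped)
print("_".join(ungrouped))
```

With no groups in the pattern, the list can be passed straight to "_".join without unpacking tuples.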
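Pattern details
(?<![^_]) - start of string, or a position immediately preceded by _
[SV] - S or V
\d+ - one or more digits
(?![^_]) - end of string, or a position immediately followed by _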
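One detail that is easy to miss: lines read from a file keep their trailing newline, and a newline is a character other than _, so the final lookahead rejects a token at the very end of an unstripped line. Here is a minimal sketch of a helper that handles this, assuming the tokens look like the sample in the question (the names TOKEN and extract_tokens are hypothetical, not part of any library):

```python
import re

# Same pattern as in the answer, compiled once for reuse.
TOKEN = re.compile(r'(?<![^_])[SV]\d+(?![^_])')

def extract_tokens(line):
    """Return the S/V tokens of a line joined with '_', or None if there are none."""
    # Strip the trailing newline so the last token is followed by the end
    # of the string rather than by a character other than '_'.
    tokens = TOKEN.findall(line.strip())
    return "_".join(tokens) if tokens else None

sample = "Full seesion name: T27I5E8_S1_N005_V004\n"
print(extract_tokens(sample))
print(extract_tokens("Some other line\n"))
# Without stripping, the token before the newline is not matched.
print(TOKEN.findall(sample))
```

Returning None for lines with no tokens lets the calling loop skip them with a simple truth test.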