Print only the part of a string that matches a regular expression (Pandas)


I have 400 files, each containing multiple lines. I want to find one specific line in each file and extract/print only part of it.

I want to locate this line:

Full seesion name: T27I5E8_S1_N005_V004

and print only:

S1_V004

I tried:

for filename in os.listdir(data_directory):
    with open(data_directory + "/" + filename) as file:
        for line in file:
            if re.search(r'([S][\d])|([V][\d]{3})', line):
                print(line)

But this prints the whole line. I also tried:

subjID = re.compile(r'([S][\d])|([V][\d]{3})')

for filename in os.listdir(data_directory):
    with open(data_directory + "/" + filename) as file:
        for line in file:
            print(subjID.findall(line))

But the output looks like:

[]
[]
[]
[]
[('S1', ''), ('', 'V094')]
[]
[]
[]
[]
[]
[]
[]
[('S1', ''), ('', 'V094')]
[]
[]
[]
[]
[]
[]
[]
python regex pandas extract
1 Answer

You can use:

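import os
import re

for filename in os.listdir(data_directory):
    with open(os.path.join(data_directory, filename)) as file:
        for line in file:
            # Strip the trailing newline so the end-of-string lookahead can succeed
            m = re.findall(r'(?<![^_])[SV]\d+(?![^_])', line.strip())
            if m:
                print("_".join(m))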

See the Python demo and the regex demo. re.findall returns all matches in the line; if there are any, the result is the matched texts joined with _ into a single string.
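
As a minimal, self-contained sketch of this approach, the snippet below applies the same pattern to the sample line from the question instead of reading files. The helper name `extract_ids` is hypothetical and used only for illustration. The line is stripped before matching because a line read from a file normally ends with "\n", which is neither `_` nor the end of the string, so the trailing lookahead would otherwise reject the last token.

```python
import re

# Same pattern as in the answer: S or V plus digits,
# delimited by "_" or by the start/end of the string.
PATTERN = re.compile(r'(?<![^_])[SV]\d+(?![^_])')


def extract_ids(line):
    """Return the matched tokens joined by "_", or None if nothing matches.

    extract_ids is a hypothetical helper name used only for this sketch.
    """
    matches = PATTERN.findall(line.strip())  # drop the trailing newline first
    return "_".join(matches) if matches else None


sample = "Full seesion name: T27I5E8_S1_N005_V004\n"
print(extract_ids(sample))                    # S1_V004
print(extract_ids("some unrelated line\n"))   # None
```

Returning None for non-matching lines keeps the caller free to skip them silently, which mirrors the `if` check in the loop above.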

Pattern details

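  • (?<![^_]) - a position at the start of the string or immediately preceded by _
  • [SV] - S or V
  • \d+ - one or more digits
  • (?![^_]) - a position at the end of the string or immediately followed by _.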
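
To illustrate what the two lookarounds contribute, the following sketch compares the pattern with and without them. The test string `XS12_S1_V004x_V7` is made up for demonstration and is not data from the question.

```python
import re

text = "XS12_S1_V004x_V7"  # hypothetical string, chosen to show the boundaries

# Without lookarounds, S/V followed by digits matches anywhere,
# even in the middle of a longer token.
loose = re.findall(r'[SV]\d+', text)

# With lookarounds, a match must span a whole "_"-delimited token.
strict = re.findall(r'(?<![^_])[SV]\d+(?![^_])', text)

print(loose)   # ['S12', 'S1', 'V004', 'V7']
print(strict)  # ['S1', 'V7']
```

Here `S12` is rejected because it is preceded by `X`, and `V004` is rejected because it is followed by `x`; only tokens bounded by `_` or the string edges survive.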