How to find the substring before the nearest punctuation mark when parsing text in Python

Question

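While parsing a text file with several hundred lines in Python 3.12, I keep stumbling over how to extract substrings from lines that contain varying numbers of parenthesized groups, such as: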

line1: 'group(something)'
line2: 'other(something) and group(something,something,something) and not group(something,something)'

I am only interested in extracting `something` from each `group()` or `not group()`.

I tried defining a function that would extract each `something` inside `group()`:

def find_within(search_in, search_for, until):
    return search_in[search_in.find(search_for) + len(search_for) : search_in.find(until)]

group1 = find_within(line1, 'group(', ')')        # extracting group from the first line
group2 = find_within(line2, 'and group(', ')')    # extracting group from the second line
not_group = find_within(line2, 'not group(', ')')    # extracting 'not group' from the second line

This function handles lines with only a single `group()` (such as `line1`), but not lines with multiple items (such as `line2`). For those it returns an empty string instead.

I tried modifying the function by replacing the last `.find()` with `.rfind()`, as follows:

def find_within(search_in, search_for, until):
    return search_in[search_in.find(search_for) + len(search_for) : search_in.rfind(until)]

But the output is

'something,something,something) and not group(something,something'

whereas I expected

'something,something,something'

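It looks like `.find()` searches for the first occurrence of the parenthesis. When I try to extract `something` from a `group(something)` in the middle of the string, `.find()` finds the closing parenthesis of an earlier pair near the start of the string, before `group(something)`. The end index is then smaller than the start index, so the slice is ''. `.rfind()` searches for the last occurrence of the parenthesis, so it extracts everything up to the last closing parenthesis in the string.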
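The index behaviour described above can be checked directly, and it also suggests one possible fix that stays with `.find()`: `str.find` accepts an optional start position, so the search for the closing parenthesis can begin after the opening marker. The following is a minimal sketch under that assumption; the name `find_within_fixed` is hypothetical and not part of the original code.

```python
line1 = 'group(something)'
line2 = ('other(something) and group(something,something,something) '
         'and not group(something,something)')

# Reproduce the problem: the first ')' lies before the start of the slice,
# and the last ')' lies far beyond the group we want.
start = line2.find('and group(') + len('and group(')
first_close = line2.find(')')
last_close = line2.rfind(')')
print(start, first_close, last_close)  # 27 15 91


def find_within_fixed(search_in, search_for, until):
    """Return the text between search_for and the first `until` after it.

    Hypothetical variant of find_within; returns None when either
    marker is missing.
    """
    start = search_in.find(search_for)
    if start == -1:
        return None
    start += len(search_for)
    # Begin searching for the closing marker after the opening one.
    end = search_in.find(until, start)
    if end == -1:
        return None
    return search_in[start:end]


print(find_within_fixed(line1, 'group(', ')'))
print(find_within_fixed(line2, 'and group(', ')'))
print(find_within_fixed(line2, 'not group(', ')'))
```

This sketch only finds the first occurrence of each marker and does not handle nested parentheses.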

Is there a better way to do this with `.find()`, or do I need to resort to regular expressions?

Thank you in advance.

Answer 1

You can use a pattern like this to find the expected output:

\bgroup\s*\(([^)]+)\)|and\s+group\(([^)]+)\)|and\s+not\s+group\(([^)]+)\)|\bother\(([^)]+)\)
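As an illustration of the regex approach, here is a minimal sketch using Python's `re` module. It uses a simplified pattern rather than the one above, on the assumption that only `group(...)` and `not group(...)` are of interest and that the parentheses are never nested. The names `GROUP_RE` and `extract_groups` are hypothetical.

```python
import re

# Optional "not " prefix, then group( ... ) with no nested parentheses.
GROUP_RE = re.compile(r'\b(not\s+)?group\s*\(([^)]*)\)')


def extract_groups(line):
    """Return a list of (negated, args) tuples, one per group(...) in line."""
    results = []
    for match in GROUP_RE.finditer(line):
        negated = match.group(1) is not None
        args = [arg.strip() for arg in match.group(2).split(',')]
        results.append((negated, args))
    return results


line1 = 'group(something)'
line2 = ('other(something) and group(something,something,something) '
         'and not group(something,something)')

print(extract_groups(line1))
print(extract_groups(line2))
```

Because `re.finditer` scans from left to right and `[^)]*` stops at the first closing parenthesis, each match ends at the nearest `)` after its own `group(`, and `other(...)` is skipped.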
