How do I split a string into sentences while keeping the punctuation?

Problem description

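I want to split text into sentences so that the punctuation (e.g. ?, !, .) is included, and if a sentence ends with a double quote, I want to include that as well.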

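I used the re.split() function in Python 3 to split my string into sentences. Unfortunately, the resulting strings do not include the punctuation, and if a sentence ends with a double quote, the quote is left out too.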

This is my current code:

import re
x = 'This is an example sentence. I want to include punctuation! What is wrong with my code? It makes me want to yell, "PLEASE HELP ME!"'
# Split on sentence-ending punctuation plus any whitespace after it
sentence = re.split(r'[.?!]\s*', x)

The output I get is:

['This is an example sentence', 'I want to include punctuation', 'What is wrong with my code', 'It makes me want to yell, "PLEASE HELP ME', '"']

Tags: regex python-3.x string punctuation sentence

1 Answer

Try splitting on a lookbehind:

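# \s+ rather than \s*: on Python 3.7+ re.split also splits on empty matches, which would cut off the closing quote
sentences = re.split(r'(?<=[.?!])\s+', x)
print(sentences)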

['This is an example sentence.', 'I want to include punctuation!',
 'What is wrong with my code?', 'It makes me want to yell, "PLEASE HELP ME!"']

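This regex splits at any position that is immediately preceded by a punctuation mark. The lookbehind itself consumes nothing, so the punctuation stays attached to its sentence. At that position we also match and consume the whitespace that follows, before continuing through the input string.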
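
To make the difference concrete, here is a minimal sketch (not from the original answer) that compares a consuming split with a lookbehind split. The sample string and the variable names are made up for illustration, and it assumes Python 3.7 or later, where re.split() also splits on empty matches.

```python
import re

text = 'One. Two! Three?'

# Consuming split: the punctuation is part of the match, so re.split()
# throws it away along with the whitespace.
consumed = re.split(r'[.?!]\s*', text)

# Lookbehind split: (?<=[.?!]) only asserts that punctuation comes right
# before the split point. The match itself is just the whitespace, so the
# punctuation stays attached to the preceding sentence.
kept = re.split(r'(?<=[.?!])\s+', text)

print(consumed)  # ['One', 'Two', 'Three', '']
print(kept)      # ['One.', 'Two!', 'Three?']
```

Note the trailing empty string in the first result: the final `?` is matched at the end of the input, leaving nothing after it.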

Here is my mediocre attempt at handling the double-quote problem:

x = 'This is an example sentence. I want to include punctuation! "What is wrong with my code?"  It makes me want to yell, "PLEASE HELP ME!"'
sentences = re.split(r'((?<=[.?!]")|((?<=[.?!])(?!")))\s*', x)
# The capturing groups put empty strings and None into the result; filter(None, ...) drops them
print(list(filter(None, sentences)))

['This is an example sentence.', 'I want to include punctuation!',
 '"What is wrong with my code?"', 'It makes me want to yell, "PLEASE HELP ME!"']

Note that it correctly splits the sentences that end in a double quote.
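
If the extra empty strings from the capturing groups are a nuisance, the same idea can probably be written with non-capturing groups. The following is only a sketch under that assumption: `split_sentences` and `_SENTENCE_END` are hypothetical names, not part of the re module or of the original answer, and the pattern requires at least one whitespace character between sentences.

```python
import re

# Hypothetical helper, not from the original answer.
# Split point: right after ., ? or ! followed by a closing double quote,
# or right after ., ? or ! when no double quote follows.
# (?:...) groups without capturing, so re.split() returns only the pieces.
_SENTENCE_END = re.compile(r'(?:(?<=[.?!]")|(?<=[.?!])(?!"))\s+')

def split_sentences(text):
    """Return the sentences in text, keeping punctuation and closing quotes."""
    # Drop the empty piece that appears when the text ends in whitespace.
    return [s for s in _SENTENCE_END.split(text) if s]

x = ('This is an example sentence. I want to include punctuation! '
     '"What is wrong with my code?"  It makes me want to yell, "PLEASE HELP ME!"')
for sentence in split_sentences(x):
    print(sentence)
```

Like the regexes above, this treats every ., ? or ! followed by whitespace as a sentence boundary, so abbreviations such as "e.g." or "Dr." would also be split.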
