Splitting text into sentences on "and" and periods

Question

Text string:

text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'

Desired output:

splitted_sentences = ['turn left', 'take the door between stairs and elevator', 'turn right to the corridor']

How can we split this text into sentences with Python, as shown in the splitted_sentences list?
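For comparison, here is a naive approach. It is a sketch and not part of the original question. It splits on periods and on every standalone "and", and it shows where the difficulty lies: the phrase "between stairs and elevator" gets cut in two.

```python
import re

text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'
# Split on a period (plus any following whitespace) or on a standalone "and"
parts = [p for p in re.split(r'\.\s*|\s+and\s+', text.lower()) if p]
print(parts)
# ['turn left', 'take the door between stairs', 'elevator', 'turn right to the corridor']
```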

Answer

The code I wrote returns a result close to the desired one:

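import re
from nltk.tokenize import RegexpTokenizer

text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'
text = text.lower()
# Replace only the whole word "and" with a comma (str.replace would also hit words like "stand")
text = re.sub(r'\band\b', ',', text)
# Split on "; ", ". ", ":", ", ", "* " and newlines
split1 = re.split(r'; |[.] |[:]|, |\* |\n', text)
tokenizer = RegexpTokenizer(r'\w+')
# Tokenize each chunk into words and drop chunks that end up empty
tokens = [t for t in (tokenizer.tokenize(chunk) for chunk in split1) if t]

# Index of the first chunk containing "between", or None if there is none
m = next((i for i, t in enumerate(tokens) if 'between' in t), None)
d = tokens
if m is not None and m + 1 < len(tokens):
    # Glue "between X" back to the following chunk "Y" with the "and" removed above
    d = tokens[:m] + [tokens[m] + ['and'] + tokens[m + 1]] + tokens[m + 2:]
# Join each token list back into a plain string
splitted_sentences = [' '.join(t) for t in d]
print(splitted_sentences)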
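A dependency-free alternative, sketched under the assumption that the only "and" that must not split a clause is the one closing a "between X and Y" phrase. It splits the text into sentences at ".", "!" or "?", splits each sentence at every standalone "and", and then re-merges a part that opened a "between" phrase with the part that follows it. The function name split_instructions is hypothetical, and NLTK is not needed.

```python
import re


def split_instructions(text):
    """Split text into clauses at sentence ends and at standalone "and",
    keeping a "between X and Y" phrase together (heuristic assumption)."""
    clauses = []
    # Split into sentences after ".", "!" or "?" followed by whitespace
    for sentence in re.split(r'(?<=[.!?])\s+', text.strip()):
        sentence = sentence.rstrip('.!?').strip().lower()
        merged = []
        for part in re.split(r'\s+and\s+', sentence):
            # An "and" right after a still-open "between ..." phrase belongs to it
            if merged and re.search(r'\bbetween\b', merged[-1]) and ' and ' not in merged[-1]:
                merged[-1] += ' and ' + part
            else:
                merged.append(part)
        # Skip empty strings (e.g. from empty input)
        clauses.extend(p for p in merged if p)
    return clauses


text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'
print(split_instructions(text))
# ['turn left', 'take the door between stairs and elevator', 'turn right to the corridor']
```

Like the code above, this is a heuristic. For example, it would keep "walk between the rows and stop" together as one clause.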