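Text string:
text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'
Desired output:
splitted_sentences = ['turn left', 'take the door between stairs and elevator', 'turn right to the corridor']
How can I split this text in Python into the sentences shown in the splitted_sentences list?
The code I wrote returns something close to the desired result (a list of token lists rather than a list of strings):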
import re
from nltk.tokenize import RegexpTokenizer

text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'
text = text.lower()
# Turn every "and" into a comma so it acts as a split point
# (note: this also hits "and" inside longer words)
text = text.replace("and", ",")
split1 = re.split(r'; |[.] |[:]|, |\* |\n', text)
tokenizer = RegexpTokenizer(r'\w+')
tokens = [tokenizer.tokenize(word) for word in split1]

d = []
i = 0
for t in tokens:
    for a in t:
        if a == 'between':
            m = tokens.index(t)
            # Copy the chunks that come before the "between" chunk
            while i < m:
                d.append(tokens[i])
                i += 1
            # Re-join "between X" with the following chunk using "and"
            d.append(tokens[m] + ['and'] + tokens[m+1])
            # Copy the remaining chunks
            n = m + 2
            while n < len(tokens):
                d.append(tokens[n])
                n += 1
print(d)
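
For comparison, here is a minimal sketch of the same idea using only the re module. It is not a general solution: it assumes that "between X and Y" (with a single word X) is the only case where "and" must be kept. The function name split_instructions and the __AND__ placeholder are made up for illustration.

```python
import re

def split_instructions(text):
    """Split on punctuation and on 'and', keeping the 'and' in 'between X and Y'."""
    text = text.lower()
    # Shield the 'and' that belongs to 'between X and Y' with a
    # hypothetical placeholder so the split below does not touch it.
    shielded = re.sub(r'\b(between\s+\w+)\s+and\b', r'\1 __AND__', text)
    # Split on sentence punctuation or on a standalone 'and'.
    parts = re.split(r'[.;:,\n]+|\band\b', shielded)
    # Restore the shielded 'and' and drop empty pieces.
    return [p.replace('__AND__', 'and').strip() for p in parts if p.strip()]

text = 'Turn left and take the door between stairs and elevator. Turn right to the corridor.'
splitted_sentences = split_instructions(text)
print(splitted_sentences)
```

Unlike the code above, this returns plain strings instead of token lists, and the word-boundary match on "and" leaves words such as "stand" untouched.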