Sentence2vec and Word2vec: handling stop words and named entities

Question (votes: 0, answers: 1)

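I am working on an NLP project that involves sentence2vec. My plan is to use pre-trained word embeddings to convert tokens into vectors and then build sentence embeddings from those.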

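My sentences contain contractions such as can't, won't, and aren't, which NLTK splits into {ca, wo, are} + n't. I cannot reduce them any further, and I do not want to remove them as stop words, because the two sentences below should get different embeddings.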

My name is Priyank
My name is not Priyank
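One possible way to keep the negation visible is to expand the contractions before tokenizing, so that "can't" becomes "can not" rather than "ca" plus a fragment. The sketch below is only an illustration: `expand_negations` and the `IRREGULAR` mapping are hypothetical names, the mapping is deliberately tiny, and only straight apostrophes are handled.

```python
import re

# Hypothetical, deliberately small mapping for irregular contractions;
# extend it for your own data.
IRREGULAR = {"can't": "can not", "won't": "will not", "shan't": "shall not"}

def expand_negations(sentence):
    """Rewrite n't contractions as '<verb> not' before tokenizing."""
    def repl(match):
        word = match.group(0).lower()  # note: the contraction is lowercased
        if word in IRREGULAR:
            return IRREGULAR[word]
        # Regular case: "isn't" -> "is not", "aren't" -> "are not"
        return word[:-3] + " not"
    return re.sub(r"\b\w+n't\b", repl, sentence, flags=re.IGNORECASE)

# Plain whitespace tokenization after expansion keeps "not" as its own token.
tokens = expand_negations("I can't go and they aren't here").split()
```

After this step "not" is an ordinary token, so it only survives into the sentence vector if it is also kept out of the stop word list.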

My other main question is how to incorporate named entities into my sentence vectors, for example a person's name such as Mark K. Hogg.

python nlp word2vec sentence-similarity

1 answer

1 vote

You can take this list and remove from it the words that you do not want treated as stop words.

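from nltk.corpus import stopwords  # needs nltk.download('stopwords') once

# Assumption: start from NLTK's English stop word list and drop the
# negation words so they are kept in the text
stoplist = set(stopwords.words('english')) - {'not', 'no', 'nor'}

# Open a file and read it into memory
with open('words.txt') as file:
    text = file.read()

# Apply the stoplist to the text
clean = [word for word in text.split() if word not in stoplist]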
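To see why keeping "not" matters for the sentence embedding, here is a minimal sketch that averages word vectors after stop word removal. Everything in it is an assumption made for illustration: the `EMBEDDINGS` table holds made-up 3-dimensional values standing in for real pre-trained vectors, `STOPLIST` is a hypothetical list with "not" left out, and simple averaging is just one common way to build a sentence vector.

```python
DIM = 3

# Toy vectors standing in for pre-trained embeddings (made-up values).
EMBEDDINGS = {
    "name": [0.2, 0.7, 0.1],
    "priyank": [0.9, 0.1, 0.4],
    "not": [-0.8, 0.3, 0.5],
}

# Hypothetical stoplist with "not" deliberately left out.
STOPLIST = {"my", "is", "the", "a"}

def sentence_vector(sentence):
    """Average the vectors of the non-stop-word tokens that have an embedding."""
    tokens = [t for t in sentence.lower().split() if t not in STOPLIST]
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known tokens: fall back to a zero vector
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

v_pos = sentence_vector("My name is Priyank")
v_neg = sentence_vector("My name is not Priyank")
```

Because "not" is kept, the two sentences average over different sets of vectors and so get different embeddings. If "not" were in the stoplist, both would reduce to the same tokens and the same vector.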

