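I want to take a sentence as input and output a sentence in which the difficult words have been replaced with simpler ones.
I am using NLTK to tokenize sentences and tag words, but I am having trouble using WordNet to find synonyms for the specific sense of a word that I want.
For example:
Input: "I refuse to pick up the refuse"
Maybe refuse #1 is already the simplest word for rejecting, but refuse #2 means garbage, and there are simpler words that could be used there.
NLTK might be able to tag refuse #2 as a noun, but how do I then get synonyms for refuse (garbage) from WordNet?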
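It sounds like you want synonyms based on a word's part of speech (i.e. noun, verb, etc.).
The following generates synonyms for each word in a sentence based on its part of speech.
Code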
import nltk
from nltk.corpus import wordnet as wn

nltk.download('popular')

def get_synonyms(word, pos):
    """Yield synonyms of word for the given Penn Treebank POS tag."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()

def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                  'VB': wn.VERB, 'RB': wn.ADV}
    try:
        # Only the first two characters matter, so VBP, VBD, ... all map to VERB.
        return morphy_tag[penntag[:2]]
    except KeyError:
        # The tag has no WordNet equivalent (e.g. PRP, DT, TO).
        return None if returnNone else ''
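The tag mapping is the step that separates the verb "refuse" from the noun "refuse", so it can help to look at it in isolation. Below is a minimal sketch, assuming the WordNet POS constants are the single-letter strings 'n', 'v', 'a' and 'r' (the values of wn.NOUN, wn.VERB, wn.ADJ and wn.ADV in NLTK), so that it runs without downloading any corpus. The names WORDNET_POS and penn_to_wordnet are made up for this illustration and are not part of NLTK.

```python
# WordNet POS codes written out as plain strings, so this sketch runs
# without NLTK installed (assumption: they match wn.NOUN, wn.VERB, wn.ADJ, wn.ADV).
WORDNET_POS = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}

def penn_to_wordnet(penn_tag, default=''):
    """Map a Penn Treebank tag to a WordNet POS code using its first two characters."""
    return WORDNET_POS.get(penn_tag[:2], default)

# Tags taken from the example sentence, plus a few inflected variants.
for tag in ['VBP', 'VB', 'NN', 'NNS', 'JJR', 'RB', 'PRP', 'DT']:
    print(tag, '->', repr(penn_to_wordnet(tag)))
```

Using dict.get with a default gives the same fallback as the try/except version above, and tags such as PRP or DT simply map to the default.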
Example usage
# Tokenize text
text = nltk.word_tokenize("I refuse to pick up the refuse")

for word, tag in nltk.pos_tag(text):
    print(f'word is {word}, POS is {tag}')

    # Filter for unique synonyms not equal to word and sort.
    unique = sorted(set(synonym for synonym in get_synonyms(word, tag) if synonym != word))

    for synonym in unique:
        print('\t', synonym)
Output
Note the different sets of synonyms for "refuse" depending on its POS.
word is I, POS is PRP
word is refuse, POS is VBP
     decline
     defy
     deny
     pass_up
     reject
     resist
     turn_away
     turn_down
word is to, POS is TO
word is pick, POS is VB
     beak
     blame
     break_up
     clean
     cull
     find_fault
     foot
     nibble
     peck
     piece
     pluck
     plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
     food_waste
     garbage
     scraps
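Since the original goal was to swap in a simpler word, one more step is needed: choosing among the candidates. Below is a rough sketch that treats the shortest candidate as the "simplest". That is purely an assumption made for illustration, and a word-frequency list would likely be a better signal. The function name pick_simplest is hypothetical, and the candidate lists are copied from the output above.

```python
def pick_simplest(word, candidates):
    """Return the shortest candidate (ties broken alphabetically), or word if there are none."""
    # WordNet joins multi-word lemmas with underscores, e.g. 'pass_up'.
    readable = [c.replace('_', ' ') for c in candidates]
    if not readable:
        return word
    return min(readable, key=lambda c: (len(c), c))

# Candidate lists copied from the output above.
noun_synonyms = ['food_waste', 'garbage', 'scraps']
verb_synonyms = ['decline', 'defy', 'deny', 'pass_up',
                 'reject', 'resist', 'turn_away', 'turn_down']

print(pick_simplest('refuse', noun_synonyms))
print(pick_simplest('refuse', verb_synonyms))
print(pick_simplest('the', []))
```

Length is a crude proxy: the shortest verb candidate is not necessarily the closest in meaning, so in practice the candidates probably need filtering by sense before a choice like this is made.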
For those who are not comfortable coding in Python, you can also get the synonyms and antonyms of each word from WordNet here: https://wordsplayground.org/