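I am using nltk and WordNet to link words that belong to the same relation group. For example, "parking" and "building" should share some parent concept. I am using hypernyms, but for some words no meaningful connection turns up.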
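from nltk.corpus import wordnet

x = wordnet.synset('parking.n.01')
y = wordnet.synset('building.n.01')
# _shortest_hypernym_paths is a private NLTK method; its single argument is the
# simulate_root flag, so the synset passed here only acts as a truthy value
print(x._shortest_hypernym_paths(y))
print(y._shortest_hypernym_paths(x))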
{Synset('parking.n.01'): 0, Synset('room.n.02'): 1, Synset('position.n.07'): 2, Synset('relation.n.01'): 3, Synset('abstraction.n.06'): 4, Synset('entity.n.01'): 5, Synset('ROOT'): 6}
{Synset('building.n.01'): 0, Synset('structure.n.01'): 1, Synset('artifact.n.01'): 2, Synset('whole.n.02'): 3, Synset('object.n.01'): 4, Synset('physical_entity.n.01'): 5, Synset('entity.n.01'): 6, Synset('ROOT'): 7}
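Here the only connection goes through 'entity.n.01', which is the root of the WordNet noun hierarchy, so almost any pair of nouns meets there. How can I get something better than this?
I would like to get something like 'parking' -> 'structure' -> 'building'. The path can be longer, but unrelated ("alien") words should not appear in it; 'monkey', for example, would also lead up to entity.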
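A minimal sketch of how such a path could be assembled: walk upward from both words, pick the shared ancestor with the fewest total edges, and join the two half-paths there. The names `TOY`, `ancestors_with_paths` and `connect` are hypothetical, and the toy hypernym map is copied from the two chains printed above rather than read from WordNet. With NLTK synsets, the `parents` callable could presumably be `lambda s: s.hypernyms() + s.instance_hypernyms()`.

```python
from collections import deque

# Toy hypernym map (child -> parents), copied from the two chains printed above.
# This is an assumption-laden stand-in for WordNet, not real corpus data.
TOY = {
    'parking.n.01': ['room.n.02'],
    'room.n.02': ['position.n.07'],
    'position.n.07': ['relation.n.01'],
    'relation.n.01': ['abstraction.n.06'],
    'abstraction.n.06': ['entity.n.01'],
    'building.n.01': ['structure.n.01'],
    'structure.n.01': ['artifact.n.01'],
    'artifact.n.01': ['whole.n.02'],
    'whole.n.02': ['object.n.01'],
    'object.n.01': ['physical_entity.n.01'],
    'physical_entity.n.01': ['entity.n.01'],
    'entity.n.01': [],
}


def ancestors_with_paths(node, parents):
    """Breadth-first walk upward; map each ancestor to the shortest path from node."""
    paths = {node: [node]}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents(current):
            if parent not in paths:
                paths[parent] = paths[current] + [parent]
                queue.append(parent)
    return paths


def connect(a, b, parents):
    """Return (junction, path from a up to the junction and down to b), or None."""
    up_a = ancestors_with_paths(a, parents)
    up_b = ancestors_with_paths(b, parents)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    # Prefer the shared ancestor that gives the fewest edges overall.
    junction = min(common, key=lambda n: (len(up_a[n]) + len(up_b[n]), str(n)))
    # Reverse b's upward path and drop the junction so it is not repeated.
    return junction, up_a[junction] + up_b[junction][-2::-1]


junction, path = connect('parking.n.01', 'building.n.01', lambda n: TOY.get(n, []))
print(junction)
print(' -> '.join(path))
```

On this toy data the junction is still 'entity.n.01', because the two printed chains share nothing lower; a lower junction can only appear if some other pair of senses has one.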
I found a somewhat useful way to look at the possibilities:
def getShortestHypernymPath(word1, word2, nulls=False):
    # Compare every sense of word1 with every sense of word2
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    for s1 in syns1:
        for s2 in syns2:
            lch = s2.lowest_common_hypernyms(s1)
            # Print pairs that share a hypernym (or every pair when nulls=True)
            if len(lch) > 0 or nulls:
                print(s1, '<-->', s2, '===', lch)

nlpf.getShortestHypernymPath('parking', 'building', nulls=False)
This returns:
Synset('parking.n.01') <--> Synset('building.n.01') === [Synset('entity.n.01')]
Synset('parking.n.01') <--> Synset('construction.n.01') === [Synset('abstraction.n.06')]
Synset('parking.n.01') <--> Synset('construction.n.07') === [Synset('abstraction.n.06')]
Synset('parking.n.01') <--> Synset('building.n.04') === [Synset('abstraction.n.06')]
Synset('parking.n.02') <--> Synset('building.n.01') === [Synset('entity.n.01')]
Synset('parking.n.02') <--> Synset('construction.n.01') === [Synset('act.n.02')]
Synset('parking.n.02') <--> Synset('construction.n.07') === [Synset('act.n.02')]
Synset('parking.n.02') <--> Synset('building.n.04') === [Synset('abstraction.n.06')]
Synset('park.v.02') <--> Synset('build.v.05') === [Synset('control.v.01')]
So at least I can moderate the results myself.
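One way this manual moderation could be partly automated, sketched under assumptions: drop every sense pair whose common hypernym sits too close to the root, and list the most specific pairs first. The helper `keep_specific`, the threshold `min_depth=2` and the toy data are hypothetical. The depths for 'entity.n.01' and 'abstraction.n.06' follow from the chains printed earlier, while the depth given for 'act.n.02' is an illustrative assumption. With NLTK synsets, the `depth` callable could presumably be `lambda s: s.min_depth()`.

```python
def keep_specific(rows, depth, min_depth=2):
    """Keep (s1, s2, lch) rows whose deepest common hypernym lies at least
    min_depth levels below the root, most specific first."""
    scored = []
    for s1, s2, lch in rows:
        if not lch:
            continue  # no common hypernym at all
        best = max(depth(h) for h in lch)
        if best >= min_depth:
            scored.append((best, s1, s2, lch))
    # Stable sort: rows with equal depth keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(s1, s2, lch) for _, s1, s2, lch in scored]


# Toy depths: entity and abstraction follow the printed chains;
# the value for act.n.02 is assumed for illustration only.
TOY_DEPTH = {'entity.n.01': 0, 'abstraction.n.06': 1, 'act.n.02': 4}

# Three of the rows from the output above, with synsets written as plain names.
ROWS = [
    ('parking.n.01', 'building.n.01', ['entity.n.01']),
    ('parking.n.01', 'construction.n.01', ['abstraction.n.06']),
    ('parking.n.02', 'construction.n.01', ['act.n.02']),
]

result = keep_specific(ROWS, TOY_DEPTH.get, min_depth=2)
for s1, s2, lch in result:
    print(s1, '<-->', s2, '===', lch)
```

The threshold is a judgment call: a higher `min_depth` removes more generic junctions such as entity or abstraction, but may also leave no pairs at all.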