I have a list of nltk.tree.Tree objects:
>>> question = 'When did Beyonce start becoming popular?'
>>> questionSpacy = spacy_nlp(question)
>>> print(questionSpacy)
[Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])]
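The goal is to build another tree from it. This may be a silly question, but I do not know how to tell whether a tree representing one sentence is contained in a tree representing another sentence.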
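One possible reading of "contained" is that some subtree of the larger tree matches the smaller tree exactly (same labels, same children, same order). The sketch below works under that assumption. `MiniTree`, `same_tree` and `contains` are hypothetical names introduced here so the example runs on its own. Since nltk.Tree is also a list with a `label()` method, it should be usable in place of `MiniTree`.

```python
class MiniTree(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children=()):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def same_tree(a, b):
    """True when a and b have equal labels and equal children, in order."""
    if isinstance(a, str) or isinstance(b, str):
        # At least one side is a leaf token, so plain equality decides.
        return a == b
    if a.label() != b.label() or len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


def contains(big, small):
    """True when small matches big itself or any subtree nested inside big."""
    if same_tree(big, small):
        return True
    if isinstance(big, str):
        # A leaf has nothing nested inside it.
        return False
    return any(contains(child, small) for child in big)


question = MiniTree('start_VB_ROOT', [
    'When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj',
    MiniTree('becoming_VBG_xcomp', ['popular_JJ_acomp']),
    '?_._punct',
])
fragment = MiniTree('becoming_VBG_xcomp', ['popular_JJ_acomp'])
other = MiniTree('becoming_VBG_xcomp', ['famous_JJ_acomp'])  # made-up non-match

found = contains(question, fragment)
missing = contains(question, other)
print(found, missing)
```

Exact matching is the strictest choice. A looser notion, such as ignoring punctuation or child order, would need a different `same_tree`.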
I made an attempt, but it did not work. I think I have not covered every case: sometimes the parent node has to be array[0].label(), and sometimes array[0].
    from nltk import Tree

    class WordTree:
        def __init__(self, array, parent = None):
            #print("son :",array[0][i])
            self.parent = []
            self.children = []  # if parenthesis then it has son after "," analyse : include all elements until the next parenthesis
            self.data = array
            #print(array[0])
            for son in array[0]:
                print(type(son), son)
                if type(son) is Tree:
                    print("sub tree creation")
                    self.children.append(son.label())
                    print("son:", son)
                    t = WordTree(son, son.label())  # should I verify if parent is empty ?
                    print("end of sub tree creation")
                elif type(son) is str:
                    print("son creation")
                    self.children.append(son)
                else:
                    print("issue?")
                    break  # problem ?
When I run t = WordTree(treeQuestion, treeQuestion[0].label()), I get the following output:
<class 'str'> When_WRB_advmod
son creation
<class 'str'> did_VBD_aux
son creation
<class 'str'> Beyonce_NNP_nsubj
son creation
<class 'nltk.tree.Tree'> (becoming_VBG_xcomp popular_JJ_acomp)
sub tree creation
son: (becoming_VBG_xcomp popular_JJ_acomp)
<class 'str'> p
son creation
<class 'str'> o
son creation
<class 'str'> p
son creation
<class 'str'> u
son creation
<class 'str'> l
son creation
<class 'str'> a
son creation
<class 'str'> r
son creation
<class 'str'> _
son creation
<class 'str'> J
son creation
<class 'str'> J
son creation
<class 'str'> _
son creation
<class 'str'> a
son creation
<class 'str'> c
son creation
<class 'str'> o
son creation
<class 'str'> m
son creation
<class 'str'> p
son creation
end of sub tree creation
<class 'str'> ?_._punct
son creation
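As you can see, in ('becoming_VBG_xcomp', ['popular_JJ_acomp']) it uses the letters of the child popular_JJ_acomp to create several children, instead of using its name to create a single child. This is obviously a bug. So how can I convert the array produced by nltk.tree into another tree?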
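The per-letter children appear because the recursive call receives the subtree itself, so array[0] becomes the string 'popular_JJ_acomp' and the loop iterates over its characters. One way to avoid this is to decide whether a node is a leaf or a subtree by looking at the node itself instead of indexing into it. This is a minimal sketch of that idea, not a definitive implementation. `MiniTree`, `WordNode` and `build` are hypothetical names, and `MiniTree` only stands in for nltk.Tree so the example is self-contained.

```python
class MiniTree(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children=()):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


class WordNode:
    """Plain-Python node: a name, a link to its parent, and child nodes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


def build(node, parent=None):
    """Convert a labelled subtree or a leaf string into WordNode objects."""
    if isinstance(node, str):
        # A leaf is one whole token; it is never iterated, so it cannot
        # be split into letters.
        return WordNode(node, parent)
    word = WordNode(node.label(), parent)
    for child in node:
        word.children.append(build(child, word))
    return word


tree_question = [MiniTree('start_VB_ROOT', [
    'When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj',
    MiniTree('becoming_VBG_xcomp', ['popular_JJ_acomp']),
    '?_._punct',
])]

root = build(tree_question[0])
names = [child.name for child in root.children]
grandchild = root.children[3].children[0]
print(root.name, names, grandchild.name)
```

Because every node keeps both its parent and its children, the caller no longer has to choose between passing array[0] and array[0].label().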
I think I have found something that converts the array produced by nltk.tree into a tree built in Python, but I have not generalized it yet.
    from anytree import Node, RenderTree
    from nltk import Tree

    class WordTree:
        '''Tree for spaCy dependency parsing array'''
        def __init__(self, array, parent = None):
            """
            Construct a new 'WordTree' object.
            :param array: The array containing the dependency
            :param parent: The parent of the array if exists
            :return: returns nothing
            """
            self.parent = []
            self.children = []
            self.data = array
            for element in array[0]:
                print(type(element), element)
                # we check if we got a subtree
                if type(element) is Tree:
                    print("sub tree creation")
                    self.children.append(element.label())
                    print("son:", element)
                    # wrap the subtree in a list so that array[0] is the subtree itself
                    t = WordTree([element], element.label())
                    print("end of sub tree creation")
                # else if we have a string we create a son
                elif type(element) is str:
                    print("son creation")
                    self.children.append(element)
                # in other case we have a problem
                else:
                    print("issue?")
                    break
It does work for the following example:
[Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])]
which gives:
<class 'str'> When_WRB_advmod
son creation
<class 'str'> did_VBD_aux
son creation
<class 'str'> Beyonce_NNP_nsubj
son creation
<class 'nltk.tree.Tree'> (becoming_VBG_xcomp popular_JJ_acomp)
sub tree creation
son: (becoming_VBG_xcomp popular_JJ_acomp)
<class 'str'> popular_JJ_acomp
son creation
end of sub tree creation
<class 'str'> ?_._punct
son creation
But I get no output when I try:
    for i,sent in enumerate(sentences):
        i = WordTree(sentences, sentences[0].label())
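It is hard to tell from this snippet alone why nothing is printed. One thing that stands out is that the loop passes the whole `sentences` list on every iteration and rebinds `i`, so `sent` is never used. Below is a hedged sketch of a per-sentence loop, assuming `sentences` is a list of trees like the example above. `MiniTree` (a stand-in for nltk.Tree), `child_names` and the second sentence are all made up for illustration.

```python
class MiniTree(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children=()):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def child_names(array):
    """Read array[0], as the constructor does, and name its direct children."""
    return [c if isinstance(c, str) else c.label() for c in array[0]]


sentences = [
    MiniTree('start_VB_ROOT', [
        'When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj',
        MiniTree('becoming_VBG_xcomp', ['popular_JJ_acomp']),
        '?_._punct',
    ]),
    MiniTree('sings_VBZ_ROOT', ['She_PRP_nsubj', '._._punct']),  # made-up sentence
]

results = []
for sent in sentences:
    # Wrap the current sentence in a list because the traversal reads array[0].
    results.append(child_names([sent]))

for index, names in enumerate(results):
    print(index, names)
```

Keeping the results in a list also avoids overwriting the loop counter with the object being built.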