Truecasing - SpaCy


The goal is to capitalize words based on their POS tags, which I was able to do with the help of the link below.

How can I best determine the correct capitalization for a word?

I am now trying to get a similar result with spaCy:

def truecase(doc):
    truecased_sents = [] # list of truecased sentences
    tagged_sent = token.tag_([word.lower() for token in doc])
    normalized_sent = [w.capitalize() if t in ["NN","NNS"] else w for (w,t) in tagged_sent]
    normalized_sent[0] = normalized_sent[0].capitalize()
    string = re.sub(" (?=[\.,'!?:;])", "", ' '.join(normalized_sent))
    return string

It throws this error:

  tagged_sent = token.tag_([word.lower() for token in doc])
NameError: global name 'token' is not defined

How do I declare token as a global to resolve this? Is my approach correct?

python nltk spacy
1 Answer
import re
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u'autonomous cars shift insurance liability toward manufacturers.')
# Pair each token's text with its fine-grained POS tag.
tagged_sent = [(w.text, w.tag_) for w in doc]
# Capitalize common nouns (NN, NNS); leave other tokens unchanged.
normalized_sent = [w.capitalize() if t in ["NN","NNS"] else w for (w,t) in tagged_sent]
normalized_sent[0] = normalized_sent[0].capitalize()
# Join with spaces, then drop the space before punctuation.
string = re.sub(r" (?=[\.,'!?:;])", "", ' '.join(normalized_sent))
print(string)

Output: Autonomous Cars shift Insurance Liability toward Manufacturers.
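The NameError in the question arises because token only exists inside the list comprehension, and tag_ is an attribute of each spaCy token rather than a callable tagger, so no global declaration is needed. As a rough sketch of how the same idea could be wrapped in the truecase function the question asks for, the version below reads .text, .tag_ and .whitespace_ from each token. It departs from the answer in two ways that are assumptions of this sketch: it lowercases every word first (as the question's code attempted), and it rebuilds spacing from whitespace_ instead of a regex. FakeToken is a hypothetical stand-in so the example runs without a model installed.

```python
from collections import namedtuple

NOUN_TAGS = {"NN", "NNS"}  # the same tags the answer capitalizes


def truecase(tokens):
    """Capitalize common nouns and the first token, keeping original spacing.

    `tokens` is any iterable of objects exposing `.text`, `.tag_` and
    `.whitespace_`; spaCy Token objects provide all three.
    """
    pieces = []
    for i, tok in enumerate(tokens):
        word = tok.text.lower()  # assumption: normalize case before re-casing
        if i == 0 or tok.tag_ in NOUN_TAGS:
            word = word.capitalize()
        # whitespace_ is the trailing space (or "") that followed the token
        pieces.append(word + tok.whitespace_)
    return "".join(pieces)


# Hypothetical stand-in for spaCy tokens; the tags are hand-assigned here.
FakeToken = namedtuple("FakeToken", "text tag_ whitespace_")
sample = [
    FakeToken("autonomous", "JJ", " "),
    FakeToken("cars", "NNS", " "),
    FakeToken("shift", "VBP", " "),
    FakeToken("liability", "NN", ""),
    FakeToken(".", ".", ""),
]
print(truecase(sample))
```

With spaCy itself, the call would be truecase(nlp(text)), where nlp is the pipeline loaded in the answer. The tags actually assigned depend on the model, so results on real text may differ from the hand-tagged sample.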
