How to apply nltk.pos_tag() to ngrams

Question

I need to use nltk.pos_tag() with bigrams. Here is my code:

from nltk.util import ngrams
from collections import Counter

# all_file_data is assumed to be a list of word tokens
bigrams = list(ngrams(all_file_data, 2))
print(bigrams[:50])
print(Counter(bigrams).most_common(30))

The output is:

[('SUBDELAGATION', 'ON'), ('ON', 'AGENDA'), ('AGENDA', 'ITEM'), ('ITEM', '3'), ...]

How can I get the POS tags together with the bigram frequencies, as in the attached image?
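For a list of tokens, `nltk.util.ngrams(tokens, 2)` yields the same consecutive pairs as zipping the list with itself shifted by one, so the question's `Counter(ngrams(all_file_data, 2))` can be illustrated with the standard library alone. This is a minimal sketch; the helper name `bigram_counts` and the sample tokens are hypothetical, not the asker's actual data.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count consecutive token pairs, like Counter(ngrams(tokens, 2)).

    Assumes tokens is a list (slicing does not work on a generator).
    """
    # zip stops at the shorter sequence, so each adjacent pair appears once
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample tokens in the style of the question's output
tokens = ["SUBDELAGATION", "ON", "AGENDA", "ITEM", "3", "ON", "AGENDA"]
counts = bigram_counts(tokens)
print(counts.most_common(2))
```

Because the bigrams are plain tuples, anything hashable can be paired this way, including the `(word, tag)` tuples that `pos_tag` returns.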

Answer 1

Try this:

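from nltk import pos_tag, word_tokenize
from nltk.util import ngrams
from collections import Counter

# The tokenizer and tagger models may need a one-time nltk.download(...);
# the exact resource names depend on the NLTK version.

text = "hello world is a common sentence. A common sentence is foo bar. A foo bar is a common ice cream."
# Tag the whole text first so the tagger sees each word in its context
tagged_texts = pos_tag(word_tokenize(text))

# Build bigrams over (word, tag) pairs and count them
counter = Counter(ngrams(tagged_texts, 2))

# most_common() with no argument lists every bigram, most frequent first
counter.most_common()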

[out]:

[((('is', 'VBZ'), ('a', 'DT')), 2),
 ((('a', 'DT'), ('common', 'JJ')), 2),
 ((('common', 'JJ'), ('sentence', 'NN')), 2),
 ((('.', '.'), ('A', 'DT')), 2),
 ((('foo', 'JJ'), ('bar', 'NN')), 2),
 ((('hello', 'JJ'), ('world', 'NN')), 1),
 ((('world', 'NN'), ('is', 'VBZ')), 1),
 ((('sentence', 'NN'), ('.', '.')), 1),
 ((('A', 'DT'), ('common', 'JJ')), 1),
 ((('sentence', 'NN'), ('is', 'VBZ')), 1),
 ((('is', 'VBZ'), ('foo', 'JJ')), 1),
 ((('bar', 'NN'), ('.', '.')), 1),
 ((('A', 'DT'), ('foo', 'JJ')), 1),
 ((('bar', 'NN'), ('is', 'VBZ')), 1),
 ((('common', 'JJ'), ('ice', 'NN')), 1),
 ((('ice', 'NN'), ('cream', 'NN')), 1),
 ((('cream', 'NN'), ('.', '.')), 1)]
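Each key in the counter is a pair of `(word, tag)` tuples, so it can be unpacked into flat rows, or reduced to tag-only bigrams to see which POS patterns are most frequent. A minimal stdlib sketch, assuming the input is already a list of `(word, tag)` tuples such as `pos_tag(word_tokenize(text))` returns. The helper names `tagged_bigram_rows` and `tag_bigram_counts` are hypothetical, and the `tagged` sample is hardcoded from the first two sentences of the output above so that no NLTK models are needed to run it. The exact layout of the asker's attached image is unknown; this is just one possible tabular form.

```python
from collections import Counter


def tagged_bigram_rows(tagged):
    """Return (word1, tag1, word2, tag2, count) rows, most frequent first."""
    counts = Counter(zip(tagged, tagged[1:]))
    return [(w1, t1, w2, t2, n) for ((w1, t1), (w2, t2)), n in counts.most_common()]


def tag_bigram_counts(tagged):
    """Count bigrams of POS tags only, ignoring the words themselves."""
    tags = [tag for _, tag in tagged]
    return Counter(zip(tags, tags[1:]))


# Sample (word, tag) pairs copied from the tagged output above
tagged = [
    ("hello", "JJ"), ("world", "NN"), ("is", "VBZ"), ("a", "DT"),
    ("common", "JJ"), ("sentence", "NN"), (".", "."),
    ("A", "DT"), ("common", "JJ"), ("sentence", "NN"), ("is", "VBZ"),
    ("foo", "JJ"), ("bar", "NN"), (".", "."),
]

# One tab-separated row per distinct tagged bigram
for row in tagged_bigram_rows(tagged):
    print(*row, sep="\t")

print(tag_bigram_counts(tagged).most_common(3))
```

With real data, `tagged` would instead be the output of `pos_tag(...)`, for example `pos_tag(all_file_data)` if that variable is already a list of word tokens.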
