What does the support column mean in the result of the function "term_stats()" from the R package "tm"? How is it different from count?


Running the following script produces the result below:

library(tm)      # VectorSource(), VCorpus()
library(corpus)  # term_stats()

a <- c("Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. - Steve Jobs")
a_source <- VectorSource(a)
a_corpus <- VCorpus(a_source)
term_stats(a_corpus)

       term    count   support
    1  .         5       1
    2  to        5       1
    3  is        4       1
    4  you       4       1
    5  ,         3       1
r nlp tm
1 Answer

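support is the number of documents in which the term appears; count is the total number of times it occurs across all documents. In your example the corpus holds a single document, so every term has a support of 1. If you compute tf-idf, you need both. Note that term_stats() itself comes from the corpus package, not tm; tm only supplies VCorpus() and VectorSource().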
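To make the two definitions concrete, here is a minimal base-R sketch that computes count and support by hand and derives one common idf variant from support. The toy documents and the whitespace tokenizer are assumptions for illustration only; term_stats() uses its own, more careful tokenizer, so its numbers on real text can differ.

```r
# Toy documents (hypothetical, not the quote from the question)
docs <- c("to be or not to be", "to do is to be", "do be do")

# Naive tokenizer (an assumption): lower-case and split on whitespace
tokens <- strsplit(tolower(docs), "\\s+")

# count: total occurrences of each term across all documents
count <- table(unlist(tokens))

# support: number of documents that contain the term at least once
support <- table(unlist(lapply(tokens, unique)))

stats <- data.frame(
  term    = names(count),
  count   = as.integer(count),
  support = as.integer(support[names(count)]),
  stringsAsFactors = FALSE
)

# One common idf variant: log(number of documents / support)
stats$idf <- log(length(docs) / stats$support)

# Sort by support, then count, then term
stats[order(-stats$support, -stats$count, stats$term), ]
```

Here "to" occurs 4 times but in only 2 of the 3 documents, while "be" occurs in all 3, so its idf under this variant is 0.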

library(tm)      # VCorpus(), VectorSource()
library(corpus)  # term_stats()

txt <- c("Your work is going to fill a large part of your life, 
       and the only way to be truly satisfied is to do what you
        believe is great work. 
       And the only way to do great work is to love what you do. 
       If you haven't found it yet, keep looking. Don't settle. 
       As with all matters of the heart, you'll know when you find it. 
       - Steve Jobs")

term_stats(VCorpus(VectorSource(txt)))[1:5,]

term count support
.        5       1
to       5       1
is       4       1
you      4       1
,        3       1


#Split txt into 4 docs
txt_df <- data.frame( txt = c(
"Your work is going to fill a large part of your life, 
 and the only way to be truly satisfied is to do what you 
 believe is great work." , 
 "And the only way to do great work is to love what you do." , 
 "If you haven't found it yet, keep looking. Don't settle." , 
 "As with all matters of the heart, you'll know when you find it. - 
 Steve Jobs"))

term_stats(VCorpus(VectorSource(txt_df$txt)))[1:6,]

term count support
.        5       4
you      4       4
,        3       3
the      3       3
to       5       2
is       4       2

By default, the result is sorted by support in descending order, with ties broken by count.
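If you would rather see the most frequent terms first, you can re-order the result yourself. The sketch below is self-contained: instead of calling term_stats(), it rebuilds the six rows of the four-document output above as a plain data frame (an assumption made so the snippet runs without any packages) and sorts them with base R's order().

```r
# Rows copied from the four-document output above
stats <- data.frame(
  term    = c(".", "you", ",", "the", "to", "is"),
  count   = c(5, 4, 3, 3, 5, 4),
  support = c(4, 4, 3, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Re-order by count (descending), breaking ties by support (descending)
by_count <- stats[order(-stats$count, -stats$support), ]
by_count$term
# "."   "to"  "you" "is"  ","   "the"
```

The same order() call should work on a real term_stats() result, since it is returned as a data frame with the columns term, count and support.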
