How to split sentences into words [duplicate]

Question (votes: 0, answers: 2)

I am working in R with a dataset of conversations. The data currently looks like this:

Mike, "Hello how are you"
Sally, "Good you"

I eventually plan to build a word cloud from this data, so I need it to look like this:

Mike, Hello
Mike, how
Mike, are
Mike, you
Sally, good
Sally, you
r words sentence
2 Answers
2
votes

Maybe use reshape2::melt, like this:

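# Sample data
df <- read.csv(text =
    'Mike, "Hello how are you"
    Sally, "Good you"', header = FALSE)

# Trim leading/trailing whitespace, then split each sentence on whitespace
lst <- strsplit(trimws(as.character(df[, 2])), "\\s")
# Name each list entry after its speaker
names(lst) <- trimws(df[, 1])

# Reshape into a long data.frame; [2:1] puts the speaker column first
library(reshape2)
df.long <- (melt(lst))[2:1]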
#     L1 value
#1  Mike Hello
#2  Mike   how
#3  Mike   are
#4  Mike   you
#5 Sally  Good
#6 Sally   you

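Explanation: trim leading/trailing whitespace from the entries in the second column (trimws), split them on whitespace (\\s), and store the result in a list. Take the list entry names from the first column, then reshape into a long data.frame using reshape2::melt.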
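For readers who would rather avoid the extra dependency, the same split-and-stack steps can be sketched in base R. This is only an illustrative alternative, not the method used in this answer; the names speaker, sentence, words, and long are made up for the example, and the input vectors are assumed to mirror the sample data.

```r
# Hypothetical input mirroring the sample data
speaker  <- c("Mike", "Sally")
sentence <- c("Hello how are you", "Good you")

# Split each trimmed sentence on runs of whitespace
words <- strsplit(trimws(sentence), "\\s+")

# Repeat each speaker once per word, then flatten the list of words
long <- data.frame(
  speaker = rep(speaker, lengths(words)),
  word    = unlist(words),
  stringsAsFactors = FALSE
)
long
```

Splitting on "\\s+" rather than "\\s" is a small design choice here: it avoids empty strings when two words are separated by more than one space.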

I'll leave turning this into a comma-separated data.frame up to you...
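One possible way to get the comma-separated form from the question is to paste the two columns together. The sketch below assumes a long data.frame with the same L1 and value columns as the melt() result above; it is rebuilt by hand here so the snippet runs on its own, and out_lines is a made-up name.

```r
# Hypothetical long-format data, shaped like the melt() result above
df.long <- data.frame(
  L1    = c("Mike", "Mike", "Mike", "Mike", "Sally", "Sally"),
  value = c("Hello", "how", "are", "you", "Good", "you"),
  stringsAsFactors = FALSE
)

# Join speaker and word with ", " to get one "Speaker, word" string per row
out_lines <- paste(df.long$L1, df.long$value, sep = ", ")
writeLines(out_lines)
```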


0
votes

Use a tokenizer, e.g. via tidytext::unnest_tokens:

library(tidyverse)
library(tidytext)

dialogue <- read_csv(
    'Mike, "Hello how are you"
     Sally, "Good you"', 
    col_names = c('speaker', 'sentence')
)

# One row per word; unnest_tokens() lowercases tokens by default (to_lower = TRUE)
dialogue %>% unnest_tokens(word, sentence)
#> # A tibble: 6 x 2
#>   speaker  word
#>     <chr> <chr>
#> 1    Mike hello
#> 2    Mike   how
#> 3    Mike   are
#> 4    Mike   you
#> 5   Sally  good
#> 6   Sally   you