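I have a series of data frames, each containing a name column and a text column. I want to find duplicates in the text and then produce a list of all the names associated with each duplicate. I can get the list of duplicated text values and how many times each one occurs, but I am struggling to find a way to get the list of associated names. Here is a reproducible example: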
library(dplyr)

# two separate data frames, each with a name column and a book column
books1 <- data.frame(
  name = rep("Ellie", 4),
  book = c("Anne of Green Gables", "The Secret Garden", "Alice in Wonderland", "A Little Princess"))
books2 <- data.frame(
  name = rep("Jess", 6),
  book = c("Harry Potter", "Percy Jackson", "Anne of Green Gables", "Chronicles of Narnia", "Redwall", "A Little Princess"))
# combine into a single data frame
books <- bind_rows(books1, books2)
# count how many times each book appears
repeatbooks <- books %>% group_by(book) %>% summarize(n = n())
This gives me:
  book                  n
1 A Little Princess     2
2 Alice in Wonderland   1
3 Anne of Green Gables  2
4 Chronicles of Narnia  1
5 Harry Potter          1
6 Percy Jackson         1
7 Redwall               1
8 The Secret Garden     1
What I want is something like this:
  book                  n  name
1 A Little Princess     2  Ellie, Jess
2 Alice in Wonderland   1  Ellie
3 Anne of Green Gables  2  Ellie, Jess
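I was hoping to do something like the following, but it creates multiple rows per book instead of collecting the names into a single row:
# identify repeats while capturing the associated names (does not collapse them into a single row)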
repeatbooks <- books %>% group_by(book) %>% summarize(n=n(), names=c(paste0(name), ', '))
Do you mean something like this?
books %>%
  reframe(
    n = n(),
    name = toString(unique(name)),
    .by = book
  )
which gives
  book                  n  name
1 Anne of Green Gables  2  Ellie, Jess
2 The Secret Garden     1  Ellie
3 Alice in Wonderland   1  Ellie
4 A Little Princess     2  Ellie, Jess
5 Harry Potter          1  Jess
6 Percy Jackson         1  Jess
7 Chronicles of Narnia  1  Jess
8 Redwall               1  Jess
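As for why the attempt in the question produced several rows: `paste0(name)` without a `collapse` argument returns one string per name, and `c(..., ', ')` then appends `", "` as an extra element, so each group yields a vector longer than one and `summarize()` emits one row per element. Collapsing the vector to a single string avoids that; `toString(x)` is shorthand for `paste(x, collapse = ", ")`. If only the books that actually repeat are wanted, a filter on `n` can follow. The sketch below is one possible variant, assuming dplyr is installed; the name `repeats_only` is made up for illustration and is not from the question.

```r
library(dplyr)

# Rebuild the example data from the question
books <- bind_rows(
  data.frame(name = "Ellie",
             book = c("Anne of Green Gables", "The Secret Garden",
                      "Alice in Wonderland", "A Little Princess")),
  data.frame(name = "Jess",
             book = c("Harry Potter", "Percy Jackson", "Anne of Green Gables",
                      "Chronicles of Narnia", "Redwall", "A Little Princess"))
)

# One row per book: count the occurrences and collapse the distinct
# names into a single comma-separated string, then keep only books
# that appear more than once.
# `repeats_only` is a hypothetical name used for this sketch.
repeats_only <- books %>%
  group_by(book) %>%
  summarise(
    n = n(),
    name = paste(unique(name), collapse = ", "),
    .groups = "drop"
  ) %>%
  filter(n > 1)

print(repeats_only)
```

Since each group produces exactly one row here, `summarise()` is enough; `reframe()` is mainly needed when a group can return any number of rows.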