Making a list of strings while summarizing with dplyr


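I have a series of data frames, each containing a name column and a text column. I want to find duplicates in the text and then produce a list of all the names associated with each duplicate. I can get the list of duplicated texts and the number of times each one occurs, but I am struggling to find a way to get the list of associated names. Here is a reproducible example: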

library(dplyr)

# two separate data frames, each with a name column and a book column
books1 <- data.frame(
  name = rep("Ellie", 4),
  book = c("Anne of Green Gables", "The Secret Garden", "Alice in Wonderland", "A Little Princess"))

books2 <- data.frame(
  name = rep("Jess", 6),
  book = c("Harry Potter", "Percy Jackson", "Anne of Green Gables", "Chronicles of Narnia", "Redwall", "A Little Princess"))

# combine into a single data frame
books <- bind_rows(books1, books2)

# count how many times each book appears
repeatbooks <- books %>% group_by(book) %>% summarize(n = n())

This gives me:

  book                     n
1 A Little Princess        2
2 Alice in Wonderland      1
3 Anne of Green Gables     2
4 Chronicles of Narnia     1
5 Harry Potter             1
6 Percy Jackson            1
7 Redwall                  1
8 The Secret Garden        1

What I want is something like this:

  book                     n     name
1 A Little Princess        2     Ellie, Jess
2 Alice in Wonderland      1     Ellie
3 Anne of Green Gables     2     Ellie, Jess

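I was hoping to do something like the following, but it creates multiple rows per book instead of collapsing the names into a single row: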

#identify repeats while catching associated names - doesn't group into single column
repeatbooks <- books %>% group_by(book) %>% summarize(n=n(), names=c(paste0(name), ', '))
r dplyr summarize
1 Answer

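Do you mean something like the following? `toString()` collapses the unique names in each group into a single comma-separated string: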

books %>%
  reframe(
    n = n(),
    name = toString(unique(name)),
    .by = book
  )

which gives

                  book n        name
1 Anne of Green Gables 2 Ellie, Jess
2    The Secret Garden 1       Ellie
3  Alice in Wonderland 1       Ellie
4    A Little Princess 2 Ellie, Jess
5         Harry Potter 1        Jess
6        Percy Jackson 1        Jess
7 Chronicles of Narnia 1        Jess
8              Redwall 1        Jess
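
If only the books that appear more than once are of interest, the same idea can be combined with a filter. This is a minimal sketch rather than part of the answer above: it swaps in `summarise()` with `paste(..., collapse = ", ")` in place of `reframe()` with `toString()`, and the variable name `repeats` is made up for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
books <- bind_rows(
  data.frame(name = "Ellie",
             book = c("Anne of Green Gables", "The Secret Garden",
                      "Alice in Wonderland", "A Little Princess")),
  data.frame(name = "Jess",
             book = c("Harry Potter", "Percy Jackson", "Anne of Green Gables",
                      "Chronicles of Narnia", "Redwall", "A Little Princess"))
)

# One row per book: the count plus the distinct names collapsed into one string,
# then keep only the books that occur more than once
repeats <- books %>%
  group_by(book) %>%
  summarise(
    n = n(),
    name = paste(unique(name), collapse = ", "),
    .groups = "drop"
  ) %>%
  filter(n > 1)

print(repeats)
```

Because each group yields exactly one value per column, `summarise()` is sufficient here; `reframe()` is mainly useful when a group may return any number of rows.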