Group a data frame based on the number of common elements [duplicate]

Question

This question already has answers here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group (3 answers)

I have a data frame (df) with the following structure:

Store Item
S1    I1
S1    I2
S1    I3
S1    I4
S2    I1
S2    I2
S2    I3
S3    I1
S3    I2
S3    I3
S4    I5

I would like a way to get groups/clusters of stores based on the items the stores have in common, as follows:

Store Group Common_element_with_group
S1    1     I1,I2,I3,I4
S2    2     I1,I2,I3
S3    2     I1,I2,I3
S4    3     I5

Does anyone know a way to achieve this? I don't even know how to approach the problem.

Answer 1

Here is an option in base R using aggregate:

transform(aggregate(.~Store, df, toString), Group = cumsum(!duplicated(Item)))
#  Store           Item Group
#1    S1 I1, I2, I3, I4     1
#2    S2     I1, I2, I3     2
#3    S3     I1, I2, I3     2
#4    S4             I5     3

Alternatively, this can be done with data.table:

library(data.table)
setDT(df)[, .(Item = toString(Item)), Store][, Group := cumsum(!duplicated(Item))][]
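
One caveat worth illustrating: cumsum(!duplicated(Item)) numbers groups correctly only when stores with identical item strings sit next to each other, and the collapsed string also depends on the order in which items appear. The following is a minimal sketch, not part of the original answer, that sorts items before collapsing and numbers groups with match(). The extra store S5 and the helper name key_of are hypothetical, added only to show a non-adjacent, differently ordered duplicate.

```r
# Question's data plus a hypothetical store S5 that holds the same items
# as S2 and S3, listed in a different order and separated from them by S4.
df <- data.frame(
  Store = c("S1", "S1", "S1", "S1", "S2", "S2", "S2",
            "S3", "S3", "S3", "S4", "S5", "S5", "S5"),
  Item  = c("I1", "I2", "I3", "I4", "I1", "I2", "I3",
            "I1", "I2", "I3", "I5", "I3", "I2", "I1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one key per store, sorting the items first
# so that the order of rows within a store does not matter.
key_of <- function(x) toString(sort(unique(x)))
agg <- aggregate(Item ~ Store, df, key_of)

# match() maps each key to the position of its first occurrence among the
# distinct keys, so equal keys share a group even when they are not adjacent.
agg$Group <- match(agg$Item, unique(agg$Item))
agg
```

With this data, S2, S3 and S5 all end up in group 2, whereas cumsum(!duplicated(Item)) would have placed S5 in group 3 together with S4.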

Answer 2

A solution using aggregate.

agg <- aggregate(Item ~ Store, df, paste, collapse = ", ")

Then you can create a Group column:

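# Stores with identical item strings get the same group number
agg$Group <- match(agg$Item, unique(agg$Item))

Finally, change the column order:

agg <- agg[, c(1, 3, 2)]
agg
#  Store Group           Item
#1    S1     1 I1, I2, I3, I4
#2    S2     2     I1, I2, I3
#3    S3     2     I1, I2, I3
#4    S4     3             I5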

Answer 3

You can try:

library(tidyverse)
d %>% 
  group_by(Store) %>% 
  summarise(Common_element_with_group=paste(Item, collapse=","),
            Group=factor(n())) %>% 
  mutate(Group=factor(Group, levels = levels(Group), labels = 1:nlevels(Group)))
# A tibble: 4 x 3
   Store Common_element_with_group  Group
  <fctr>                     <chr> <fctr>
1     S1               I1,I2,I3,I4      1
2     S2                  I1,I2,I3      2
3     S3                  I1,I2,I3      2
4     S4                        I5      3

Data:

d <- read.table(text="Store Item
S1    I1
                S1    I2
                S1    I3
                S1    I4
                S2    I1
                S2    I2
                S2    I3
                S3    I1
                S3    I2
                S3    I3
                S4    I5", header=T)

Answer 4

You can do the following in base R:

df <- stack(lapply(split(df, df$Store), function(x) paste0(x$Item, collapse = ",")));
df$Group <- as.numeric(factor(df$values, levels = unique(df$values)));
df;
#       values ind Group
#1 I1,I2,I3,I4  S1     1
#2    I1,I2,I3  S2     2
#3    I1,I2,I3  S3     2
#4          I5  S4     3

df <- read.table(text =
    "Store Item
S1    I1
S1    I2
S1    I3
S1    I4
S2    I1
S2    I2
S2    I3
S3    I1
S3    I2
S3    I3
S4    I5", header = T)
