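I run a simulation that creates 10,000 lineups. I'd like to count how many times each distinct lineup was created. For example, here are 5 lineups: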
col_1 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_2 <- c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David")
col_3 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_4 <- c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire")
col_5 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
df <- data.frame(col_1, col_2, col_3, col_4, col_5)
The output I'd like is something along these lines:
Lineup A = col_1, col_3, col_5 = 3
Lineup B = col_2 = 1
Lineup C = col_4 = 1
I've been banging my head against the wall trying to find a dplyr solution. Any help would be much appreciated. Thanks.
This would be my solution:
library(dplyr)
library(tidyr)

df_t <- df %>%
  # Reshape to long format and sort people alphabetically within each lineup
  gather(lineup_number, person_name) %>%   # one row per lineup/person
  arrange(lineup_number, person_name) %>%  # sort alphabetically
  group_by(lineup_number) %>%
  mutate(person_order = paste0("person", row_number())) %>%
  ungroup() %>%
  spread(person_order, person_name)        # row: lineup, column: person

# Count the lineups that contain exactly the same people
df_t %>%
  select(starts_with("person")) %>%
  group_by_all() %>%
  summarise(num_lineups = n())
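For what it's worth, gather() and spread() are superseded in current tidyr, so here is a rough sketch of the same idea with pivot_longer(). It assumes tidyr >= 1.0 and dplyr >= 1.0; the names lineup_counts, team and lineups are just illustrative and not from the answer above.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question.
df <- data.frame(
  col_1 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_2 = c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David"),
  col_3 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_4 = c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire"),
  col_5 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  stringsAsFactors = FALSE
)

lineup_counts <- df %>%
  # Long format: one row per (lineup, person) pair.
  pivot_longer(everything(), names_to = "lineup", values_to = "person") %>%
  # Build one key per lineup from its alphabetically sorted members.
  group_by(lineup) %>%
  summarise(team = paste(sort(person), collapse = ","), .groups = "drop") %>%
  # Count how many lineups share each key and remember which ones they are.
  group_by(team) %>%
  summarise(count = n(),
            lineups = paste(lineup, collapse = ","),
            .groups = "drop")

print(lineup_counts)
```

Because the key is built from sorted names, the order in which people appear within a column should not affect the count.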
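Here is a tidyverse-only solution: we arrange all columns, collapse each one into a single string, keep the distinct values, transpose, and group to get the counts. This approach also gives you the members of each team. Note that arrange_all() sorts whole rows rather than each column independently, so two lineups are counted as the same only if they list the same people in the same row order.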
library(tidyverse)

df2 <- df %>%
  arrange_all() %>%                            # sort rows by every column
  mutate_all(~ paste0(., collapse = ",")) %>%  # collapse each column into one string
  distinct() %>%                               # all rows are now identical; keep one
  t() %>%                                      # one row per lineup column
  as.data.frame() %>%
  mutate(col = colnames(df)) %>%
  group_by(team = V1) %>%
  summarise(count = n(),
            lineup = paste(col, collapse = ","))
print(df2)
# A tibble: 3 x 3
  team                                    count lineup
  <fct>                                   <int> <chr>
1 Ebony,Jada,Jane,Latoya,Mary,Sandra          3 col_1,col_3,col_5
2 Jalen,David,Malik,Brett,Jack,Demetrius      1 col_2
3 Molly,Claire,Emily,Tara,Katie,Imani         1 col_4
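If the same people can show up in a different order from one lineup to the next, a base R sketch along these lines might be more robust. To illustrate, I have reversed col_3 here (my own hypothetical tweak to the question's data); keys, counts and members are just names I made up.

```r
df <- data.frame(
  col_1 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_2 = c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David"),
  # Hypothetical change: same people as col_1, listed in reverse order.
  col_3 = rev(c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")),
  col_4 = c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire"),
  col_5 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  stringsAsFactors = FALSE
)

# One key per column: its members sorted alphabetically and pasted together.
keys <- vapply(df,
               function(x) paste(sort(as.character(x)), collapse = ","),
               character(1))

counts  <- table(keys)              # how often each distinct lineup occurs
members <- split(names(keys), keys) # which columns belong to each lineup

print(counts)
print(members)
```

Sorting inside each column before pasting is what makes col_3 land in the same group as col_1 and col_5 despite the different order.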
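First, we make sure the factor levels match across all columns of the data frame, then strip them to get plain integer codes. (The exact codes depend on whether the columns are stored as factors or as character vectors; only their agreement across columns matters.)
(d2 <- sapply(df, function(x) as.numeric(factor(x, levels = sort(unique(unlist(df)))))))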
#      col_1 col_2 col_3 col_4 col_5
# [1,]     5    10     5    16     5
# [2,]     3    12     3    14     3
# [3,]     4     7     4    18     4
# [4,]     6     9     6    15     6
# [5,]     1    11     1    17     1
# [6,]     2     8     2    13     2
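In case it is not obvious why the levels have to be shared, here is a tiny sketch with two made-up vectors (a and b are hypothetical, not from the question). With per-vector levels the same name can get different codes; with a common level set it cannot.

```r
a <- c("Mary", "Jane")
b <- c("Mary", "Zoe")

# Each vector gets its own levels, so "Mary" is coded differently.
own_a <- as.numeric(factor(a))  # levels: Jane, Mary
own_b <- as.numeric(factor(b))  # levels: Mary, Zoe

# A shared level set gives "Mary" the same code in both vectors.
shared_levels <- sort(unique(c(a, b)))  # Jane, Mary, Zoe
shared_a <- as.numeric(factor(a, levels = shared_levels))
shared_b <- as.numeric(factor(b, levels = shared_levels))

print(rbind(own_a, own_b, shared_a, shared_b))
```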
Then we can apply toString over the columns, turn the result into a factor, and split it by its own levels; all we want from each group is the names.
m <- factor(apply(d2, 2, toString))
n <- lapply(split(m, m), names)
This is essentially the result already; we just rbind the names together with their lengths.
res <- do.call(rbind, lapply(n, function(x) cbind(toString(x), length(x))))
res
#      [,1]                  [,2]
# [1,] "col_2"               "1"
# [2,] "col_4"               "1"
# [3,] "col_1, col_3, col_5" "3"
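If the split-then-names step looks opaque, this small sketch on a hypothetical named factor (the values "x" and "y" stand in for the pasted lineup strings) shows what it does:

```r
# A named factor: the names play the role of the column names,
# the values play the role of the pasted lineup strings.
m <- factor(c(col_1 = "x", col_2 = "y", col_3 = "x"))

# Splitting a factor by itself groups equal values together;
# names() then recovers which columns fell into each group.
groups <- lapply(split(m, m), names)

# The group sizes are the lineup counts.
counts <- lengths(groups)

print(groups)
print(counts)
```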
Finally, we may want to give the matrix some meaningful dimnames.
dimnames(res) <- list(paste("Lineup", LETTERS[1:nrow(res)]), c("col", "n"))
res
#          col                   n
# Lineup A "col_2"               "1"
# Lineup B "col_4"               "1"
# Lineup C "col_1, col_3, col_5" "3"
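Note: if you have more than 26 lineups, you may just want to use 1:nrow(res) instead of LETTERS[1:nrow(res)], because LETTERS has only 26 elements and indexing past the end returns NA.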
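A quick sketch of what goes wrong past 26, using a hypothetical count of 30 lineups (n_lineups and the label vectors are illustrative names):

```r
n_lineups <- 30  # hypothetical number of distinct lineups

# LETTERS only has 26 entries, so positions 27 and up come back as NA.
labels_letters <- paste("Lineup", LETTERS[seq_len(n_lineups)])

# Plain numbers keep every label unique, however many lineups there are.
labels_numbers <- paste("Lineup", seq_len(n_lineups))

print(tail(labels_letters, 5))
print(tail(labels_numbers, 5))
```

seq_len() is used instead of 1:n here only because it also behaves sensibly when n is 0.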