I have this table in R:
df <- data.frame(
  var1 = c("red", "red", "blue", "blue", "green", "green"),
  var2 = c("canada", "usa", "usa", "canada", "canada", "france")
)
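I want to find all values of var2 that are shared by every value of var1 (in this data, only "canada" appears with red, blue and green).
I tried doing it the long way like this: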
var1_categories <- unique(df$var1)
common_var2 <- lapply(var1_categories, function(cat) unique(df$var2[df$var1 == cat]))
names(common_var2) <- var1_categories
shared_var2 <- Reduce(intersect, common_var2)
result_df <- data.frame(var2 = shared_var2)
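The loop-and-reduce idea above can be written more compactly in base R. A minimal sketch: `split()` groups the var2 values by var1 category (the same list that the `lapply()` call builds by hand), and `Reduce(intersect, ...)` keeps only the values present in every group.

```r
df <- data.frame(
  var1 = c("red", "red", "blue", "blue", "green", "green"),
  var2 = c("canada", "usa", "usa", "canada", "canada", "france")
)

# split() returns a list with one character vector of var2 values per var1 category;
# Reduce(intersect, ...) folds intersect() over that list, keeping common values
shared_var2 <- Reduce(intersect, split(df$var2, df$var1))
print(shared_var2)
```

This gives the same answer as the longer version; `result_df <- data.frame(var2 = shared_var2)` can be applied afterwards unchanged if a data frame is needed.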
Is there something in R I can use to do this for all values of var2 at once?
For example, I am trying something like this:
library(dplyr)

result <- df %>%
  group_by(var2) %>%
  summarize(var1_categories = list(unique(var1))) %>%
  mutate(category_count = sapply(var1_categories, length)) %>%
  mutate(var1_categories = sapply(var1_categories, paste, collapse = ", ")) %>%
  arrange(desc(category_count))
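The grouped attempt already computes `category_count` for each var2 value; the missing step is to keep only the rows whose count equals the total number of distinct var1 categories (in dplyr, something like adding `filter(category_count == n_distinct(df$var1))` to the pipeline). The sketch below shows that same logic in base R, so it runs without any packages; `n_var1` and `category_count` are illustrative names, not part of the original code.

```r
df <- data.frame(
  var1 = c("red", "red", "blue", "blue", "green", "green"),
  var2 = c("canada", "usa", "usa", "canada", "canada", "france")
)

# Total number of distinct var1 categories in the table
n_var1 <- length(unique(df$var1))

# For each var2 value, count the distinct var1 categories it appears with
category_count <- tapply(df$var1, df$var2, function(x) length(unique(x)))

# Keep the var2 values that appear with every var1 category
shared_var2 <- names(category_count)[category_count == n_var1]
print(shared_var2)
```

Counting distinct values (rather than rows) matters if the same var1/var2 pair can occur more than once.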
You could try:
> names(which(colMeans(table(df) > 0) == 1))
[1] "canada"
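To see why the one-liner works, it helps to look at the intermediate steps. `table(df)` cross-tabulates the two columns (rows are var1 categories, columns are var2 values), `> 0` marks which pairs occur at least once, and a var2 value is shared by all var1 categories exactly when its whole column is `TRUE`. The sketch below spells this out, using `colSums(...) == nrow(...)` as an equivalent of `colMeans(...) == 1` that compares integer counts instead of a mean.

```r
df <- data.frame(
  var1 = c("red", "red", "blue", "blue", "green", "green"),
  var2 = c("canada", "usa", "usa", "canada", "canada", "france")
)

# Contingency table: rows blue, green, red; columns canada, france, usa
tab <- table(df)
print(tab)

# TRUE where a var1/var2 pair occurs at least once (also handles duplicate rows)
present <- tab > 0

# A column that is TRUE in every row is a var2 value shared by all var1 categories
shared_var2 <- colnames(present)[colSums(present) == nrow(present)]
print(shared_var2)
```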