What is a good way to get independent frequency counts for multiple columns using dplyr? I want to go from a table of values:
# A tibble: 7 x 4
      a     b     c     d
  <int> <int> <int> <int>
1     1     2     1     3
2     1     2     1     3
3     2     2     5     3
4     3     2     4     3
5     3     3     2     3
6     5     3     4     3
7     5     4     2     1
to a frequency table like this:
# A tibble: 5 x 5
      x   a_n   b_n   c_n   d_n
  <int> <int> <int> <int> <int>
1     1     2     0     2     1
2     2     1     4     2     0
3     3     2     2     0     6
4     4     0     1     2     0
5     5     2     0     1     0
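I'm still trying to get my head around dplyr, but this seems like something it should be able to do. If it is easier with an additional library, that's fine too.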
library(dplyr)
library(reshape2)
df %>%
  melt() %>%
  dcast(value ~ variable, fun.aggregate = length)
#   value a b c d
# 1     1 2 0 2 1
# 2     2 1 4 2 0
# 3     3 2 2 0 6
# 4     4 0 1 2 0
# 5     5 2 0 1 0
# Data used above
df <- structure(list(a = c(1L, 1L, 2L, 3L, 3L, 5L, 5L),
                     b = c(2L, 2L, 2L, 2L, 3L, 3L, 4L),
                     c = c(1L, 1L, 5L, 4L, 2L, 4L, 2L),
                     d = c(3L, 3L, 3L, 3L, 3L, 3L, 1L)),
                .Names = c("a", "b", "c", "d"), class = "data.frame",
                row.names = c("1", "2", "3", "4", "5", "6", "7"))
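To make the reshape step less of a black box: melt() turns the data into a long table with one row per (column, value) pair, and dcast() then counts the rows in each cell. The sketch below shows the same idea with base R's stack() and table() rather than reshape2, so it is an approximation of what happens, not the reshape2 internals. The names long and tab are just illustrative.

```r
df <- data.frame(a = c(1L, 1L, 2L, 3L, 3L, 5L, 5L),
                 b = c(2L, 2L, 2L, 2L, 3L, 3L, 4L),
                 c = c(1L, 1L, 5L, 4L, 2L, 4L, 2L),
                 d = c(3L, 3L, 3L, 3L, 3L, 3L, 1L))

# Long format: column "values" holds the numbers, "ind" the source column name
long <- stack(df)

# Cross-tabulate value against source column; absent combinations count as 0
tab <- table(long$values, long$ind)
tab
```

Rows of tab are the distinct values and columns are the original column names, which is the same layout dcast(value ~ variable, fun.aggregate = length) produces.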
For the same dataset you provided in the question, here is another solution (base R):
# Sort the levels so the rows come out in increasing order of value
myfreq <- sapply(df, function(x) table(factor(x, levels = sort(unique(unlist(df))))))
The output will be:
> myfreq
#   a b c d
# 1 2 0 2 1
# 2 1 4 2 0
# 3 2 2 0 6
# 4 0 1 2 0
# 5 2 0 1 0
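The sapply() result is a matrix with the values as row names. If you want the exact shape from the question (an x column plus a_n, b_n, ...), something along these lines should work. This is a sketch; lv, counts and freq are names I made up for illustration.

```r
df <- data.frame(a = c(1L, 1L, 2L, 3L, 3L, 5L, 5L),
                 b = c(2L, 2L, 2L, 2L, 3L, 3L, 4L),
                 c = c(1L, 1L, 5L, 4L, 2L, 4L, 2L),
                 d = c(3L, 3L, 3L, 3L, 3L, 3L, 1L))

# All distinct values across the whole data frame, in increasing order
lv <- sort(unique(unlist(df)))

# One column of counts per original column; shared levels keep rows aligned
counts <- sapply(df, function(col) table(factor(col, levels = lv)))

# Move the values out of the row names into a proper column and rename
freq <- data.frame(x = lv, counts, row.names = NULL)
names(freq)[-1] <- paste0(names(df), "_n")
freq
```

Using one shared set of levels for every column is what guarantees that a value missing from a column shows up as 0 instead of being dropped.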
library(tidyverse)
dt <- data.frame(a = c(1L, 1L, 2L, 3L, 3L, 5L, 5L), b = c(2L, 2L, 2L, 2L, 3L, 3L, 4L),
c = c(1L, 1L, 5L, 4L, 2L, 4L, 2L), d = c(3L, 3L, 3L, 3L, 3L, 3L, 1L))
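dt2 <- dt %>%
  mutate(ID = 1:n()) %>%                            # temporary row ID
  gather(Group, x, -ID) %>%                         # long format: one row per (column, value)
  select(-ID) %>%
  mutate(Group = paste(Group, "n", sep = "_")) %>%  # a -> a_n, b -> b_n, ...
  count(Group, x) %>%                               # frequency of each value within each column
  spread(Group, n, fill = 0L)                       # wide format; absent values become 0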
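gather() and spread() are superseded in current tidyr, so here is a sketch of the same pipeline with pivot_longer() and pivot_wider(). It assumes tidyr 1.0.0 or later; dt3 is just an illustrative name. The final arrange() is needed because pivot_wider() keeps rows in order of first appearance rather than sorting them.

```r
library(dplyr)
library(tidyr)

dt <- data.frame(a = c(1L, 1L, 2L, 3L, 3L, 5L, 5L),
                 b = c(2L, 2L, 2L, 2L, 3L, 3L, 4L),
                 c = c(1L, 1L, 5L, 4L, 2L, 4L, 2L),
                 d = c(3L, 3L, 3L, 3L, 3L, 3L, 1L))

dt3 <- dt %>%
  # Long format: one row per (column name, value)
  pivot_longer(everything(), names_to = "Group", values_to = "x") %>%
  mutate(Group = paste0(Group, "_n")) %>%
  # Frequency of each value within each column
  count(Group, x) %>%
  # Back to wide; combinations that never occur are filled with 0
  pivot_wider(names_from = Group, values_from = n,
              values_fill = list(n = 0L)) %>%
  arrange(x)
dt3
```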
Using tabulate in base R:
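# nbins makes every column produce the same number of bins,
# so values above a column's own maximum count as 0 instead of NA
apply(df, 2, function(x) tabulate(x, nbins = max(df))[min(df):max(df)])
#     a b c d
#[1,] 2 0 2 1
#[2,] 1 4 2 0
#[3,] 2 2 0 6
#[4,] 0 1 2 0
#[5,] 2 0 1 0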
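One thing to watch with tabulate(): by default the result has as many bins as the largest value in its input, so columns with different maxima give vectors of different lengths, and indexing past the end yields NA. A small sketch of that behaviour (x, short and padded are illustrative names):

```r
x <- c(3L, 3L, 1L)

# Default: bins 1..max(x), so only three counts come back
short <- tabulate(x)

# Explicit nbins: bins 1..5, with zeros for values that never occur
padded <- tabulate(x, nbins = 5)

short
padded
short[5]   # indexing beyond the last bin gives NA
```

Note that tabulate() only counts positive integers, so this approach fits data like the example but not columns containing zeros or negative values.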