My data frame df currently looks like this:
cat 1 2 3 4
1 a 0 1 0 1
2 b 0 0 1 0
3 b 1 0 1 1
4 a 1 0 1 1
5 b 1 1 1 1
6 a 0 1 1 0
cat <- c("a", "b", "b", "a", "b", "a")
df <- cbind(cat, data.frame(matrix(c(0, 1, 0, 1,
                                     0, 0, 1, 0,
                                     1, 0, 1, 1,
                                     1, 0, 1, 1,
                                     1, 1, 1, 1,
                                     0, 1, 1, 0),
                                   nrow = 6, byrow = TRUE)))
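(i.e. two categories in the first column, and binary data for each category in every subsequent column). Ideally, I would like to group each column by category, but also by binary value, ending up with something like this: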
1 a.0 2 1 1 1
2 a.1 1 2 2 2
3 b.0 1 2 0 1
4 b.1 2 1 3 2
My best attempt so far is:
aggregate(df[,-1], by=list(df[,1]), FUN = table)
Unfortunately, the result does not make it clear to me what is actually going on.
Hope this helps!
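library(dplyr)
library(tidyr)
df %>%
  gather(key, value, -cat) %>%                       # long format: one row per (cat, column, value)
  mutate(new_cat = paste(cat, value, sep = "_")) %>% # build labels such as "a_0"
  group_by(new_cat, key) %>%
  tally() %>%                                        # count rows per label and column
  spread(key, n) %>%                                 # back to wide format
  replace(., is.na(.), 0)                            # combinations that never occur count as 0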
The output is:
new_cat X1 X2 X3 X4
1 a_0 2 1 1 1
2 a_1 1 2 2 2
3 b_0 1 2 0 1
4 b_1 2 1 3 2
Sample data:
df <- structure(list(cat = c("a", "b", "b", "a", "b", "a"), X1 = c(0L,
0L, 1L, 1L, 1L, 0L), X2 = c(1L, 0L, 0L, 0L, 1L, 1L), X3 = c(0L,
1L, 1L, 1L, 1L, 1L), X4 = c(1L, 0L, 1L, 1L, 1L, 0L)), .Names = c("cat",
"X1", "X2", "X3", "X4"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
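The same reshape-then-count idea can also be sketched without any packages. This is only a minimal illustration, assuming the sample data above; the names long, counts and wide are made up for the example. stack() puts the binary columns into long format and table() then counts every label/column pair, filling in 0 for combinations that never occur.

```r
# Sample data, repeated so the sketch runs on its own
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0, 0, 1, 1, 1, 0),
  X2 = c(1, 0, 0, 0, 1, 1),
  X3 = c(0, 1, 1, 1, 1, 1),
  X4 = c(1, 0, 1, 1, 1, 0)
)

# Long format: 'values' holds the 0/1 entries, 'ind' the source column
long <- stack(df[-1])

# stack() goes column by column, so cat is repeated once per binary column
long$new_cat <- paste(rep(df$cat, times = ncol(df) - 1), long$values, sep = "_")

# Cross-tabulate label against column; absent combinations get a 0
counts <- table(long$new_cat, long$ind)

# Optional: turn the contingency table into an ordinary data frame
wide <- as.data.frame.matrix(counts)
print(wide)
```

Because table() works on the full set of labels and columns, there is no need for a separate step to replace NA with 0.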
You can count each binary category in the data frame as follows:
df[df$cat == "a", -1] == 1
This example is for category a and value 1. The command returns the following table:
X1 X2 X3 X4
1 FALSE TRUE FALSE TRUE
4 TRUE FALSE TRUE TRUE
6 FALSE TRUE TRUE FALSE
Now you can apply sum column-wise to this result to get one of the rows. In this case, it returns row a.1 of the desired data frame:
apply(df[df$cat == "a", -1] == 1, 2, sum)
Similarly, you can find the remaining rows.
apply(df[df$cat == "a", -1] == 0, 2, sum)
apply(df[df$cat == "a", -1] == 1, 2, sum)
apply(df[df$cat == "b", -1] == 0, 2, sum)
apply(df[df$cat == "b", -1] == 1, 2, sum)
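If you really need to repeat this operation, you can write a loop that changes the value of interest on each iteration according to the values of cat. Since cat is a character column in the sample data, levels() would return NULL, so use unique() instead, and print() the result explicitly because a for loop does not print it on its own, i.e.
for (val in unique(df$cat)) print(apply(df[df$cat == val, -1] == 1, 2, sum))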
Hope it works, and sorry for my macaroni English.
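The row-by-row counts described above can be collected into a single table. The following is a minimal sketch, assuming the sample data and strictly 0/1 values; the names rows, bit and result are hypothetical, and colSums() is used as a shorthand for apply(..., 2, sum).

```r
# Sample data, repeated so the sketch runs on its own
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0, 0, 1, 1, 1, 0),
  X2 = c(1, 0, 0, 0, 1, 1),
  X3 = c(0, 1, 1, 1, 1, 1),
  X4 = c(1, 0, 1, 1, 1, 0)
)

rows <- list()
for (val in unique(df$cat)) {   # each category: "a", "b"
  for (bit in 0:1) {            # each binary value
    # Logical matrix of matches, summed per column
    rows[[paste(val, bit, sep = ".")]] <- colSums(df[df$cat == val, -1] == bit)
  }
}

# One row per category/value pair, named a.0, a.1, b.0, b.1
result <- do.call(rbind, rows)
print(result)
```

The result is a matrix whose row names carry the category/value label; wrap it in as.data.frame() if a data frame is needed.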
df <- structure(list(cat = c("a", "b", "b", "a", "b", "a"), X1 = c(0L,
0L, 1L, 1L, 1L, 0L), X2 = c(1L, 0L, 0L, 0L, 1L, 1L), X3 = c(0L,
1L, 1L, 1L, 1L, 1L), X4 = c(1L, 0L, 1L, 1L, 1L, 0L)), .Names = c("cat",
"X1", "X2", "X3", "X4"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
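groups <- split(df, df$cat) # Split by cat
counts <- lapply(seq_along(groups), function(i) {
  # Count 0s and 1s in each binary column (cat itself is excluded).
  # Fixing the levels keeps a zero count when a value is absent,
  # e.g. X3 has no 0 in category b.
  kk <- sapply(groups[[i]][-1], function(x) table(factor(x, levels = 0:1)))
  kk <- data.frame(kk) # Rows are named "0" and "1"
  kk$cat <- paste(names(groups)[i], rownames(kk), sep = ".") # Build the cat label
  rownames(kk) <- NULL
  kk
})
n_df <- do.call(rbind, counts) # Combine list by row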