Count how often each data frame value occurs in the rows above it

Question

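I'm trying to find a way to create a matrix that counts the values in each row of a data frame. For each value in a row, I want to count how many times that value occurs in all of the rows above that row (not in the whole data frame).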

The same value never appears more than once within a single row of the data frame.

For example, given this data frame:

a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6

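Resulting matrix:

0 0 0 (no df values occur, because there are no rows above)

1 0 0 (3 occurs once above; 4 and 5 never occur)

2 1 0 (3 occurs twice above, 2 occurs once above, 6 never occurs)

0 0 0 (none of the values in this row occur in the rows above)

1 3 1 (8 occurs once, 3 occurs three times, 6 occurs once)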
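Answers 2 and 3 below count earlier occurrences along the data flattened row by row, which matches the question only because of the guarantee that a value never repeats within a row. A minimal base-R sketch for checking that guarantee on a given data frame, assuming the example data used in the answers:

```r
# Example data from the answers below
df <- read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = TRUE)

# anyDuplicated() returns 0 when a vector has no repeated values,
# so this is TRUE only if no row contains the same value twice
no_repeats <- all(apply(as.matrix(df), 1, function(row) anyDuplicated(row) == 0))
no_repeats
```

If this returns FALSE, the row-by-row running-count approaches would also count earlier cells of the same row, while Answers 1 and 4 would still count only the rows above.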

Answer 1

Another one... just for fun:

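# result matrix with the same shape as df (every cell is overwritten below)
out<-matrix(1,nrow = nrow(df),ncol = ncol(df))
for(i in 1:nrow(df)){
  # df[0:(i-1),] holds the rows above row i (no rows when i == 1);
  # for each column z, count how many of those values equal df[i,z]
  out[i,]<-sapply(1:ncol(df),function(z) sum(unlist(df[0:(i-1),]) %in% df[i,z]))
}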

out
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    2    1    0
[4,]    0    0    0
[5,]    1    3    1
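The same loop can be packaged as a reusable function. A minimal sketch (the name `count_above` is hypothetical, not from the answer) that uses `seq_len(i - 1)` to select the rows above row i, which is empty for the first row, and compares values with `==`:

```r
df <- read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = TRUE)

# Hypothetical helper: for every cell, count how often its value
# occurs in the rows strictly above it
count_above <- function(df) {
  out <- matrix(0L, nrow = nrow(df), ncol = ncol(df))
  for (i in seq_len(nrow(df))) {
    # all values from rows 1..(i - 1); empty for the first row
    above <- unlist(df[seq_len(i - 1), , drop = FALSE])
    for (j in seq_len(ncol(df))) {
      out[i, j] <- sum(above == df[i, j])
    }
  }
  out
}

count_above(df)
```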

Answer 2

Here's one way:

# convert to a vector
x = as.vector(t(as.matrix(df)))

# for each unique value, number its occurrences 0, 1, 2, ... in
# row-major order (= how many times it appeared earlier), then add
# the per-value vectors together
res = rowSums(sapply(unique(x), function(z) {
  r = integer(length(x))
  r[x == z] = 0:(sum(x == z) - 1)
  return(r)
}))

# convert to matrix
res = matrix(res, ncol = ncol(df), byrow = TRUE)
res
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    0    0
# [3,]    2    1    0
# [4,]    0    0    0
# [5,]    1    3    1

Using this data:

df = read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = T)
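The per-value loop above can also be written with base R's `ave()`, which applies a function within groups and returns the results in the original order. A sketch, not part of the original answer: `ave(x, x, FUN = seq_along)` numbers each occurrence of a value 1, 2, 3, ... along the row-major vector `x`, so subtracting 1 gives the number of earlier occurrences. Like the answer itself, this relies on no value repeating within a row.

```r
df <- read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = TRUE)

# flatten row by row (row-major order)
x <- as.vector(t(as.matrix(df)))

# running count of each value, minus 1 = number of earlier occurrences
prior <- ave(x, x, FUN = seq_along) - 1

res <- matrix(prior, ncol = ncol(df), byrow = TRUE)
res
```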

Answer 3

Three other approaches:

1) Base R:

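# reorder the rows of stack(df) into row-major order
# (row 1 of a, b, c, then row 2, ...)
temp <- stack(df)[c(outer(c(0,5,10), 1:5, '+')),]
# running count of each value, minus 1 = earlier occurrences
temp$val2 <- with(temp, ave(values, values, FUN = seq_along)) - 1
df2 <- unstack(temp, val2 ~ ind)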

This gives df2:

  a b c
1 0 0 0
2 1 0 0
3 2 1 0
4 0 0 0
5 1 3 1
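The index `c(outer(c(0,5,10), 1:5, '+'))` in the base R version is specific to this 5-row, 3-column example. A sketch of a general form (an adaptation, not part of the original answer), computed from `nrow(df)` and `ncol(df)`:

```r
df <- read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = TRUE)

n <- nrow(df)
p <- ncol(df)

# stack() puts all of column a first, then b, then c (n rows each).
# Offsets 0, n, 2n, ... plus row number i pick row i of every column,
# so c(outer(...)) lists the stacked rows in row-major order.
idx <- c(outer((seq_len(p) - 1) * n, seq_len(n), "+"))

temp <- stack(df)[idx, ]
# running count of each value, minus 1 = number of earlier occurrences
temp$val2 <- with(temp, ave(values, values, FUN = seq_along)) - 1
df2 <- unstack(temp, val2 ~ ind)
df2
```

For this data, `idx` is the same vector as the hard-coded `c(outer(c(0,5,10), 1:5, '+'))`.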

2) With data.table:

library(data.table)
# note: setDT() converts df to a data.table in place, and := adds column r to it
melt(setDT(df)[, r := .I],
     id.vars = 'r')[order(r), val2 := rowid(value) - 1
               ][, dcast(.SD, rowid(variable) ~ variable, value.var = 'val2')
                 ][, variable := NULL][]

This gives the same result.

3) With the tidyverse:

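library(dplyr)
library(tidyr)
# gather()/spread() still work but are superseded in current tidyr
# by pivot_longer()/pivot_wider()
df %>% 
  mutate(r = row_number()) %>% 
  gather(k, v, -r) %>%                # long format: one row per cell (r, k, v)
  arrange(r) %>%                      # row-major order
  group_by(v) %>% 
  mutate(v2 = row_number() - 1) %>%   # earlier occurrences of the same value
  ungroup() %>% 
  select(r, k, v2) %>% 
  spread(k, v2)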

Of course, this gives the same result too (with an extra column r holding the row number).

Answer 4

Here is another solution:

df = read.table(text = "a b c
                1 2 3
                3 4 5
                3 2 6
                7 8 9
                8 3 6", header = T)

elements = sort(unique(unlist(df)))
frequency = sapply(elements, # for each element 
                   function(element) {apply(df == element, 1, sum)}) # sum the number of occurrences per row
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]    1    1    1    0    0    0    0    0    0
# [2,]    0    0    1    1    1    0    0    0    0
# [3,]    0    1    1    0    0    1    0    0    0
# [4,]    0    0    0    0    0    0    1    1    1
# [5,]    0    0    1    0    0    1    0    1    0


results = df
for(i in 1:nrow(df)){
  for(j in 1:ncol(df))
    results[i,j] = sum(frequency[seq_len(i - 1),                # sum the previous rows' occurrences
                                 which(df[i,j] == elements)])   # of the same element
}
results
#   a b c
# 1 0 0 0
# 2 1 0 0
# 3 2 1 0
# 4 0 0 0
# 5 1 3 1
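Because this answer already builds a per-row frequency matrix, the inner sum over previous rows can be replaced by a cumulative sum. A sketch, not part of the original answer: take the column-wise `cumsum` of `frequency`, shift it down by one row so that row i holds the totals of rows 1..i-1, then look up each cell's value with `match()`.

```r
df <- read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = TRUE)

elements <- sort(unique(unlist(df)))
# frequency[i, k]: how often elements[k] occurs in row i
frequency <- sapply(elements, function(element) rowSums(df == element))

# running totals down the rows, shifted by one so row i counts rows 1..i-1
cum <- apply(frequency, 2, cumsum)
prev <- rbind(0, cum[-nrow(cum), , drop = FALSE])

# k[i, j]: column of `elements` that holds the value df[i, j]
k <- matrix(match(as.matrix(df), elements), nrow = nrow(df))
results <- matrix(prev[cbind(c(row(k)), c(k))], nrow = nrow(df))
results
```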

Answer 5

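I know we're not supposed to post comments just to say "thanks", but thank you, everyone. I've marked Brian's reply as the most useful because I'm very new to R, and his was the example I could follow all the way through without having to look anything up. I'll enjoy working through all the other approaches and the (new to me) functions and methods you've shared.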
