Get the last value in a data frame by factor level

Question  Votes: 0  Answers: 2

I have a data frame like this:

a=c("A","A","A","A","B","B","C","C","C","D","D")
b=c(1,2,3,4,1,2,1,2,3,1,2)
c=c(1345,645,75,8,95,678,598,95,75,4,53)
mydf <- data.frame(a,b,c) # edit note: do _not_ use cbind inside data.frame

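My goal is to add an extra column to a new data frame that holds the last value of column "c" within each level of the factor in column "a", and 0 on every other row. More specifically, for this example the final result would be: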

   a b    c   d
1  A 1 1345   0
2  A 2  645   0
3  A 3   75   0
4  A 4    8   8
5  B 1   95   0
6  B 2  678 678
7  C 1  598   0
8  C 2   95   0
9  C 3   75  75
10 D 1    4   0
11 D 2   53  53
r dataframe
2 Answers
2 votes

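If you don't need your variables to be factors, dplyr has a nice solution. Note that it picks the row with the largest "b" in each group, which matches the last row in the sample data: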

df <- data.frame(a = c("A","A","A","A","B","B","C","C","C","D","D"),
                 b = c(1,2,3,4,1,2,1,2,3,1,2),
                 c = c(1345,645,75,8,95,678,598,95,75,4,53),
                 stringsAsFactors = FALSE)

library(dplyr)

df <- as_tibble(df)  # convert to a tibble for the grouped printout

df %>%
  group_by(a) %>%
  # within each group, keep c on the row where b is largest, 0 elsewhere
  mutate(d = ifelse(b == max(b), c, 0))



# A tibble: 11 x 4
# Groups:   a [4]
       a     b     c     d
   <chr> <dbl> <dbl> <dbl>
 1     A     1  1345     0
 2     A     2   645     0
 3     A     3    75     0
 4     A     4     8     8
 5     B     1    95     0
 6     B     2   678   678
 7     C     1   598     0
 8     C     2    95     0
 9     C     3    75    75
10     D     1     4     0
11     D     2    53    53
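
The answer above identifies the last row through `max(b)`, which relies on "b" increasing within each group. As a hedged variant, the sketch below selects the last row by position instead, using `row_number()` and `n()` from dplyr. The object name `res` is only illustrative, and the sketch assumes the rows are already in the desired order within each group.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  a = c("A","A","A","A","B","B","C","C","C","D","D"),
  b = c(1,2,3,4,1,2,1,2,3,1,2),
  c = c(1345,645,75,8,95,678,598,95,75,4,53),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(a) %>%
  # row_number() == n() is TRUE only on the last row of each group
  mutate(d = if_else(row_number() == n(), c, 0)) %>%
  ungroup()

print(res)
```

This does not depend on the values in "b" at all, so it should also behave sensibly when "b" has ties or is unsorted.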

0 votes

Using data.table:

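 library(data.table)
 df <- data.frame(a, b, c)     # a, b, c are the vectors from the question
 setDT(df)                     # convert to a data.table by reference
 df[, idx := .N, by = a]       # number of rows in each group
 df[, id := 1:.N, by = a]      # row position within each group
 df[id == idx, d := c]         # copy c only on the last row of each group
 df[, c("id", "idx") := NULL]  # drop the helper columns
 df[is.na(d), d := 0]          # all other rows get 0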

    a b    c   d
 1: A 1 1345   0
 2: A 2  645   0
 3: A 3   75   0
 4: A 4    8   8
 5: B 1   95   0
 6: B 2  678 678
 7: C 1  598   0
 8: C 2   95   0
 9: C 3   75  75
10: D 1    4   0
11: D 2   53  53
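
For comparison, the same result can probably be reached without either package. The following is a minimal base R sketch using `ave()`, which applies a function to the values of "c" within each level of "a". The helper name `last_or_zero` is hypothetical, and the sketch assumes rows are already ordered within each group.

```r
# Same sample data as in the question
a <- c("A","A","A","A","B","B","C","C","C","D","D")
b <- c(1,2,3,4,1,2,1,2,3,1,2)
c <- c(1345,645,75,8,95,678,598,95,75,4,53)
mydf <- data.frame(a, b, c)

# Hypothetical helper: a vector of zeros with the last element of x kept in place
last_or_zero <- function(x) {
  n <- length(x)
  replace(numeric(n), n, x[n])
}

# ave() splits c by a, applies the helper to each piece, and reassembles
# the results in the original row order
mydf$d <- ave(mydf$c, mydf$a, FUN = last_or_zero)

print(mydf)
```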