Show duplicated records in a data.frame and omit single ones

Question  Votes: 1  Answers: 4

I have been struggling with how to select only the duplicated rows of a data.frame in R. For instance, my data.frame is:

age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
Names=c("John","John","John", "Harry", "Paul", "Paul", "Paul", "Khan", "Khan", "Khan", "Sam", "Joe")
village <- data.frame(Names, age, height)

 Names age height
 John  18   76.1
 John  19   77.0
 John  20   78.1
 Harry  21   78.2
 Paul  22   78.8
 Paul  23   79.7
 Paul  24   79.9
 Khan  25   81.1
 Khan  26   81.2
 Khan  27   81.8
 Sam  28   82.8
 Joe  29   83.5

I would like to see the following result:

Names age height
John  18   76.1
John  19   77.0
John  20   78.1
Paul  22   78.8
Paul  23   79.7
Paul  24   79.9
Khan  25   81.1
Khan  26   81.2
Khan  27   81.8

Thanks for your time...

r duplicates dataframe
4 Answers
3
votes

A solution that uses duplicated twice:

village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]


   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8
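
To see why two calls are needed, it can help to look at each logical vector separately. The following is only a sketch of the one-liner above split into steps; the names dup_forward, dup_backward, keep and village_dups are made up for illustration and are not part of the original answer.

```r
# Rebuild the example data from the question.
village <- data.frame(
  Names  = c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
             "Khan", "Khan", "Khan", "Sam", "Joe"),
  age    = 18:29,
  height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
)

# TRUE for every occurrence of a name except the first one.
dup_forward <- duplicated(village$Names)

# TRUE for every occurrence of a name except the last one.
dup_backward <- duplicated(village$Names, fromLast = TRUE)

# A row belongs to a repeated name if either scan flags it.
keep <- dup_forward | dup_backward
village_dups <- village[keep, ]
```

A single duplicated call would miss the first row of each group, which is why the second scan with fromLast = TRUE is combined with it.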

An alternative solution using by:

village[unlist(by(seq(nrow(village)), village$Names, 
                  function(x) if(length(x)-1) x)), ]
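
The same grouping idea can also be sketched with split() and lengths() instead of by(). This is a different pair of base R functions, not what the answer above uses, and the names idx, rows and village_dups are hypothetical.

```r
# Rebuild the example data from the question.
village <- data.frame(
  Names  = c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
             "Khan", "Khan", "Khan", "Sam", "Joe"),
  age    = 18:29,
  height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
)

# Group the row numbers by name: one list element per distinct name.
idx <- split(seq_len(nrow(village)), village$Names)

# Keep only groups with more than one row, then flatten to row numbers.
# sort() restores the original row order, since split() groups by name.
rows <- sort(unlist(idx[lengths(idx) > 1], use.names = FALSE))

village_dups <- village[rows, ]
```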

1
votes
village[ duplicated(village),]
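
Note that duplicated() applied to the whole data.frame compares complete rows. In the example data every row has a different age and height, so this call should return no rows at all. The sketch below illustrates that, along with a variant restricted to the Names column, which keeps only the second and later occurrences; whole_row and later_only are illustrative names.

```r
# Rebuild the example data from the question.
village <- data.frame(
  Names  = c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
             "Khan", "Khan", "Khan", "Sam", "Joe"),
  age    = 18:29,
  height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
)

# Whole-row comparison: no two rows are identical in all three columns.
whole_row <- village[duplicated(village), ]

# Comparing on Names only flags repeats, but skips the first row of each group.
later_only <- village[duplicated(village$Names), ]
```

Neither result matches the output asked for in the question, which also needs the first row of each repeated name.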

1
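votes

I find @Sven's answer, which uses duplicated, the "tidiest", but you can also do this in many other ways. Here are two more:

  1. Use table() and subset, matching the names whose tabulated count is > 1 against the names in the first column: village[village$Names %in% names(which(table(village$Names) > 1)), ]
  2. Use ave() to "tabulate" in a different way, but subset in the same way: village[with(village, ave(as.numeric(Names), Names, FUN = length) > 1), ]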
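
One caveat on the ave() variant: as.numeric(Names) assumes Names is a factor. In R 4.0 and later, data.frame() keeps strings as character by default, so that conversion may produce NA values with a coercion warning. A minimal sketch that avoids the conversion by counting over row positions instead (counts and village_dups are illustrative names):

```r
# Rebuild the example data from the question.
village <- data.frame(
  Names  = c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
             "Khan", "Khan", "Khan", "Sam", "Joe"),
  age    = 18:29,
  height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
)

# For each row, the number of rows that share its name.
# seq_along() gives ave() a numeric vector whether Names is factor or character.
counts <- ave(seq_along(village$Names), village$Names, FUN = length)

# Keep rows whose name occurs more than once.
village_dups <- village[counts > 1, ]
```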

0
votes

I came up with a solution using nested sapply:

> village_dups = 
village[unique(unlist(which(sapply(sapply(village$Names,function(x) 
which(village$Names==x)),function(y) length(y)) > 1))),]
> village_dups
   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8