C# LINQ: find duplicates in a list

Question (votes: 0, answers: 13)

Using LINQ, from a List<int>, how can I retrieve a list containing the entries that are repeated more than once, together with their values?

linq list duplicates
13 Answers
882 votes

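The easiest way to solve the problem is to group the elements by their value, and then pick a representative of each group that contains more than one element. In LINQ, this translates to: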

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => y.Key)
              .ToList();

If you want to know how many times each element is repeated, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => new { Element = y.Key, Counter = y.Count() })
              .ToList();

This returns a List of an anonymous type; each element has the properties Element and Counter, from which you can retrieve the information you need.

Lastly, if it is a dictionary you are looking for, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(x => x.Key, y => y.Count());

This returns a dictionary with your element as the key and the number of times it is repeated as the value.
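As a quick check of the queries above, here is a minimal, self-contained sketch. The sample list `lst` and its values are assumptions made for illustration; they are not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, chosen only for illustration.
var lst = new List<int> { 1, 2, 3, 1, 4, 2, 1 };

// Values that occur more than once.
List<int> duplicateValues = lst.GroupBy(x => x)
                               .Where(g => g.Count() > 1)
                               .Select(g => g.Key)
                               .ToList();

// The same groups, as a value -> occurrence-count dictionary.
Dictionary<int, int> duplicateCounts = lst.GroupBy(x => x)
                                          .Where(g => g.Count() > 1)
                                          .ToDictionary(g => g.Key, g => g.Count());

Console.WriteLine(string.Join(", ", duplicateValues)); // prints "1, 2"
foreach (var pair in duplicateCounts)
    Console.WriteLine($"{pair.Key} appears {pair.Value} times");
```

GroupBy yields its groups in the order in which each key first appears in the source, which is why the first query lists 1 before 2. The enumeration order of the dictionary is not something to rely on.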


217 votes

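Find out whether an enumerable contains any duplicates (x.Key stands for the property being compared; for a plain List<int>, group by x => x instead):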

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

Find out whether all values in an enumerable are unique:

var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);
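The two checks above assume the elements expose a `Key` property. A small sketch under that assumption follows; `KeyValuePair<string, int>` is used here only because it happens to have a `Key` property, and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: KeyValuePair is used only because it has a Key property.
var enumerable = new List<KeyValuePair<string, int>>
{
    new KeyValuePair<string, int>("a", 1),
    new KeyValuePair<string, int>("b", 2),
    new KeyValuePair<string, int>("a", 3),
};

bool anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);
bool allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);

// For plain values such as int, group by the element itself.
var numbers = new List<int> { 1, 2, 3 };
bool anyDuplicateNumber = numbers.GroupBy(x => x).Any(g => g.Count() > 1);

Console.WriteLine($"{anyDuplicate} {allUnique} {anyDuplicateNumber}"); // prints "True False False"
```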

35 votes

To find only the duplicate values:

var duplicates = list.GroupBy(x => x).Where(g => g.Count() > 1);

For example:

var list = new[] {1,2,3,1,4,2};

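GroupBy groups the numbers by their value, so each group also carries a count (the number of times the value is repeated). After that, we simply keep the values that occur more than once.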

To find only the unique values:

var unique = list.GroupBy(x => x).Where(g => g.Count() == 1);

For example:

var list = new[] {1,2,3,1,4,2};

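GroupBy groups the numbers by their value, so each group also carries a count (the number of times the value is repeated). After that, we simply keep the values that occur exactly once, which are the unique ones.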


32 votes

Another way is to use a HashSet:

var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));

If you want unique values in your list of duplicates:

var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();
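One thing to watch for with the lazy form `list.Where(i => !hash.Add(i))`: the query is deferred, and its predicate mutates the set every time the query is enumerated. The sketch below illustrates this; the variable names and sample data are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 1 };
var hash = new HashSet<int>();

// Deferred: the lambda runs (and mutates hash) each time the query is enumerated.
IEnumerable<int> lazyDuplicates = list.Where(i => !hash.Add(i));

int firstPass = lazyDuplicates.Count();  // 1: only the second "1" fails to Add
int secondPass = lazyDuplicates.Count(); // 3: hash already holds everything, so every Add fails

Console.WriteLine($"{firstPass} {secondPass}"); // prints "1 3"

// Materializing once, with a fresh set, avoids the problem.
var freshHash = new HashSet<int>();
List<int> duplicates = list.Where(i => !freshHash.Add(i)).ToList();
Console.WriteLine(string.Join(", ", duplicates)); // prints "1"
```

Calling ToList() right away, as the second snippet above does, means the set is only ever filled once.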

Here is the same solution as a set of generic extension methods:

public static class Extensions
{
  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
  {
    var hash = new HashSet<TKey>(comparer);
    return source.Where(item => !hash.Add(selector(item))).ToList();
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
  {
    return source.GetDuplicates(x => x, comparer);      
  }

  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
  {
    return source.GetDuplicates(selector, null);
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
  {
    return source.GetDuplicates(x => x, null);
  }
}

15 votes

You can do this:

var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();

using these extension methods:

public static class Extensions
{
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        var grouped = source.GroupBy(selector);
        var moreThan1 = grouped.Where(i => i.IsMultiple());
        return moreThan1.SelectMany(i => i);
    }

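    // Only TSource is declared here: an unused TKey could not be inferred, so list.Duplicates() would not compile.
    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)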
    {
        return source.Duplicates(i => i);
    }

    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        // Dispose the enumerator once we know whether a second element exists.
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }
}

Using IsMultiple() in the Duplicates method is faster than Count() because it does not iterate the whole collection.
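To make the short-circuiting concrete, here is a small sketch that counts how many items are actually pulled from a lazy sequence. `Numbers` and `HasAtLeastTwo` are hypothetical helpers written as local functions for this example; `HasAtLeastTwo` follows the same idea as IsMultiple().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int itemsProduced = 0;

// Hypothetical source that records how many items were actually pulled from it.
IEnumerable<int> Numbers()
{
    for (int i = 0; i < 1000; i++)
    {
        itemsProduced++;
        yield return i;
    }
}

// Same idea as IsMultiple, written as a local function for this sketch.
bool HasAtLeastTwo<T>(IEnumerable<T> source)
{
    using (var enumerator = source.GetEnumerator())
    {
        return enumerator.MoveNext() && enumerator.MoveNext();
    }
}

bool multiple = HasAtLeastTwo(Numbers());
int pulledByIsMultiple = itemsProduced; // stops after the second item

itemsProduced = 0;
bool multipleViaCount = Numbers().Count() > 1;
int pulledByCount = itemsProduced; // walks the whole sequence

Console.WriteLine($"{pulledByIsMultiple} vs {pulledByCount}"); // prints "2 vs 1000"
```

The difference matters most for lazy sequences like this one. The groups returned by GroupBy are already buffered in memory, so the saving there is likely smaller.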


6 votes

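I created an extension in response to this question that you can include in your projects. I think it covers most cases when you search for duplicates in a List or with LINQ.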

Example:

//Dummy class to compare in list
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public Person(int id, string name, string surname)
    {
        this.Id = id;
        this.Name = name;
        this.Surname = surname;
    }
}


//The extension static class
public static class Extention
{
    public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    { //Return only the second and subsequent occurrences
        return extList
            .GroupBy(groupProps)
            .SelectMany(z => z.Skip(1)); //Skip the first occurrence and return all the others that repeat
    }
    public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        //Get all the items that have repetitions
        return extList
            .GroupBy(groupProps)
            .Where(z => z.Count() > 1) //Keep only the groups with more than one item
            .SelectMany(z => z); //Everything that passed the Where has to be returned
    }
}

//how to use it:
void DuplicateExample()
{
    //Populate List
    List<Person> PersonsLst = new List<Person>(){
    new Person(1,"Ricardo","Figueiredo"), //first Duplicate to the example
    new Person(2,"Ana","Figueiredo"),
    new Person(3,"Ricardo","Figueiredo"),//second Duplicate to the example
    new Person(4,"Margarida","Figueiredo"),
    new Person(5,"Ricardo","Figueiredo")//third Duplicate to the example
    };

    Console.WriteLine("All:");
    PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All:
        1 -> Ricardo Figueiredo
        2 -> Ana Figueiredo
        3 -> Ricardo Figueiredo
        4 -> Margarida Figueiredo
        5 -> Ricardo Figueiredo
        */

    Console.WriteLine("All lines with repeated data");
    PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All lines with repeated data
        1 -> Ricardo Figueiredo
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
    Console.WriteLine("Only Repeated more than once");
    PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        Only Repeated more than once
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
}

3 votes

There is an answer, but I don't understand why it did not work for me:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

My solution in this situation is the following:

var duplicates = model.list
                    .GroupBy(s => s.SAME_ID)
                    .Where(g => g.Count() > 1).Count() > 0;
if(duplicates) {
    doSomething();
}

3 votes

Another way:

To check whether there is any duplicate:

bool hasAnyDuplicate = list.Count > list.Distinct().Count();

For the duplicate values:

List<string> duplicates = new List<string>();
duplicates.AddRange(list);
list.Distinct().ToList().ForEach(x => duplicates.Remove(x));

// for unique duplicate values:
duplicates.Distinct();
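A short walk-through of this approach on made-up data may help; the list contents below are an assumption for illustration. Each distinct value is removed once from a copy of the list, so whatever remains is the surplus occurrences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var list = new List<string> { "a", "b", "a", "c", "a" };

// 5 elements versus 3 distinct ones, so there is at least one duplicate.
bool hasAnyDuplicate = list.Count > list.Distinct().Count();

// Copy the list, then remove one occurrence of every distinct value.
var duplicates = new List<string>();
duplicates.AddRange(list);
list.Distinct().ToList().ForEach(x => duplicates.Remove(x));

// What is left are the extra occurrences; Distinct() collapses them to one per value.
List<string> uniqueDuplicates = duplicates.Distinct().ToList();

Console.WriteLine(hasAnyDuplicate);                     // prints "True"
Console.WriteLine(string.Join(", ", duplicates));       // prints "a, a"
Console.WriteLine(string.Join(", ", uniqueDuplicates)); // prints "a"
```

Note that List<T>.Remove is a linear search, so on large lists this is likely to be slower than the grouping or HashSet approaches.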

1 vote

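A complete set of LINQ to SQL extensions for duplicate checks against MS SQL Server, without using .ToList() or IEnumerable. These queries are executed in SQL Server rather than in memory; only the results are returned to memory.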

public static class Linq2SqlExtensions {

    public class CountOfT<T> {
        public T Key { get; set; }
        public int Count { get; set; }
    }

    public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);

    public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);

    public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });

    public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuble<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}

1 vote

LINQ query:

var query = from s2 in (from s in someList
                        group s by new { s.Column1, s.Column2 } into sg
                        select sg)
            where s2.Count() > 1
            select s2;

1 vote

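Here is a simpler way that does not use groups: get the distinct elements, then iterate over them and check how many times each occurs in the list. If the count is greater than 1, the element appears more than once, so add it to Repeteditemlist.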

var mylist = new List<int>() { 1, 1, 2, 3, 3, 3, 4, 4, 4 };
var distList = mylist.Distinct().ToList();
var Repeteditemlist = new List<int>();
foreach (var item in distList)
{
    if (mylist.Count(e => e == item) > 1)
    {
        Repeteditemlist.Add(item);
    }
}
foreach (var item in Repeteditemlist)
{
    Console.WriteLine(item);
}

Expected output:

1 3 4


0 votes

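All the GroupBy answers are the simplest, but they are not the most efficient. They are especially bad for memory performance, because building large internal collections has an allocation cost.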

A decent alternative is HuBeZa's approach based on HashSet.Add. It performs better.

If you don't care about nulls, something like this is, as far as I know, the most efficient (both CPU and memory):

public static IEnumerable<TProperty> Duplicates<TSource, TProperty>(
    this IEnumerable<TSource> source,
    Func<TSource, TProperty> duplicateSelector,
    IEqualityComparer<TProperty> comparer = null)
{
    comparer ??= EqualityComparer<TProperty>.Default;

    Dictionary<TProperty, int> counts = new Dictionary<TProperty, int>(comparer);

    foreach (var item in source)
    {
        TProperty property = duplicateSelector(item);
        counts.TryGetValue(property, out int count);

        switch (count)
        {
            case 0:
                counts[property] = ++count;
                break;

            case 1:
                counts[property] = ++count;
                yield return property;
                break;
        }
    }
}

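The trick here is that once a value has been reported as a duplicate (its count has reached 2), the dictionary is not updated again, which avoids the extra write on every later occurrence. Of course, if you also want the number of occurrences of each item, you can keep updating the dictionary with the count. For nulls, you just need some additional handling, that's all.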
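If the occurrence counts are wanted as well, one possible variant keeps updating the dictionary for every item, at the price of the extra writes. This is only a sketch under assumptions: `CountDuplicates` is a hypothetical helper, written as a local function, and the sample data is made up. Like the method above, it does not handle null keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts every item, then keeps only values seen more than once.
Dictionary<T, int> CountDuplicates<T>(IEnumerable<T> source) where T : notnull
{
    var counts = new Dictionary<T, int>();
    foreach (var item in source)
    {
        // count stays 0 when the item has not been seen yet.
        counts.TryGetValue(item, out int count);
        counts[item] = count + 1;
    }
    return counts.Where(pair => pair.Value > 1)
                 .ToDictionary(pair => pair.Key, pair => pair.Value);
}

var counts = CountDuplicates(new[] { "x", "y", "x", "z", "x", "y" });

// Sorted by key so the printed order is deterministic.
Console.WriteLine(string.Join(", ", counts.OrderBy(p => p.Key).Select(p => $"{p.Key}={p.Value}"))); // prints "x=3, y=2"
```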


-2 votes

Remove duplicates by key:

myTupleList = myTupleList.GroupBy(tuple => tuple.Item1).Select(group => group.First()).ToList();
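Note that this removes duplicates rather than finding them: it keeps the first item seen for each key. A minimal sketch with made-up tuples follows; the DistinctBy line assumes .NET 6 or later, where Enumerable.DistinctBy is available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: two tuples share the key 1.
var myTupleList = new List<Tuple<int, string>>
{
    Tuple.Create(1, "first"),
    Tuple.Create(2, "second"),
    Tuple.Create(1, "third"),
};

// Keep the first tuple seen for each Item1.
var deduplicated = myTupleList.GroupBy(tuple => tuple.Item1)
                              .Select(group => group.First())
                              .ToList();

// Assumes .NET 6 or later: DistinctBy also keeps the first element per key.
var viaDistinctBy = myTupleList.DistinctBy(tuple => tuple.Item1).ToList();

Console.WriteLine(string.Join(", ", deduplicated.Select(t => $"{t.Item1}:{t.Item2}"))); // prints "1:first, 2:second"
```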