使用 O(1) 内存、O(n) 运行时复杂性且无双重枚举来实现 LINQ“按谓词分块”

Question

示例：假设谓词是

i == 0

。

然后

[1] -> [(1)]
[0] -> []
[1, 0] -> [(1)]
[0, 1] -> [(1)]
[0, 0] -> []
[1, 1, 0] -> [(1, 1)]
[1, 0, 1] -> [(1), (1)]
[1, 1, 0, 0, 1, 0, 1, 1, 1] -> [(1, 1), (1), (1, 1, 1)]

基本上，返回谓词为假的连续子段。

我以为这会起作用

internal static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> source, Func<T, bool> condition)
{
    IEnumerator<T> mover = source.GetEnumerator();

    for (; mover.MoveNext() ; )
    {
        var chunk = mover.MoveUntil(condition);

        if (chunk.Any())
        {
            yield return chunk;
        }
    }          
}

private static IEnumerable<T> MoveUntil<T>(this IEnumerator<T> mover, Func<T, bool> condition)
{
    bool hitCondition = false;

    do
    {
        if (condition(mover.Current))
        {
            hitCondition = true;
        }
        else
        {
            yield return mover.Current;
        }
    }
    while (!hitCondition && mover.MoveNext());
}

但我看到例如 [1, 1, 0] 它将返回 [(1), (1)]。我不完全明白为什么。如果我改变的话我就能让它工作

var chunk = mover.MoveUntil(condition);

拥有

mover.MoveUntil(condition).ToList();

，但如果可能的话，我希望不必在内存中保存任何子段。

Answer 1

可以使用 LINQ 调用流式传输结果。下面的实现：

不会创建临时
```
List
```
来减少内存消耗，我认为这对于内存来说是
```
O(1)
```
，因为一次只处理一个子段。
不会有双重枚举，并且每个记录都会调用谓词一次。
对于运行时来说，这将是
```
O(n)
```
，因为就像 this 答案建议一样，
```
GroupBy
```
操作应该是
```
O(n)
```
，其他 LINQ 调用是单遍操作，所以也应该是
```
O(n)
```
。


    public static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> a, Func<T, bool> predicate)
    {
        int groupNumber = 0;
        Func<bool, int?> getGroupNumber = skip =>
        {
            if (skip)
            {
                // prepare next group, we don't care if we increment more than once
                // we only want to split groups
                groupNumber++;
                // null, to be able to filter out group separators
                return null;
            }
            return groupNumber;
        };
        return a
            .Select(x => new { Value = x, GroupNumber = getGroupNumber(predicate(x))} )
            .Where(x => x.GroupNumber != null)
            .GroupBy(x => x.GroupNumber)
            .Select(g => g.Select(x => x.Value));       
    }

Answer 2

首先，我认为你想要 O(n) 作为内存复杂度，因为你的输出长度与输入成线性比例。作为函数式编程的忠实粉丝，我选择使用折叠（对应于 C# 中的 LINQ 函数

Aggregate

）。

基本上，我们从一个空的集合和一个标志开始，该标志指示下一次迭代是否必须创建一个新的子集合（我们只知道当谓词匹配时，即在上一次迭代中）。我使用包含这两个元素的元组作为累加器。为了清晰起见，我在一个单独的函数中提取了聚合的逻辑。

static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> a, Func<T, bool> predicate)
{
    // The accumulator is a tuple defined as: (collection, mustCreateNewList)
    return a.Aggregate((new List<List<T>>(), true), (acc, e) => ForEachElement(acc, e, predicate)).Item1;
}

static (List<List<T>>, bool) ForEachElement<T>((List<List<T>>, bool) acc, T e, Func<T, bool> predicate)
{
    var (collection, mustCreateNewList) = acc;

    // The predicate matches, continue to iterate!
    if (predicate(e)) return (collection, true);

    // The previous iteration requests to create a new list
    if(mustCreateNewList) collection.Add(new List<T>());

    // Add the current element to the last list
    collection[collection.Count - 1].Add(e);

    return (collection, false);
}

初始集合遍历一次 (O(n))，并且输出的长度在最坏情况下具有输入的长度 (O(n))。

调用示例：

var array = new int[] { 1, 1, 0, 0, 1, 0, 1, 1, 1 };
var result = array.Partition(i => i == 0);

Answer 3

我知道这是一个很晚的答案，但我今天需要类似的东西，我认为下面的代码应该可以满足您的需要。

没有多重枚举
O(N) 运行时复杂度（仅检查每个项目一次）
内存不是 O(1)，但如果块明显小于原始集合本身，则可以认为是 O(1)。
我希望方法尽可能简单，所以我故意避免书中任何“简洁”的 LINQ 技巧。

不言而喻，它可能可以改进为以实际 O(1) 内存占用运行。

简而言之，迭代检查每个项目，如果每个项目与谓词匹配，则通过将每个项目放入列表中来前进。一旦遇到不匹配的项目，列表就会被

yield return

ed 并更新局部变量以进行进一步检查。当然，与谓词不匹配的元素会被跳过。

我还没有广泛测试该方法，但就我到目前为止所使用的情况而言，它正在正常工作。

public static IEnumerable<IEnumerable<T>> PartitionBy<T>(
            this IEnumerable<T> source, Predicate<T> predicate)
{
    List<T> list = new List<T>();

    foreach (var item in source)
    {
        if (predicate(item))
        {
            list.Add(item);
            continue;
        }
        else
        {
            if (list.Any())
            {
                yield return list;
                list = new List<T>();
            }

            //Keep skipping items.
            continue;
        }
    }
}

使用 O(1) 内存、O(n) 运行时复杂性且无双重枚举来实现 LINQ“按谓词分块”

问题描述投票：0回答：3

3个回答

最新问题

使用 O(1) 内存、O(n) 运行时复杂性且无双重枚举来实现 LINQ“按谓词分块”

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3