这个列表在Haskell中的排列实现到底是做什么的？

Question

我正在研究Data.List模块中的代码，并不能完全围绕这种排列实现：

permutations            :: [a] -> [[a]]
permutations xs0        =  xs0 : perms xs0 []
  where
    perms []     _  = []
    perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
      where interleave    xs     r = let (_,zs) = interleave' id xs r in zs
            interleave' _ []     r = (ts, r)
            interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
                                     in  (y:us, f (t:y:us) : zs)

有人可以详细解释这些嵌套函数如何相互连接/相互作用？

Answer 1

对于迟到的答案感到抱歉，写下来比预期花了更长的时间。

所以，首先要在这样的列表函数中最大化懒惰，有两个目标：

在检查输入列表的下一个元素之前，尽可能多地生成答案
答案本身必须是懒惰的，所以必须坚持。

现在考虑permutation功能。这里最大的懒惰意味着：

在检查输入的n!元素后，我们应该确定至少有n排列
对于这些n!排列中的每一个，第一个n元素应该仅依赖于输入的第一个n元素。

第一个条件可以正式化为

length (take (factorial n) $ permutations ([1..n] ++ undefined))) `seq` () == ()

David Benbennick将第二个条件正式化为

map (take n) (take (factorial n) $ permutations [1..]) == permutations [1..n]

结合起来，我们有

map (take n) (take (factorial n) $ permutations ([1..n] ++ undefined)) == permutations [1..n]

让我们从一些简单的案例开始。首先permutation [1..]。我们必须拥有

permutations [1..] = [1,???] : ???

我们必须拥有两个要素

permutations [1..] = [1,2,???] : [2,1,???] : ???

请注意，前两个元素的顺序没有选择，我们不能先把[2,1,...]放在首位，因为我们已经确定第一个排列必须以1开头。现在应该清楚，permutations xs的第一个元素必须等于xs本身。

现在开始实施。

首先，有两种不同的方法可以对列表进行所有排列：

选择样式：从列表中继续拾取元素，直到没有剩余 permutations [] = [[]] permutations xxs = [(y:ys) | (y,xs) <- picks xxs, ys <- permutations xs] where picks (x:xs) = (x,xs) : [(y,x:ys) | (y,ys) <- picks xs]
插入样式：在所有可能的位置插入或交错每个元素 permutations [] = [[]] permutations (x:xs) = [y | p <- permutations xs, y <- interleave p] where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys)

请注意，这些都不是最懒惰的。第一种情况，这个函数做的第一件事是从整个列表中选择第一个元素，这根本不是懒惰的。在第二种情况下，我们需要尾部的排列才能进行任何排列。

首先，请注意interleave可以变得更加懒惰。 interleave yss列表的第一个元素是[x]，如果yss=[]或(x:y:ys)，如果yss=y:ys。但这两个都与x:yss相同，所以我们可以写

interleave yss = (x:yss) : interleave' yss
interleave' [] = []
interleave' (y:ys) = map (y:) (interleave ys)

Data.List中的实现继续这个想法，但使用了一些技巧。

通过mailing list discussion也许最容易。我们从David Benbennick的版本开始，这与我上面写的那个版本相同（没有延迟交错）。我们已经知道permutations xs的第一个元素应该是xs本身。所以，让我们把它放进去

permutations xxs     = xxs : permutations' xxs
permutations' []     = []
permutations' (x:xs) = tail $ concatMap interleave $ permutations xs
  where interleave = ..

对tail的呼吁当然不是很好。但是，如果我们内联permutations和interleave的定义，我们得到

permutations' (x:xs)
  = tail $ concatMap interleave $ permutations xs
  = tail $ interleave xs ++ concatMap interleave (permutations' xs)
  = tail $ (x:xs) : interleave' xs ++ concatMap interleave (permutations' xs)
  = interleave' xs ++ concatMap interleave (permutations' xs)

现在我们有

permutations xxs     = xxs : permutations' xxs
permutations' []     = []
permutations' (x:xs) = interleave' xs ++ concatMap interleave (permutations' xs)
  where
   interleave yss = (x:yss) : interleave' yss
   interleave' [] = []
   interleave' (y:ys) = map (y:) (interleave ys)

下一步是优化。一个重要的目标是消除交错中的（++）调用。这不是那么容易，因为最后一行，map (y:) (interleave ys)。我们不能立即使用将尾部作为参数传递的foldr / ShowS技巧。出路就是摆脱地图。如果我们传递参数f作为必须在结尾映射结果的函数，我们得到

permutations' (x:xs) = interleave' id xs ++ concatMap (interleave id) (permutations' xs)
  where
   interleave f yss = f (x:yss) : interleave' f yss
   interleave' f [] = []
   interleave' f (y:ys) = interleave (f . (y:)) ys

现在我们可以传递尾巴，

permutations' (x:xs) = interleave' id xs $ foldr (interleave id) [] (permutations' xs)
  where
   interleave  f yss    r = f (x:yss) : interleave' f yss r
   interleave' f []     r = r
   interleave' f (y:ys) r = interleave (f . (y:)) ys r

这开始看起来像Data.List中的那个，但它还不一样。特别是，它并不像它可能的那样懒惰。我们来试试吧：

*Main> let n = 4
*Main> map (take n) (take (factorial n) $ permutations ([1..n] ++ undefined))
[[1,2,3,4],[2,1,3,4],[2,3,1,4],[2,3,4,1]*** Exception: Prelude.undefined

哦，哦，只有第一个n元素是正确的，而不是第一个factorial n。原因是我们仍然尝试在尝试其他任何事情之前将第一个元素（上例中的1）放置在所有可能的位置。

Yitzchak Gale提出了一个解决方案。考虑将输入分成初始部分，中间元素和尾部的所有方法：

[1..n] == []    ++ 1 : [2..n]
       == [1]   ++ 2 : [3..n]
       == [1,2] ++ 3 : [4..n]

如果您之前没有看过生成这些的技巧，可以使用zip (inits xs) (tails xs)执行此操作。现在[1..n]的排列将是

[] ++ 1 : [2..n]又名。 [1..n]，或
2插入（交错）某处[1]的排列，然后是[3..n]。但是在2结束时没有插入[1]，因为我们已经在前一个要点中得到了结果。
3交织成[1,2]的排列（不是在最后），然后是[4..n]。
等等

你可以看出这是最懒惰的，因为在我们考虑用3做某事之前，我们已经给出了所有的排列，这些排列都是从[1,2]的一些排列开始的。 Yitzchak提供的代码是

permutations xs = xs : concat (zipWith newPerms (init $ tail $ tails xs)
                                                (init $ tail $ inits xs))
  where
    newPerms (t:ts) = map (++ts) . concatMap (interleave t) . permutations3
    interleave t [y]        = [[t, y]]
    interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys')

注意对permutations3的递归调用，它可以是一个不必极其懒惰的变体。

正如您所看到的，这比我们之前的优化要差一些。但我们可以应用一些相同的技巧。

第一步是摆脱init和tail。让我们来看看zip (init $ tail $ tails xs) (init $ tail $ inits xs)究竟是什么

*Main> let xs = [1..5] in zip (init $ tail $ tails xs) (init $ tail $ inits xs)
[([2,3,4,5],[1]),([3,4,5],[1,2]),([4,5],[1,2,3]),([5],[1,2,3,4])]

init摆脱了([],[1..n])的组合，而tail摆脱了([1..n],[])的组合。我们不想要前者，因为那会使newPerms中的模式匹配失败。后者将失败interleave。两者都很容易修复：只需为newPerms []和interleave t []添加一个案例。

permutations xs = xs : concat (zipWith newPerms (tails xs) (inits xs))
  where
    newPerms [] is = []
    newPerms (t:ts) is = map (++ts) (concatMap (interleave t) (permutations is))
    interleave t []         = []
    interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys')

现在我们可以尝试内联tails和inits。他们的定义是

tails xxs = xxs : case xxs of
  []     -> []
  (_:xs) -> tails xs

inits xxs = [] : case xxs of
  []     -> []
  (x:xs) -> map (x:) (inits xs)

问题是inits不是尾递归。但是既然我们要对这些内容进行排列，我们并不关心元素的顺序。所以我们可以使用累积参数，

inits' = inits'' []
  where
  inits'' is xxs = is : case xxs of
    []     -> []
    (x:xs) -> inits'' (x:is) xs

现在我们使newPerms成为xxs和这个累积参数的函数，而不是tails xxs和inits xxs。

permutations xs = xs : concat (newPerms' xs [])
  where
    newPerms' xxs is =
      newPerms xxs is :
      case xxs of
        []     -> []
        (x:xs) -> newPerms' xs (x:is)
    newPerms [] is = []
    newPerms (t:ts) is = map (++ts) (concatMap (interleave t) (permutations3 is))

将newPerms内联到newPerms'然后给出

permutations xs = xs : concat (newPerms' xs [])
  where
    newPerms' []     is = [] : []
    newPerms' (t:ts) is =
      map (++ts) (concatMap (interleave t) (permutations is)) :
      newPerms' ts (t:is)

内联和展开concat，并将最终的map (++ts)移动到interleave，

permutations xs = xs : newPerms' xs []
  where
    newPerms' []     is = []
    newPerms' (t:ts) is =
        concatMap interleave (permutations is) ++
        newPerms' ts (t:is)
      where
      interleave []     = []
      interleave (y:ys) = (t:y:ys++ts) : map (y:) (interleave ys)

最后，我们可以重新应用foldr技巧摆脱(++)：

permutations xs = xs : newPerms' xs []
  where
    newPerms' []     is = []
    newPerms' (t:ts) is =
        foldr (interleave id) (newPerms' ts (t:is)) (permutations is)
      where
      interleave f []     r = r
      interleave f (y:ys) r = f (t:y:ys++ts) : interleave (f . (y:)) ys r

等等，我说摆脱了(++)。我们摆脱了其中一个，但不是interleave中的一个。为此，我们可以看到我们总是将yys的一些尾部连接到ts。因此，我们可以展开计算(ys++ts)以及interleave的递归，并使函数interleave' f ys r返回元组(ys++ts, interleave f ys r)。这给了

permutations xs = xs : newPerms' xs []
  where
    newPerms' []     is = []
    newPerms' (t:ts) is =
        foldr interleave (newPerms' ts (t:is)) (permutations is)
      where
      interleave ys r = let (_,zs) = interleave' id ys r in zs
      interleave' f []     r = (ts,r)
      interleave' f (y:ys) r = 
        let (us,zs) = interleave' (f . (y:)) ys r
        in  (y:us, f (t:y:us) : zs)

而且你有它，Data.List.permutations在其最大的懒惰优化的荣耀。

Twan写的很棒！我（@Yitz）只会添加一些参考：

Twan开发此算法的原始电子邮件主题，由Twan联系在上面，是令人着迷的阅读。
Knuth对Vol。中满足这些标准的所有可能算法进行了分类。 4法西斯。 2秒7.2.1.2。
Twan的permutations3与Knuth的“算法P”基本相同。据Knuth所知，这种算法最早是由英国教堂钟振铃于1600年代出版的。

Answer 2

基本算法基于这样的想法：一次从列表中取一个项目，找到包括新项目的每个项目的排列，然后重复。

为了解释这看起来是什么样的，[1 ..]将意味着一个列表，其中没有值（甚至没有第一个）已被检查。它是函数的参数。结果列表如下：

[[1..]] ++
[[2,1,3..]] ++
[[3,2,1,4..], [2,3,1,4..]] ++ [[3,1,2,4..], [1,3,2,4..]]
[[4,3,2,1,5..], etc

上面的聚类反映了算法的核心思想......每一行代表一个从输入列表中取出的新项目，并添加到正在置换的项目集合中。此外，它是递归的...在每个新行上，它采用所有现有的排列，并将项放置在它尚未存在的每个位置（除了最后一个之外的所有位置）。所以，在第三行，我们有两个排列[2,1]和[1,2]，然后我们在两个可用的时隙中发生3，所以[[3,2,1]，[2,3，分别为1]和[[3,1,2]，[1,3,2]]，然后附加任何未观察到的部分。

希望这至少可以澄清算法。但是，有一些优化和实现细节需要解释。

（旁注：使用了两个中央性能优化：首先，如果你想重复将一些项目添加到多个列表中，map (x:y:z:) list比匹配一些条件或模式匹配快很多，因为它没有分支，只是一个计算第二，这个使用了很多，通过反复预先添加项目，从后面到前面构建列表是便宜的（并且方便）;这在一些地方使用。

该函数所做的第一件事是建立两个基本案例：首先，每个列表至少有一个排列：本身。这可以在没有任何评估的情况下返回。这可以被认为是“拿0”的情况。

外部循环是如下所示的部分：

perms (t:ts) is = <prepend_stuff_to> (perms ts (t:is))

ts是列表中“未被触及”的部分，我们还没有进行排列，甚至还没有检查，最初是整个输入序列。

t是我们将在排列之间坚持的新项目。

is是我们将置换的项目列表，然后将t置于其间，并且最初是空的。

每次我们计算上述行中的一行时，我们都会到达包含thms（thms ts（t：is））之前的项目的末尾并将递归。

第二个循环是折叠器。对于is（原始列表中当前项目之前的东西）的每个排列，它将interleaves项目放入该列表中，并将其预先添加到thunk中。

foldr interleave <thunk> (permutations is)

第三个循环是最复杂的循环之一。我们知道它假定我们的目标项目t在排列中的每个可能的散布，然后是未观察到的尾部到结果序列。它通过递归调用执行此操作，在递归调用时将置换折叠到一堆函数中，然后在返回时，它执行相当于两个小状态机来构建结果。

让我们看一个例子：interleave [<thunk>] [1,2,3] t = 4和is = [5..]

首先，由于交错调用interleave'，它会在堆栈上构建ys和fs，如下所示：

y = 1, f = id
y = 2, f = (id . (1:))
y = 3, f = ((id . (1:)) . (2:))
(the functions are conceptually the same as ([]++), ([1]++), and ([1,2]++) respectively)

然后，当我们返回时，我们返回并评估包含两个值的元组qazxsw poi。

(us, zs)是我们在我们的目标us之后加上ys的列表。

t是结果累加器，每当我们得到一个新的排列时，我们将它添加到结果列表中。

因此，为了完成该示例，zs将被评估并作为上面每个堆栈级别的结果返回。

f (t:y:us)

希望这有助于，或至少补充材料([1,2]++) (4:3:[5..]) === [1,2,4,3,5..] ([1]++) (4:2[3,5..]) === [1,4,2,3,5..] ([]++) (4:1[2,3,5..]) === [4,1,2,3,5..]。

（感谢dfeuer将其提交给IRC并讨论了几个小时）

这个列表在Haskell中的排列实现到底是做什么的？

问题描述投票：36回答：2

2个回答

最新问题

这个列表在Haskell中的排列实现到底是做什么的？

问题描述 投票：36回答：2

2个回答

最新问题

问题描述投票：36回答：2