Haskell 中的线性时间最长回文子串

Question

我正在尝试在 O(n) 时间内解决 Haskell 中的最长回文子串问题，而不使用像

!!

之类的索引操作。我知道有两种实现最佳时间复杂度的方法，即 Manacher 算法和 here 使用后缀树的方法。

我已经找到了这个实现，非常类似于 Manacher 算法，但不幸的是它从根本上使用索引算法来跟踪回文边界。

Manacher 的算法存储有关重叠回文的信息，以避免额外的计算，因此我认为可能可以编写一个不太复杂的纯版本，然后使用

MemoTrie

来加快速度。然而，我对我们如何在这里做到这一点有点迷失。

我找到的关于如何实现后缀树版本的信息要少得多。天真地，我希望任何使用树的东西都更适合纯函数式语言，除了用于在 O(n) 时间内构造后缀树的算法相当复杂，并且可能会利用突变/无法移植到 Haskell 的引用。

我的问题是：我们可以在 Haskell 中编写一个不使用索引或可变引用的 O(n) 解决方案来解决这个问题吗？

Answer 1

这实际上是一个非常有趣的练习。就像 Daniel 在评论中提到的那样，Manacher 算法实际上并不需要完全随机数组访问 - 在任何时候它只需要查看连续的数组元素，因此它非常适合通过似乎是称为拉链。

拉链（列表）从概念上讲是一个具有焦点的列表，可以左右移动并可以访问相邻元素。对于这种情况，我使用了https://hackage.haskell.org/package/ListZipper-1.2.0.2/docs/Data-List-Zipper.html。

现在你可能会尝试在主函数中直接遍历单个Zipper来实现算法，但是[1]你会卡在wiki文章中命令式版本中带有“break”的分支对应的步骤上。顶部函数/循环的半径发生了变化（！），因此下一次迭代不是以默认的 0 半径开始，而是以严格的正半径开始（这最终是实现为 O(n) 而不是 O(n²) 的原因)).

为了避免这个困难，我们将从定义开始

type LookAround a =
    ( Zipper a -- (l) Left pointer
    , Zipper a -- (c) Cursor
    , Zipper a -- (r) Right pointer
    , Int      -- (w) Window radius
    )

使得四元组

(l,c,r,w)

是一个半径为

、以

为中心、并带有指向左

和右

两端的指针的“回文窗口”。将

LookAround

扩展到仍然是回文的最宽窗口很简单：

longestAround :: (Eq a)
              => LookAround a
              -> Int
longestAround (l, c, r, w)
  | not (beginp l') && not (endp r')
      && cursor l' == cursor r' =
       longestAround (left l, c, right r, w+1)
  | otherwise = w
    where l' = left l; r' = right r

cursor z

返回拉链焦点；

beginp

/

endp

检查拉链是否集中在两端；

left

/

right

左右移动焦点。

现在我只解释类型定义，然后实现就根据这些类型遵循命令式算法。

newtype Cache = Cache (Zipper Int) deriving (Show)
newtype MirrorCache = MirrorCache (Zipper Int)

palindrome :: (Eq a, Show a) => LookAround a -> Cache -> Cache

reuseCache :: (Eq a)
           => Int           -- how many steps forward to reuse
           -> Zipper a      -- array
           -> Cache         -- the cache up to (& including) the current center

           -> ( Cache       -- the cache up to (before) the next center to be searched after cache has been reused
              , Zipper a    -- the next center that needs to be searched
              , Maybe Int   -- Nothing, if the cache has been used fully (and the next center is past the parent palindrome).
                            -- Just the radius that is already covered otherwise.
              )

    where 
        doStep :: Int         -- (step) current step of reusage (only to keep track when to stop)
               -> Zipper a    -- (c) current point of reusage
               -> Cache       -- the built-up cache
               -> MirrorCache -- cache at the mirrored point
               -> Either
                    (Zipper a, Cache)            -- Left indicates there's still more to go
                    (Zipper a, Maybe Int, Cache) -- Right indicates cache already used

请注意，

Cache

和

MirroredCache

都是拉链，因为它们也需要向前移动（填充时）和向后移动（重复使用时）。

reuseCache

本质上实现了命令式算法的内部循环，根据找到的回文数的

Int

大小填充尽可能多的缓存。它返回建立的缓存以及下一个回文搜索应继续的点（+半径）。

doStep

进行缓存重用的一步，或者（左）使用缓存填充单元格，或者（右）告诉父级当前单元格无法从缓存填充（要么因为我们超出了最后一个回文的末尾，要么因为维基百科中的命令式算法使用

break

) 的特殊情况。
我明确地将

doStep

类型 def 与

reuseCache

分开写出，仅作为一步（无递归），因为我最初陷入了递归地狱。

请注意，这 3 个函数都不需要任意数组访问 - 所有信息都已编码在请求的 Zipper 中。

在完全实现之前的最后一点：我只做了奇数长度的情况（主要是遵循维基百科的实现），因此为了测试，我编写了一个帮助器

oddify

将条形图和

pp :: String -> [Int]

包装器放在上面的

palindrome

周围.

import           Data.List.Zipper (Zipper, beginp, cursor, empty, emptyp, endp,
                                   fromList, insert, left, push, right, toList)

type LookAround a =
    ( Zipper a -- (l) Left pointer
    , Zipper a -- (c) Cursor
    , Zipper a -- (r) Right pointer
    , Int      -- (w) Window radius
    )

newtype Cache = Cache (Zipper Int) deriving (Show)
newtype MirrorCache = MirrorCache (Zipper Int)

palindrome :: (Eq a, Show a) => LookAround a -> Cache -> Cache
palindrome a@(_, c, r, _) (Cache cache)
  | endp r = Cache cache
  | otherwise =
      let steps = longestAround a
          (Cache cache', c', radius) = reuseCache steps c (Cache (insert (2 * steps + 1) cache))
          next = case radius of
                   Nothing -> (c', c', c', 0) 
                   (Just theRadius) -> expand theRadius (c', c', c', 0) 
       in palindrome next (Cache cache')

longestAround :: (Eq a)
              => LookAround a
              -> Int
longestAround (l, c, r, w)
  | not (beginp l') && not (endp r')
      && cursor l' == cursor r' =
       longestAround (left l, c, right r, w+1)
  | otherwise = w
    where l' = left l; r' = right r

reuseCache :: (Eq a)
           => Int           -- how many steps forward to reuse
           -> Zipper a      -- array
           -> Cache         -- the cache up to (& including) the current center

           -> ( Cache       -- the cache up to (before) the next center to be searched after cache has been reused
              , Zipper a    -- the next center that needs to be searched
              , Maybe Int   -- Nothing, if the cache has been used fully (and the next center is past the parent palindrome).
                            -- Just the radius that is already covered otherwise.
              )
reuseCache steps arr (Cache startCache) = go 1 (right arr) (Cache $ right startCache) (MirrorCache $ left startCache) where

        go step a c m =
            case doStep step a c m of
              Left (a', c')     -> go (step + 1) a' c' (left' m)
              Right (a', r, c') -> (c', a', r)

        doStep :: Int         -- (step) current step of reusage (only to keep track when to stop)
               -> Zipper a    -- (c) current point of reusage
               -> Cache       -- the built-up cache
               -> MirrorCache -- cache at the mirrored point
               -> Either
                    (Zipper a, Cache)            -- Left indicates there's still more to go
                    (Zipper a, Maybe Int, Cache) -- Right indicates cache already used

        doStep step c (Cache cache) (MirrorCache mirror)

            -- Reused everything inside the parent palindrome range. Continue from current position.
              | step >= steps     = Right (c, Nothing, Cache cache)

            -- Only in this case we can reuse from the cache
              | not (emptyp mirror) && (cursor mirror /= (steps - step)) =
                  let ll = cursor mirror `min` (2 * (steps - step) + 1)
                   in Left (right c, Cache (push ll cache))

            -- If cursor m == steps - step, we cannot conclude anything
            -- about the palindrome length at the current position.
            -- Hence we return (even if step < steps) the known
              | otherwise               = Right (c, Just step, Cache cache)

expand :: Int -> LookAround a -> LookAround a
expand steps (l, c, r, w) = (iterateN left l, c, iterateN right r, w + steps)
    where iterateN f = (!! steps) . iterate f

left' :: MirrorCache -> MirrorCache
left' (MirrorCache m) = MirrorCache (left m)

---- I test in ghci with:
---- mapM_ print $ (pp ss) `zip` oddify ss
---- where `ss` is the test string

oddify :: String -> String
oddify (a:bs@(_:_)) = a : '|' : oddify bs
oddify xs = xs

pp :: String -> [Int]
pp string = 
    let z = fromList $ oddify string
        Cache cache = palindrome (z, z, z, 0) (Cache empty)
     in toList cache

ss :: String
--ss = "abracadabra"
ss = "yabadabadoo"

如果您有一些测试用例，我很乐意根据它们检查我的代码；我没有找到任何可用的套件。

[1] 除非您决定像链接文章中那样使用相互递归，这在我看来太复杂了。请注意，我的实现没有相互递归。

Haskell 中的线性时间最长回文子串

问题描述投票：0回答：1

1个回答

最新问题

Haskell 中的线性时间最长回文子串

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1