HXT: Select a node by position with HXT in Haskell?

Question

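I'm trying to parse some XML files with Haskell. For this job I'm using HXT, partly to learn how arrows are used in real-world applications, so I'm quite new to the arrow topic.

In XPath (and HaXml) it is possible to select a node by position, for example: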

/root/a[2]/b

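Even after reading the documentation over and over again, I have no idea how to do something similar with HXT.

Here is some sample code I'm working with: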

module Main where

import Text.XML.HXT.Core

testXml :: String
testXml = unlines
    [ "<?xml version=\"1.0\"?>"
    , "<root>"
    , "    <a>"
    , "        <b>first element</b>"
    , "        <b>second element</b>"
    , "    </a>"
    , "    <a>"
    , "        <b>third element</b>"
    , "    </a>"
    , "    <a>"
    , "        <b>fourth element</b>"
    , "        <b>enough...</b>"
    , "    </a>"
    , "</root>"
    ]

selector :: ArrowXml a => a XmlTree String
selector = getChildren /> isElem >>> hasName "a" -- how to select second <a>?
                       /> isElem >>> hasName "b"
                       /> getText

main :: IO ()
main = do
    let doc = readString [] testXml
    nodes <- runX $ doc >>> selector
    mapM_ putStrLn nodes

The desired output is:

third element

Thanks in advance!

Answer 1

I believe this is a solution for selecting "/root/a[2]/b" (all "b" tags inside the second "a" tag):

selector :: ArrowXml a => Int -> a XmlTree String
selector nth =
    (getChildren /> isElem >>> hasName "a")   -- the parentheses are required!
    >. (!! nth)                               -- nth is 0-based; (!!) fails if out of range
    /> isElem >>> hasName "b" /> getText

Called as selector 1 (the index is 0-based), the result is:

["third element"]

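Explanation: as far as I can see,

class (..., ArrowList a, ...) => ArrowXml a

so ArrowList is a superclass of ArrowXml, and every ArrowXml arrow supports the ArrowList interface. Looking at that interface:

(>>.) :: a b c -> ([c] -> [d]) -> a b d
(>.) :: a b c -> ([c] -> d) -> a b d

So >>. can select a subset of the result list using a lifted function of type [c] -> [d], and >. can pick a single item from the result list using a lifted function of type [c] -> d. Therefore, after selecting the children and filtering for "a" tags, we apply (!! nth) :: [a] -> a.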
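To make the difference between the two operators concrete, here is a minimal sketch using HXT's pure list arrow LA together with runLA and arrL, which Text.XML.HXT.Core should re-export. The arrow upTo is a made-up helper for illustration only and is not part of HXT.

```haskell
module Main where

import Text.XML.HXT.Core

-- Hypothetical helper (not part of HXT): yields 1..n for an input n.
upTo :: LA Int Int
upTo = arrL (\n -> [1 .. n])

main :: IO ()
main = do
    -- (>>.) maps the whole result list to another list.
    print (runLA (upTo >>. drop 1) 4)   -- [2,3,4]
    -- (>.) collapses the whole result list into a single result.
    print (runLA (upTo >. length) 4)    -- [4]
    print (runLA (upTo >. (!! 1)) 4)    -- [2]
```

Note that both operators see the complete result list of the arrow on their left, which is why it matters what exactly that left-hand arrow is.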

There is one important thing to note about operator precedence:

infixr 1 >>>
infixl 5 />
infixl 8 >.

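(This is why I had a hard time figuring out why >. did not work as expected without parentheses.) Because >. binds more tightly than both /> and >>>, an unparenthesised >. would apply only to hasName "a", not to the whole arrow before it. Thus getChildren /> isElem >>> hasName "a" must be wrapped in parentheses.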
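The effect of the parentheses can be seen in a small sketch. It assumes HXT's pure LA arrow and the xread parser from Text.XML.HXT.Core; the names sample, ungrouped and grouped are invented for this illustration. It uses length instead of (!!) so that the unparenthesised variant shows its behaviour without crashing.

```haskell
module Main where

import Text.XML.HXT.Core

-- Two <a> children and no whitespace, so <root> has exactly two child nodes.
sample :: String
sample = "<root><a><b>one</b></a><a><b>two</b></a></root>"

-- Parsed as: getChildren >>> (hasName "a" >. length)
-- length runs once per child, on a list with at most one element.
ungrouped :: LA XmlTree Int
ungrouped = getChildren >>> hasName "a" >. length

-- length runs once, on the list of all matching children.
grouped :: LA XmlTree Int
grouped = (getChildren >>> hasName "a") >. length

main :: IO ()
main = do
    print (runLA (xread >>> ungrouped) sample)   -- [1,1]
    print (runLA (xread >>> grouped) sample)     -- [2]
```

With (!! 1) in place of length, the ungrouped version would index into a one-element list and fail, which matches the behaviour described above.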

Answer 2

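This is just an extension of EarlGray's answer; see there for the explanation of >>. and >.! After asking the question I realized that I needed to walk the tree in a specific, deterministic way. This is the solution I'm using for my particular problem, and I want to share the sample code in case somebody else is trying to accomplish the same thing.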

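Let's say we want to extract the text of the second <b> inside the first <a>. Not every <a> element has at least two <b> children, so EarlGray's code would bail out: (!!) raises an error when the index is past the end of the list (or the list is empty).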

Have a look at the function single in Control.Arrow.ArrowList, which keeps only the first result of a list arrow:

single :: ArrowList a => a b c -> a b c
single f = f >>. take 1

We want to extract the n-th element instead (counting from 1), yielding no result if there is none:

junction :: ArrowList a => a b c -> Int -> a b c
junction a nth = a >>. (take 1 . drop (nth - 1))
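As a quick check of how junction behaves at and beyond the end of the result list, here is a small sketch on HXT's pure LA arrow. The arrow chars is a hypothetical helper introduced only for this example; junction is repeated so the snippet is self-contained.

```haskell
module Main where

import Text.XML.HXT.Core

-- junction as defined above: 1-based, yields zero or one result.
junction :: ArrowList a => a b c -> Int -> a b c
junction a nth = a >>. (take 1 . drop (nth - 1))

-- Hypothetical arrow for illustration: yields each character of a string.
chars :: LA String Char
chars = arrL id

main :: IO ()
main = do
    print (runLA (chars `junction` 2) "abc")   -- "b"
    print (runLA (chars `junction` 5) "abc")   -- ""
    -- By contrast, evaluating runLA (chars >. (!! 4)) "abc"
    -- would raise an index error.
```

An out-of-range position simply produces an empty result list, so the rest of the arrow pipeline yields nothing rather than failing.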

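Now we can use this new function to build the selector. The parentheses around the part we want to filter with junction are necessary, because junction modifies an existing arrow and, written in backticks, binds more tightly than >>>. Without them it would apply only to the last arrow before it.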

selector :: ArrowXml a => a XmlTree String
selector = getChildren -- There is only one root element.
         -- For each selected element: Get a list of all children and filter them out.
         -- The junction function now selects at most one element.
         >>> (getChildren >>> isElem >>> hasName "a") `junction` 1 -- selects first <a>
         -- The same thing to select the second <b> for all the <a>s
         -- (But we had selected only one <a> in this case!
         -- Imagine commenting out the `junction` 1 above.)
         >>> (getChildren >>> isElem >>> hasName "b") `junction` 2 -- selects second <b>
         -- Now get the text of the element.
         >>> getChildren >>> getText

Extracting the value and returning it as a Maybe (listToMaybe comes from Data.Maybe, which has to be imported):

main :: IO ()
main = do
    let doc = readString [] testXml
    text <- listToMaybe <$> (runX $ doc >>> selector)
    print text

With the sample XML document, this prints:

Just "second element"
