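I am trying to convert an XML document that is organized by divisions and paragraphs, and that records page breaks and line breaks as milestones, into an XML document that holds pages and lines in page and line elements.
To do this, I am trying to use util:get-fragment-between.
First I turn all the lines on a page into a fragment, then I turn each line into a fragment.
The first step works, but in the second step I get the following error, which I don't understand:
org.exist.dom.memtree.ElementImpl cannot be cast to org.exist.dom.persistent.StoredNode
The XQuery file is below, followed by an excerpt of the XML file I am trying to convert.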
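xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: $docpath is assumed to be bound elsewhere to the path of the stored TEI document :)
let $doc := doc($docpath)
(: assumed values, mirroring the flags used in the second step below :)
let $make-fragment := true()
let $display-root-namespace := true()
(: Build the first fragment, containing only the lines on one page :)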
let $begp-node := $doc//tei:pb[@n="15-v"]
let $endp-node := $doc//tei:pb[@n="16-r"]
let $p-fragment := util:get-fragment-between($begp-node, $endp-node, $make-fragment, $display-root-namespace)
let $p-node := util:parse($p-fragment)
(: So far so good: printing $p-node gives an XML document with just the text on page 15-v :)
(: Next step: here I try to build a fragment for each line in the newly created page fragment :)
let $lines := $p-node//tei:lb
for $line at $pos in $lines
let $make-fragment1 := true()
let $display-root-namespace1 := true()
let $beginning-node := $line
let $ending-node := $line/following::tei:lb[1]
let $fragment := util:get-fragment-between($beginning-node, $ending-node, $make-fragment1, $display-root-namespace1)
let $node := util:parse($fragment)
return $node
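I expect $node to be a new XML document containing only the line fragment. Instead, I get the error:
org.exist.dom.memtree.ElementImpl cannot be cast to org.exist.dom.persistent.StoredNode
Here is an excerpt from the original document: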
<p>
<lb ed="#L"/>dilectio <choice>
<orig>dependant</orig>
<reg>dependant</reg>
</choice> causaliter a cognitione tamen quaelibet obiecti apprehensio vel cognitio
<lb ed="#L"/>cum voluntatis libertate sufficit dilectionem causare <g ref="#slash"/> prima
probatur quia si non sequitur quod dilec
<lb ed="#L"/>tio
<lb ed="#L"/>posset poni seu elici naturaliter a voluntate seclusa omni cognitione consequens
est falsum
<pb ed="#L" n="15-v"/>
<lb ed="#L" n="1"/> quia tunc voluntas posset diligere in infinitum contra <ref>
<name ref="#Augustine">augustinum</name> in libro 8 2 10 <title ref="#deTrinitate">de
trinitate</title>
</ref> patet consequentia quia positis omnibus causis ad productionem <sic>ad productionem</sic>
alicuius effectus re
<lb ed="#L" n="2"/>quisitis
<lb ed="#L" n="3"/>omni alio secluso talis effectus posset naturaliter poni in esse <g
ref="#slash"/>2a pars probatur quia
<lb ed="#L" n="4"/>quia si sola obiecti cognitio etc sequitur quod stante iudicio vel
apprehensione alicuius
<lb ed="#L" n="5"/>obiecti sub ratione <corr>
<del rend="strikethrough">boni</del>
<add place="inLine">mali</add>
</corr> seclusa omnia existentia vel apparentia bonitatis
<lb ed="#L" n="6"/>voluntas posset tale obiectum velle vel diligere consequentia nota sed
consequens est contra <ref>
<name ref="#Aristotle">philosophum</name>
</ref> et <ref>
<name ref="#Averroes">commentatorem</name>
<lb ed="#L" n="7"/>primo <name ref="#Ethics">ethicorum</name>
</ref> quia omnia bonum appetunt
<p xml:id="pgb1q2-d1e3692">
<g ref="#pilcrow"/>primum corollarium
<lb ed="#L" n="8"/>
Any suggestions are much appreciated.
This algorithm, although about 3 times slower than the Java code, works in memory:
(:~ trim the XML from $nodes $start to $end
: The algorithm is
: 1) find all the ancestors of the start node - $startParents
: 2) find all the ancestors of the end node- $endParents
: 3) recursively, starting with the common top we create a new element which is a copy of the element being trimmed by
: 3.1 copying all attributes
: 3.2 there are four cases depending on the node and the start and end edge nodes of the tree
: a) left and right nodes are the same - nothing else to copy
: b) both nodes are in the node's children - trim the start one, copy the intervening children and trim the end one
: c) only the start node is in the node's children - trim this node and copy the following siblings
: d) only the end node is in the node's children - copy the preceding siblings and trim the node
: attributes (currently in the fb namespace since it's not a TEI attribute) are added to trimmed nodes
: @param start - the element bounding the start of the subtree
: @param end - the element bounding the end of the subtree
:)
declare function fb:trim-node($start as node() ,$end as node()) {
let $startParents := $start/ancestor-or-self::*
let $endParents := $end/ancestor-or-self::*
let $top := $startParents[1]
return
fb:trim-node($top,subsequence($startParents,2),subsequence($endParents,2))
};
declare function fb:trim-node($node as node(), $start as node()*, $end as node()*) {
if (empty($start) and empty($end))
then $node (: leaf is untrimmed :)
else
let $startNode := $start[1]
let $endNode:= $end[1]
let $children := $node/node()
return
element {QName (namespace-uri($node), name($node))} { (: preserve the namespace :)
$node/@* , (: copy all the attributes :)
if ($startNode is $endNode) (: edge node is common :)
then fb:trim-node($startNode, subsequence($start,2),subsequence($end,2))
else
if ($startNode = $children and $endNode = $children) (: both in same subtree :)
then (fb:trim-node($startNode, subsequence($start,2),()), (: first the trimmed start node :)
(: then the siblings between start and end nodes :)
$startNode/following-sibling::node()
except $endNode/following-sibling::node()
except $endNode,
fb:trim-node($endNode, (), subsequence($end,2)) (: then the trimmed end node :)
)
else if ($startNode = $children) (: start node is in the children :)
then
( fb:trim-node($startNode, subsequence($start,2),()), (: first the trimmed start node :)
$startNode/following-sibling::node() (: then the following siblings :)
)
else if ($endNode = $children) (: end node is in the children :)
then
( $endNode/preceding-sibling::node(), (: the preceding siblings :)
fb:trim-node($endNode, (), subsequence($end,2)) (: then the trimmed end node :)
)
else ()
}
};
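As a rough, untested sketch, the two functions above could be applied to the two-step scenario from the question like this. The fb namespace URI and the document path are placeholders I made up, and the sketch assumes both fb:trim-node declarations are pasted into the same module.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: Hypothetical namespace URI for the fb prefix; the answer does not state one. :)
declare namespace fb = "http://example.org/fragment-between";

(: ... the two fb:trim-node declarations from above go here ... :)

(: Hypothetical path; point this at your own stored TEI document. :)
let $doc := doc("/db/apps/example/data/text.xml")
(: Step 1: trim the document down to one page. The result is an in-memory tree. :)
let $page := fb:trim-node($doc//tei:pb[@n = "15-v"], $doc//tei:pb[@n = "16-r"])
(: Step 2: trim the page again, once per pair of consecutive line breaks. :)
for $lb in $page//tei:lb
let $next := ($lb/following::tei:lb)[1]
where exists($next)  (: the last tei:lb on the page has no following line break :)
return fb:trim-node($lb, $next)
```

Because fb:trim-node only uses ordinary XPath axes and element constructors, the second step should accept the in-memory result of the first step, which is where the Java function fails in the question.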
There is a comparison of four algorithms, including the Java one, in a demo app that uses joewiz's prototype: http://kitwallace.co.uk/Book/set/fragment-between/page
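The cast error in the question suggests that util:get-fragment-between only accepts nodes stored in the database, whereas util:parse returns an in-memory node. If you would rather keep the Java function, one possible workaround is to store the page fragment first and run the second step against the stored copy. This is an untested sketch: the collection /db/tmp, the resource name, and the document path are all assumptions, and the collection must already exist and be writable.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical scratch location; adjust to your own database. :)
declare variable $tmp-col := "/db/tmp";
declare variable $tmp-name := "page-15-v.xml";

(: Hypothetical path to the stored TEI document. :)
let $doc := doc("/db/apps/example/data/text.xml")
(: Step 1: serialize one page, as in the question. :)
let $page-string := util:get-fragment-between(
    $doc//tei:pb[@n = "15-v"], $doc//tei:pb[@n = "16-r"], true(), true())
(: Store the page so that its nodes become persistent rather than in-memory. :)
let $stored-path := xmldb:store($tmp-col, $tmp-name, util:parse($page-string))
let $stored := doc($stored-path)
(: Step 2: the line breaks now come from a stored document. :)
for $lb in $stored//tei:lb
let $next := ($lb/following::tei:lb)[1]
where exists($next)
return util:parse(util:get-fragment-between($lb, $next, true(), true()))
```

The trade-off is a write to the database for every page, so the in-memory fb:trim-node approach may still be the simpler choice.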