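I have an XML file of a book in which the people in each chapter are tagged. I'm trying to use XQuery 3.1, run in oXygen, to compute co-occurrences (when certain characters appear in the same chapter) so that I can build an edge table to feed into network analysis. A simplified version of the XML might look like this: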
<book>
  <chapter n="I">
    <person>Susan</person>
    <person>Victor</person>
  </chapter>
  <chapter n="II">
    <person>Victor</person>
    <person>Susan</person>
    <person>Victor</person>
    <person>Iago</person>
    <person>Victor</person>
  </chapter>
  <chapter n="III">
    <person>Susan</person>
    <person>Iago</person>
    <person>Scott</person>
    <person>Susan</person>
  </chapter>
</book>
I need to create a CSV file that lists each distinct pair of people along with the number of chapters in which they appear together, like this:
From,To,Chapters
Susan,Victor,2
Susan,Iago,2
Victor,Iago,1
Susan,Scott,1
Here is what I have so far:
declare option saxon:output "method=text";
declare variable $linefeed := "&#10;";
concat('From,To,Chapters', $linefeed,
  string-join(
    let $persons := //person/string() => distinct-values()
    for $person in $persons
    let $pers-chapters := //chapter[.//person/string() = $person]
    for $pers-chapter in $pers-chapters
    let $chap-num := $pers-chapter/data(@n)
    let $fellow-occupants := $pers-chapter//person/string() => distinct-values()
    for $fellow-occupant in $fellow-occupants
    where $fellow-occupant != ($person)
    return concat($person, ',', $fellow-occupant, ',', $chap-num),
    $linefeed))
This returns:
From,To,Chapters
Susan,Victor,I
Susan,Victor,II
Susan,Iago,II
Susan,Iago,III
Susan,Scott,III
Victor,Susan,I
Victor,Susan,II
Victor,Iago,II
Iago,Susan,II
Iago,Victor,II
Iago,Susan,III
Iago,Scott,III
Scott,Susan,III
Scott,Iago,III
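So I have a couple of problems. What I really want to do is, first, isolate each distinct pair of people (so if I have Susan,Victor, I don't also want to see Victor,Susan); I'm not sure how to look through the text and return distinct pairs of nodes of the same type. (You'll see that I've called the first person in the returned line $person and the second $fellow-occupant. I don't know how else to handle this.)
Second, so far I produce a new line for every shared chapter, because that is what the second for loop does. What I need instead is for the XQuery to find and count the chapters for each distinct pair, and to output one line for that pair along with the chapter count.
Any help would be much appreciated! I've been struggling with this for a long time, and I'm sure I'm missing something basic at a conceptual level.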
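Good question! There are probably several ways to tackle this.
Personally, I would start by collecting all of the distinct people in a single variable: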
let $distinct-people := //person => distinct-values() => sort()
This yields a sequence of 4 names:
"Iago",
"Scott",
"Susan",
"Victor"
Next, I would construct a sequence of distinct pairs:
let $distinct-pairs :=
  for $person at $n in $distinct-people
  (: select only the remaining people in the list :)
  let $others := subsequence($distinct-people, $n + 1)
  return
    $others ! array { $person, . }
This yields 6 arrays:
[Iago,Scott],
[Iago,Susan],
[Iago,Victor],
[Scott,Susan],
[Scott,Victor],
[Susan,Victor]
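As a quick sanity check, the pairing step can be run on its own against a literal list of names, with no context document. This is only a sketch: the names are hard-coded from the sample, and `$people`, `$pairs`, and `$expected` are illustrative variable names rather than part of the query in this answer. A list of n names should give n * (n - 1) / 2 unordered pairs.

```xquery
xquery version "3.1";

(: Hard-coded, already sorted names taken from the sample document :)
let $people := ("Iago", "Scott", "Susan", "Victor")
let $pairs :=
  for $person at $n in $people
  (: pair each name only with the names that come after it :)
  return subsequence($people, $n + 1) ! array { $person, . }
(: number of unordered pairs that n names should produce :)
let $expected := count($people) * (count($people) - 1) idiv 2
return (
  $pairs ! string-join(?*, ","),
  "pairs: " || count($pairs) || " (expected " || $expected || ")"
)
```

Because each name is combined only with the names that follow it in the sorted list, a pair such as Susan,Victor can never reappear as Victor,Susan.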
Next, I would iterate over these pairs and count the chapters in which both characters appear, building a CSV row whenever there is at least one such chapter:
let $rows :=
  for $pair in $distinct-pairs
  let $occurrences := //chapter[person = $pair?1 and person = $pair?2]
  let $count := count($occurrences)
  where $count ge 1
  return
    string-join(($pair?1, $pair?2, $count), ",")
Finally, I would return the complete CSV:
let $header := 'From,To,Chapters'
return
  string-join(($header, $rows), $linefeed)
This returns:
From,To,Chapters
Iago,Scott,1
Iago,Susan,2
Iago,Victor,1
Scott,Susan,1
Susan,Victor,2
Putting the whole query together:
xquery version "3.1";
declare option saxon:output "method=text";
let $distinct-people := //person => distinct-values() => sort()
let $distinct-pairs :=
  for $person at $n in $distinct-people
  let $others := subsequence($distinct-people, $n + 1)
  return
    $others ! array { $person, . }
let $rows :=
  for $pair in $distinct-pairs
  let $occurrences := //chapter[person = $pair?1 and person = $pair?2]
  let $count := count($occurrences)
  where $count ge 1
  return
    string-join(($pair?1, $pair?2, $count), ",")
let $header := 'From,To,Chapters'
let $linefeed := "&#10;"
return
  string-join(($header, $rows), $linefeed)
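As one of the other possible approaches, the counting can also be done per chapter with a FLWOR `group by` clause instead of testing every possible pair against every chapter. The following is a minimal sketch, not a replacement for the query in this answer: it assumes the same context document and Saxon serialization option, that names never contain a comma, and `$key` is an illustrative variable name.

```xquery
xquery version "3.1";
declare option saxon:output "method=text";

let $linefeed := "&#10;"
let $rows :=
  for $chapter in //chapter
  (: distinct, sorted names within this one chapter :)
  let $people := sort(distinct-values($chapter/person))
  for $from at $n in $people
  for $to in subsequence($people, $n + 1)
  (: "From,To" doubles as the grouping key and the start of the CSV row :)
  let $key := $from || "," || $to
  group by $key
  order by $key
  (: after grouping, $chapter holds every chapter that contains the pair :)
  return $key || "," || count($chapter)
return
  string-join(("From,To,Chapters", $rows), $linefeed)
```

Since only pairs that actually share a chapter are ever generated, no `where $count ge 1` filter is needed here, and pairs such as Scott and Victor that never co-occur simply do not appear.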