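I have a special case: I want to adjust the chapter and verse numbering in a USX file, which is a specialized XML format.
Documentation: https://ubsicap.github.io/usx/index.html
Here are the first two chapters of Psalms: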
<usx version="2.0"><book code="PSA">PSA</book><chapter number="1" style="c"/><para style="s1">Book One</para><para style="s1">The Way of the Righteous and the Wicked</para><para style="p"><verse number="1" style="v"/>Blessed is the man
who walks not in the counsel of the wicked,
nor stands in the way of sinners,
nor sits in the seat of scoffers;</para><para style="p"><verse number="2" style="v"/>but his delight is in the law of the Lord,
and on his law he meditates day and night.</para><para style="p"><verse number="3" style="v"/>He is like a tree
planted by streams of water
that yields its fruit in its season,
and its leaf does not wither.
In all that he does, he prospers.</para><para style="p"><verse number="4" style="v"/>The wicked are not so,
but are like chaff that the wind drives away.</para><para style="p"><verse number="5" style="v"/>Therefore the wicked will not stand in the judgment,
nor sinners in the congregation of the righteous;</para><para style="p"><verse number="6" style="v"/>for the Lord knows the way of the righteous,
but the way of the wicked will perish.</para><chapter number="2" style="c"/><para style="s1">The Reign of the Lord’s Anointed</para><para style="p"><verse number="1" style="v"/>Why do the nations rage
and the peoples plot in vain?</para><para style="p"><verse number="2" style="v"/>The kings of the earth set themselves,
and the rulers take counsel together,
against the Lord and against his Anointed, saying,</para><para style="p"><verse number="3" style="v"/>“Let us burst their bonds apart
and cast away their cords from us.”</para><para style="p"><verse number="4" style="v"/>He who sits in the heavens laughs;
the Lord holds them in derision.</para><para style="p"><verse number="5" style="v"/>Then he will speak to them in his wrath,
and terrify them in his fury, saying,</para><para style="p"><verse number="6" style="v"/>“As for me, I have set my King
on Zion, my holy hill.”</para><para style="p"><verse number="7" style="v"/>I will tell of the decree:
The Lord said to me, “You are my Son;
today I have begotten you.</para><para style="p"><verse number="8" style="v"/>Ask of me, and I will make the nations your heritage,
and the ends of the earth your possession.</para><para style="p"><verse number="9" style="v"/>You shall break them with a rod of iron
and dash them in pieces like a potter’s vessel.”</para><para style="p"><verse number="10" style="v"/>Now therefore, O kings, be wise;
be warned, O rulers of the earth.</para><para style="p"><verse number="11" style="v"/>Serve the Lord with fear,
and rejoice with trembling.</para><para style="p"><verse number="12" style="v"/>Kiss the Son,
lest he be angry, and you perish in the way,
for his wrath is quickly kindled.
Blessed are all who take refuge in him.</para></usx>
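Note that the chapter and verse numbering does not match throughout. For convenience I chose to display the mapping as an image, but I have also formatted it as a Python dictionary, as follows: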
verse_map = {
    'NUM': {
        '12:16': '13:1',
        '13:1-33': '13:+1',
    },
    ...
    'PSA': {
        '1:1-2:12': 'X:X',
        '3:1-9:20': 'X:+1',
        '10:1': '9:22',
        '10:2-18': '9:+21',
        ...
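The keys in this dictionary appear in three shapes: a single verse ('12:16'), a verse range inside one chapter ('13:1-33'), and a range that spans chapters ('1:1-2:12'). A minimal sketch of turning a key into a pair of (chapter, verse) tuples, assuming those are the only shapes that occur; `parse_range` is a hypothetical helper name, not part of the original code:

```python
def parse_range(key):
    """Return ((start_chapter, start_verse), (end_chapter, end_verse))."""
    start, _, end = key.partition('-')
    c1, v1 = (int(n) for n in start.split(':'))
    if not end:
        # single verse such as '12:16'
        return (c1, v1), (c1, v1)
    if ':' in end:
        # cross-chapter range such as '1:1-2:12'
        c2, v2 = (int(n) for n in end.split(':'))
    else:
        # verse range inside one chapter such as '13:1-33'
        c2, v2 = c1, int(end)
    return (c1, v1), (c2, v2)


print(parse_range('13:1-33'))   # ((13, 1), (13, 33))
print(parse_range('1:1-2:12'))  # ((1, 1), (2, 12))
```

Because Python compares tuples element by element, a test such as `start <= (chapter, verse) <= end` is then enough to decide whether a verse falls inside a range.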
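To be clear, the changes that need to be made are to the verse numbers and the chapter numbers. For example, here is part of the original: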
<chapter number="44" style="c"/><para style="s1">Come to Our Help</para><para style="p"><verse number="1" style="v"/>O God, we have heard with our ears,
our fathers have told us,
what deeds you performed in their days,
in the days of old:</para><para style="p"><verse number="2" style="v"/>you with your own hand drove out the nations,
but them you planted;
you afflicted the peoples,
but them you set free;</para><para style="p"><verse number="3" style="v"/>for not by their own sword did they win the land,
nor did their own arm save them,
but your right hand and your arm,
and the light of your face,
for you delighted in them.</para>
Here is the correctly adjusted version:
<chapter number="43" style="c"/><para style="s1">Come to Our Help</para><para style="p"><verse number="2" style="v"/>O God, we have heard with our ears,
our fathers have told us,
what deeds you performed in their days,
in the days of old:</para><para style="p"><verse number="3" style="v"/>you with your own hand drove out the nations,
but them you planted;
you afflicted the peoples,
but them you set free;</para><para style="p"><verse number="4" style="v"/>for not by their own sword did they win the land,
nor did their own arm save them,
but your right hand and your arm,
and the light of your face,
for you delighted in them.</para>
Why? Here is the mapping for these passages:
'44:1-49:20': '-1:+1'
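Read against the example, '-1:+1' means "chapter minus one, verse plus one", so 44:1 becomes 43:2. A minimal sketch of applying one rule to one reference, assuming each half of a rule is either 'X' (keep the old value), a signed offset such as '+1' or '-1', or a bare number that replaces the old value; `shift` and `apply_rule` are hypothetical names:

```python
def shift(part, old):
    """Apply one half of a rule ('X', '+1', '-1', '9', ...) to an old number."""
    if part == 'X':
        return old                # keep the old value
    if part[0] in '+-':
        return old + int(part)    # signed offset; int('+1') == 1, int('-1') == -1
    return int(part)              # absolute replacement


def apply_rule(rule, chapter, verse):
    """Apply a 'chapter:verse' rule to an original (chapter, verse) pair."""
    c_part, v_part = rule.split(':')
    return shift(c_part, chapter), shift(v_part, verse)


print(apply_rule('-1:+1', 44, 1))  # (43, 2)
print(apply_rule('9:+21', 10, 2))  # (9, 23)
```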
Here is what I have so far:
for elem in tree.iter():
    # print book code, chapter number and verse number
    if elem.tag == 'book':
        book_code = elem.get('code')
        print(f"book {book_code}")
    elif elem.tag == 'chapter':
        chapter_number = elem.get('number')
        print(f"chapter {chapter_number}")
    elif elem.tag == 'verse':
        verse_number = elem.get('number')
        print(f"verse {verse_number}")
So right now I am just looping through the whole file and pulling out the book code, chapter, and verse.
The output looks like this:
book PSA
chapter 1
verse 1
verse 2
...
chapter 2
verse 1
...
I am stumped on the rest.
Here is an example of how to set the chapter number to a new value:
import xml.etree.ElementTree as ET
from io import StringIO
import json
xml_= """\
<usx version="2.0">
<book code="PSA">PSA</book>
<chapter number="1" style="c" />
<para style="s1">Book One</para>
<para style="s1">The Way of the Righteous and the Wicked</para>
<para style="p">
<verse number="1" style="v" />Blessed is the man
who walks not in the counsel of the wicked,
nor stands in the way of sinners,
nor sits in the seat of scoffers;</para>
<para style="p">
<verse number="2" style="v" />but his delight is in the law of the Lord,
and on his law he meditates day and night.</para>
<para style="p">
<verse number="3" style="v" />He is like a tree
planted by streams of water
that yields its fruit in its season,
and its leaf does not wither.
In all that he does, he prospers.</para>
<para style="p">
<verse number="4" style="v" />The wicked are not so,
but are like chaff that the wind drives away.</para>
<para style="p">
<verse number="5" style="v" />Therefore the wicked will not stand in the judgment,
nor sinners in the congregation of the righteous;</para>
<para style="p">
<verse number="6" style="v" />for the Lord knows the way of the righteous,
but the way of the wicked will perish.</para>
<chapter number="2" style="c" />
<para style="s1">The Reign of the Lord’s Anointed</para>
<para style="p">
<verse number="1" style="v" />Why do the nations rage
and the peoples plot in vain?</para>
<para style="p">
<verse number="2" style="v" />The kings of the earth set themselves,
and the rulers take counsel together,
against the Lord and against his Anointed, saying,</para>
<para style="p">
<verse number="3" style="v" />“Let us burst their bonds apart
and cast away their cords from us.”</para>
<para style="p">
<verse number="4" style="v" />He who sits in the heavens laughs;
the Lord holds them in derision.</para>
<para style="p">
<verse number="5" style="v" />Then he will speak to them in his wrath,
and terrify them in his fury, saying,</para>
<para style="p">
<verse number="6" style="v" />“As for me, I have set my King
on Zion, my holy hill.”</para>
<para style="p">
<verse number="7" style="v" />I will tell of the decree:
The Lord said to me, “You are my Son;
today I have begotten you.</para>
<para style="p">
<verse number="8" style="v" />Ask of me, and I will make the nations your heritage,
and the ends of the earth your possession.</para>
<para style="p">
<verse number="9" style="v" />You shall break them with a rod of iron
and dash them in pieces like a potter’s vessel.”</para>
<para style="p">
<verse number="10" style="v" />Now therefore, O kings, be wise;
be warned, O rulers of the earth.</para>
<para style="p">
<verse number="11" style="v" />Serve the Lord with fear,
and rejoice with trembling.</para>
<para style="p">
<verse number="12" style="v" />Kiss the Son,
lest he be angry, and you perish in the way,
for his wrath is quickly kindled.
Blessed are all who take refuge in him.</para>
</usx>
"""
root = ET.fromstring(xml_)
# old chapter number -> new chapter number, keyed by book code
# ("X" and "Y" are placeholder values)
chap_old_new = '{"PSA":{"1":"X", "2":"Y"}}'
p_data = json.loads(chap_old_new)
book_c = None
for elem in root.iter():
    if elem.tag == "book":
        book_c = elem.get('code')
    elif elem.tag == "chapter":
        chap_no = elem.get('number')
        # look up the new chapter number; keep the old one if it is not mapped
        new_no = p_data.get(book_c, {}).get(chap_no, chap_no)
        elem.set('number', new_no)
    elif elem.tag == "verse":
        # verse numbers are read here but not changed
        verse_no = elem.get('number')
ET.dump(root)
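The same loop can be extended to verses by looking each reference up in a range-based map like the one in the question. The following is only a sketch under several assumptions: rule halves are 'X', a signed offset, or an absolute number; every verse element carries a plain integer `number` (verse elements without one are skipped); and a chapter marker is renumbered with the rule that covers its verse 1. Merging or splitting chapters (for example '10:1': '9:22', which would leave two chapter 9 markers) is not handled. The names `parse_range`, `shift`, `remap`, and `renumber` are hypothetical, and `verse_map` here holds only the one rule quoted in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the mapping: only the rule quoted in the question.
verse_map = {'PSA': {'44:1-49:20': '-1:+1'}}


def parse_range(key):
    """'44:1-49:20' -> ((44, 1), (49, 20)); also handles '12:16' and '13:1-33'."""
    start, _, end = key.partition('-')
    c1, v1 = (int(n) for n in start.split(':'))
    if not end:
        return (c1, v1), (c1, v1)
    if ':' in end:
        c2, v2 = (int(n) for n in end.split(':'))
    else:
        c2, v2 = c1, int(end)
    return (c1, v1), (c2, v2)


def shift(part, old):
    """'X' keeps the old number, '+n'/'-n' offsets it, a bare number replaces it."""
    if part == 'X':
        return old
    if part[0] in '+-':
        return old + int(part)
    return int(part)


def remap(book, chapter, verse, mapping):
    """Return the new (chapter, verse); unchanged if no range covers the reference."""
    for key, rule in mapping.get(book, {}).items():
        start, end = parse_range(key)
        if start <= (chapter, verse) <= end:
            c_part, v_part = rule.split(':')
            return shift(c_part, chapter), shift(v_part, verse)
    return chapter, verse


def renumber(root, mapping):
    """Rewrite chapter and verse numbers in place, tracking the ORIGINAL chapter."""
    book = None
    chapter = None
    for elem in root.iter():
        if elem.tag == 'book':
            book = elem.get('code')
        elif elem.tag == 'chapter':
            chapter = int(elem.get('number'))
            new_chapter, _ = remap(book, chapter, 1, mapping)
            elem.set('number', str(new_chapter))
        elif elem.tag == 'verse':
            number = elem.get('number')
            if chapter is None or number is None or not number.isdigit():
                continue  # skip anything that is not a plain integer verse
            _, new_verse = remap(book, chapter, int(number), mapping)
            elem.set('number', str(new_verse))


sample = (
    '<usx version="2.0"><book code="PSA">PSA</book>'
    '<chapter number="44" style="c"/>'
    '<para style="p"><verse number="1" style="v"/>O God, we have heard with our ears,</para>'
    '<para style="p"><verse number="2" style="v"/>you with your own hand drove out the nations,</para>'
    '</usx>'
)
root = ET.fromstring(sample)
renumber(root, verse_map)
print([(e.tag, e.get('number')) for e in root.iter() if e.tag in ('chapter', 'verse')])
# [('chapter', '43'), ('verse', '2'), ('verse', '3')]
```

The lookup always uses the original chapter number kept in `chapter`, not the rewritten attribute, so later verses in the same chapter still match the intended range after the chapter marker has been changed.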