Error parsing DTD using lxml

Question (votes: 0, answers: 2)

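I am trying to write a script that validates XML against the NITF DTD, http://www.iptc.org/std/NITF/3.4/specification/dtd/nitf-3-4.dtd. Based on this post, I came up with the following simple script to validate a NITF XML document. Below is the error message I get when I run the script; it is not very descriptive, which makes the problem hard to debug. Any help would be appreciated.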

#!/usr/bin/env python


def main():
    from lxml import etree, objectify
    from StringIO import StringIO

    f = open('nitf_test.xml')
    xml_doc = f.read()
    f.close()

    f = open('nitf-3-4.dtd')
    dtd_doc = f.read()
    f.close()

    dtd = etree.DTD(StringIO(dtd_doc))
    tree = objectify.parse(StringIO(xml_doc))
    dtd.validate(tree)


if __name__ == '__main__':

    main()

Traceback:

Traceback (most recent call last):
  File "./test_nitf_doc.py", line 23, in <module>
    main()
  File "./test_nitf_doc.py", line 16, in main
    dtd = etree.DTD(StringIO(dtd_doc))
  File "dtd.pxi", line 43, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:126056)
  File "dtd.pxi", line 117, in lxml.etree._parseDtdFromFilelike (src/lxml/lxml.etree.c:126727)
lxml.etree.DTDParseError: error parsing DTD

If I change the line:

dtd = etree.DTD(StringIO(dtd_doc))

to:

dtd = etree.DTD(dtd_doc)

the error I get is:

lxml.etree.DTDParseError: failed to load external entity "NULL"
python xml lxml dtd
2 Answers

6 votes

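I took a look at nitf-3-4.dtd and found that it references an external module, xhtml-ruby-1.mod, which can be downloaded from the IPTC site alongside the DTD. The module needs to be present in the current directory so that the DTD parser can load it.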
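As for the undescriptive "error parsing DTD" message: the exception itself should carry the underlying libxml2 diagnostics. A minimal sketch for Python 3, assuming lxml attaches the parser log to the exception as `error_log`; the helper name `dtd_parse_problems` and the inline DTD strings are made up for illustration and are not part of the original script:

```python
from io import StringIO
from lxml import etree


def dtd_parse_problems(dtd_text):
    """Return None if the DTD parses, else a list of parser messages.

    Hypothetical helper, for illustration only.
    """
    try:
        etree.DTD(StringIO(dtd_text))
    except etree.DTDParseError as exc:
        # Each log entry carries a position and the libxml2 message.
        return ["line %s, column %s: %s" % (e.line, e.column, e.message)
                for e in exc.error_log]
    return None


# A well-formed DTD yields None.
print(dtd_parse_problems("<!ELEMENT a EMPTY>"))

# A truncated declaration fails to parse; print whatever the log recorded.
for problem in dtd_parse_problems("<!ELEMENT a") or []:
    print(problem)
```

With the real NITF DTD, the same `except` clause around `etree.DTD(...)` would be the place to look for a message naming the file that could not be loaded.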

Complete working example (assuming you have a valid NITF document at hand):

% wget http://www.iptc.org/std/NITF/3.4/specification/dtd/nitf-3-4.dtd
% wget http://www.iptc.org/std/NITF/3.4/specification/dtd/xhtml-ruby-1.mod

Python code:

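from lxml import etree, objectify

# Open both files in binary mode and let lxml handle the encoding.
with open('nitf-3-4.dtd', 'rb') as dtd_file:
    dtd = etree.DTD(dtd_file)
with open('nitf_test.xml', 'rb') as xml_file:
    tree = objectify.parse(xml_file)
print(dtd.validate(tree))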

Output:

% python nitf_test.py
True
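If validation returns False instead, the DTD object's `error_log` explains why. A small sketch for Python 3, using a one-line inline DTD as a stand-in for nitf-3-4.dtd (an assumption made so the example runs on its own):

```python
from io import StringIO
from lxml import etree

# A tiny inline DTD stands in for nitf-3-4.dtd (illustration only).
dtd = etree.DTD(StringIO("<!ELEMENT a EMPTY>"))

valid_doc = etree.XML("<a/>")
invalid_doc = etree.XML("<a>b</a>")  # content inside an element declared EMPTY

print(dtd.validate(valid_doc))
print(dtd.validate(invalid_doc))

# error_log holds the messages recorded while validating invalid_doc.
for entry in dtd.error_log.filter_from_errors():
    print("line %s: %s" % (entry.line, entry.message))
```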

0 votes

I ran into a similar problem. My DTD looks like this:


<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT register-documents (register-document)+>
<!ATTLIST register-documents
    xmlns CDATA #IMPLIED
    date-produced CDATA #IMPLIED
    dtd-version CDATA #IMPLIED
    produced-by (applicant | RO | ISA | IPEA | IB | DO | EO) #REQUIRED
    ro CDATA #IMPLIED
    status CDATA #IMPLIED
>
<!ENTITY % register-document SYSTEM "register-document-v1-3-1.dtd">
%register-document;

However, register-document-v1-3-1.dtd is in the same directory...

This may be a "relative path" problem. I call it like this:

epo_dtd = etree.DTD("EPO/Sample/EBR/register-documents-v1-3-1.dtd")

register-document-v1-3-1.dtd is inside EPO/Sample/EBR, not in the CWD...
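One possible workaround, sketched under the assumption (taken from the first answer) that files referenced by relative SYSTEM identifiers are looked up in the current directory: switch into the DTD's directory while parsing. The helpers `working_directory` and `load_dtd` are hypothetical names, not part of lxml, and this has not been checked against the EPO DTDs:

```python
import contextlib
import os

from lxml import etree


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if parsing fails.
        os.chdir(previous)


def load_dtd(dtd_path):
    """Parse a DTD from inside its own directory, so that files it
    references by relative path are looked up next to it."""
    directory, filename = os.path.split(os.path.abspath(dtd_path))
    with working_directory(directory):
        with open(filename, 'rb') as dtd_file:
            return etree.DTD(dtd_file)
```

Usage would then be `epo_dtd = load_dtd("EPO/Sample/EBR/register-documents-v1-3-1.dtd")`. Note that `os.chdir` affects the whole process, so this is not safe if other threads depend on the working directory.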
