Python 3.3: Converting XML to YAML

Question (votes: 0, answers: 4)

I am trying to convert an XML file to YAML using Python 3.3. Here is my code:

#!/usr/bin/env python3

test_filename_input = './reference-conversions/wikipedia-example.xml'
test_filename_output = 'wikipedia-example_xml_read-as-binary.yaml'

file_object = open( test_filename_input, 'rb')
data_in = file_object.read()
file_object.close()

from xml.dom.minidom import parseString
document_object = parseString( data_in)

import yaml
stream = open( test_filename_output, 'w')
yaml.dump( document_object, stream)
stream.close()

For reference, I used the XML file from here:

<person>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <age>25</age>
  <address>
    <streetAddress>21 2nd Street</streetAddress>
    <city>New York</city>
    <state>NY</state>
    <postalCode>10021</postalCode>
  </address>
  <phoneNumbers>
    <phoneNumber type="home">212 555-1234</phoneNumber>
    <phoneNumber type="fax">646 555-4567</phoneNumber>
  </phoneNumbers>
  <gender>
    <type>male</type>
  </gender>
</person>

This should produce a result like this:

---
  firstName: John
  lastName: Smith
  age: 25
  address: 
        streetAddress: 21 2nd Street
        city: New York
        state: NY
        postalCode: 10021

  phoneNumber: 
        -  
            type: home
            number: 212 555-1234
        -  
            type: fax
            number: 646 555-4567
  gender: 
        type: male

However, the actual result is:

&id001 !!python/object/new:xml.dom.minidom.Document
state: !!python/tuple
- implementation: !!python/object:xml.dom.minidom.DOMImplementation {}
- _elem_info: {}
  _id_cache: {}
  _id_search_stack: null
  childNodes: !!python/object/new:xml.dom.minicompat.NodeList
    listitems:
    - &id039 !!python/object/new:xml.dom.minidom.Element
      state: !!python/tuple
      - null
      - _attrs: null
        _attrsNS: null
        childNodes: !!python/object/new:xml.dom.minicompat.NodeList
          listitems:
          - &id045 !!python/object/new:xml.dom.minidom.Text
            state: !!python/tuple
            - null
            - _data: "\n  "
              nextSibling: &id002 !!python/object/new:xml.dom.minidom.Element
                state: !!python/tuple
                - null
                - _attrs: null
                  _attrsNS: null
                  childNodes: !!python/object/new:xml.dom.minicompat.NodeList
                    listitems:
[...]

Any ideas on how to make PyYAML filter out the object internals of xml.dom.minidom, or on an alternative to xml.dom.minidom?

Thanks!

python xml dom yaml
4 Answers
8
votes

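Here is one way to get around the xml.dom problem. It also provides a way to map the ambiguous case where a node has both content and attributes or child nodes. For the example input above, it produces: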

$ python3 yamlout.py person.xml
---
person:
  firstName: John
  lastName: Smith
  age: 25
  address:
    streetAddress: 21 2nd Street
    city: New York
    state: NY
    postalCode: 10021
  phoneNumbers:
    phoneNumber:
      _xml_node_content: 212 555-1234 
      type: home # Attribute
    phoneNumber:
      _xml_node_content: 646 555-4567 
      type: fax # Attribute
  gender:
    type: male

Implementation, yamlout.py:

import sys
import json
import xml.etree.ElementTree as ET

if len(sys.argv) != 2:
    sys.stderr.write("Usage: {0} <file>.xml\n".format(sys.argv[0]))
    sys.exit(1)

XML_NODE_CONTENT = '_xml_node_content'
ATTR_COMMENT = '# Attribute'
def yamlout(node, depth=0):
    if not depth:
        sys.stdout.write('---\n')
    # Nodes with both content AND nested nodes or attributes
    # have no valid yaml mapping. Add  'content' node for that case
    nodeattrs = node.attrib
    children = list(node)
    content = node.text.strip() if node.text else ''
    if content:
        if not (nodeattrs or children):
            # Write as just a name value, nothing else nested
            sys.stdout.write(
                '{indent}{tag}: {text}\n'.format(
                    indent=depth*'  ', tag=node.tag, text=content or ''))
            return
        else:
            # json.dumps for basic handling of multiline content
            nodeattrs[XML_NODE_CONTENT] = json.dumps(node.text)

    sys.stdout.write('{indent}{tag}:\n'.format(
        indent=depth*'  ', tag=node.tag))

    # Write node attributes, marked with a comment to distinguish them from nested nodes
    depth += 1
    for n,v in nodeattrs.items():
        sys.stdout.write(
            '{indent}{n}: {v} {c}\n'.format(
                indent=depth*'  ', n=n, v=v,
                c=ATTR_COMMENT if n!=XML_NODE_CONTENT else ''))
    # Write nested nodes
    for child in children:
        yamlout(child, depth)

with open(sys.argv[1]) as xmlf:
    tree = ET.parse(xmlf)
    yamlout(tree.getroot())
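
As a complement to the hand-written printer above, here is a minimal sketch (not part of the original answer) that first converts the ElementTree nodes into plain dicts, lists and strings. The function name element_to_data is hypothetical, and the _xml_node_content key is simply borrowed from the answer. Gathering repeated tags such as phoneNumber into a list is an assumption about the desired mapping.

```python
import xml.etree.ElementTree as ET

# Key used for text of nodes that also carry attributes or children
# (name borrowed from the answer above; any unused key would do).
TEXT_KEY = "_xml_node_content"


def element_to_data(node):
    """Convert an Element into plain dicts, lists and strings (hypothetical helper)."""
    text = (node.text or "").strip()
    children = list(node)
    # Leaf without attributes: represent it as a bare string.
    if not children and not node.attrib:
        return text
    data = dict(node.attrib)  # attributes become ordinary keys
    if text:
        data[TEXT_KEY] = text
    for child in children:
        value = element_to_data(child)
        if child.tag in data:
            # Repeated tag: collect the values in a list.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    return data


SAMPLE = """<person>
  <firstName>John</firstName>
  <age>25</age>
  <phoneNumbers>
    <phoneNumber type="home">212 555-1234</phoneNumber>
    <phoneNumber type="fax">646 555-4567</phoneNumber>
  </phoneNumbers>
  <gender>
    <type>male</type>
  </gender>
</person>"""

root = ET.fromstring(SAMPLE)
data = {root.tag: element_to_data(root)}
```

Since the result contains only built-in types, handing it to yaml.safe_dump(data, default_flow_style=False) should give clean YAML without any !!python/object tags. Note that every value stays a string, so age would likely be emitted as a quoted '25' unless you convert it yourself.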

3
votes

I found an XML to YAML converter, but I had to make a small change around line 92:

outStr = yaml.dump(out)

to

outStr = yaml.safe_dump(out)

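This removes all !!python/unicode tags from the output. I have tested the script from the shell command line and it works fine; I'm sure it would take only a simple translation to make it work from the Python command line.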

Edit

I also added my own print methods to make the output look more like what you originally posted:

def prettyPrint(node, level):
    childPrint = 0
    attrPrint = 0

    for x in node:
        try:
            if x['attributes']:
                attrPrint = 1

                for l in range(0, level):
                    sys.stdout.write("\t")

                for a in x['attributes']:
                    sys.stdout.write("- %s: %s\n" % (a, x['attributes'][a]))

        except KeyError:
            try:
                if x['children']:
                    childPrint = 1

                    for l in range(0, level):
                        sys.stdout.write("\t")

                    sys.stdout.write("%s:\n" % x['name'])
                    prettyPrint(x['children'], level+1)

            except KeyError:
                pass

        finally:
            if not childPrint:
                printNextNode(x, level, attrPrint)
                attrPrint = 0

            else:
                childPrint = 0

def printNextNode(node, level, attrPrint):
    for l in range(0, level):
        sys.stdout.write("\t")

    if attrPrint:
        sys.stdout.write('  ')

    sys.stdout.write("%s: %s\n" % (node['name'], node['text']))

Then call this function from the convertXml2Yaml function:

sys.stdout.write('%s:\n' % out['name'])
prettyPrint(out['children'], 1)

0
votes

Use https://pypi.org/project/yaplon/ -> https://github.com/twardoch/yaplon/

xml22yaml -i "file.xml" -o "file.yaml"

However, it does not support UTF-8 files with a BOM.
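
If the BOM is the only obstacle, one possible workaround is to strip it before running the converter. The sketch below is an assumption on my part, not something yaplon documents: the helper names strip_utf8_bom and strip_bom_from_file are hypothetical, and it only relies on the standard library.

```python
import codecs


def strip_utf8_bom(raw):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def strip_bom_from_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM (hypothetical helper)."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(raw))
```

For example, strip_bom_from_file("file.xml", "file-nobom.xml") writes a copy that can then be passed to the command above in place of the original file.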


0
votes

Going through JSON worked for me:

import json

import xmltodict
import yaml

# xml_doc is assumed to hold the XML document as a string

print(yaml.dump(xmltodict.parse(xml_doc)))
# includes the OrderedDict type in the output

print(yaml.safe_dump(xmltodict.parse(xml_doc)))
# raises an exception because it cannot represent the input

json_doc = json.dumps(xmltodict.parse(xml_doc))
print(yaml.safe_dump(json.loads(json_doc)))
# works fine
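
The reason the JSON detour helps can be shown with the standard library alone. This is a small sketch under the assumption that the parser returns nested OrderedDict objects, as the comments above suggest; the parsed value here is a hand-built stand-in, not real xmltodict output.

```python
import json
from collections import OrderedDict

# Stand-in for the parser result (assumption: nested OrderedDicts).
parsed = OrderedDict(
    [("person", OrderedDict([("firstName", "John"), ("age", "25")]))]
)

# json.dumps accepts OrderedDict like any other mapping, and json.loads
# always builds plain dict objects, so the round trip drops the special type.
plain = json.loads(json.dumps(parsed))
```

After the round trip every level is an ordinary dict, which yaml.safe_dump knows how to represent.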