Converting XML to JSON in Python with a separate schema


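I want to convert incoming XML data to JSON so that I can process it more efficiently in Python. The XML is in a non-standard format in which the schema is defined above the corresponding values section (example below).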

I can read the schema correctly, but I am having trouble creating the correct tag nesting from the values section of the XML.

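Notes:

  • All blocks except Block 1 can have multiple values
  • Sub-blocks, such as Block 3_1, can be empty and should then be represented as an empty list in the JSON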

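I would like to solve this generically (avoiding hard-coded rules such as "the sub-block in Block 3") so that the solution can adapt to small changes in structure or naming conventions.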

Example XML:

<body>
    <schema>
        <name>Block 1</name>
        <attributes>
            <string>block_1_attribute_1_name</string>
        </attributes>
        <subblocks>
            <block>
                <name>Block 2</name>
                <attributes>
                    <string>block_2_attribute_1_name</string>
                </attributes>
            </block>
            <block>
                <name>Block 3</name>
                <attributes>
                    <string>block_3_attribute_1_name</string>
                </attributes>
                <subblocks>
                    <block>
                        <name>Block 3_1</name>
                        <attributes>
                            <string>block_3_1_attribute_1_name</string>
                        </attributes>
                    </block>
                </subblocks>
            </block>
        </subblocks>
    </schema>
    <profiles>
        <values>
            <string>block_1_attribute_1_value_1</string>
        </values>
        <subblocks>
            <subblock>
                <values>
                    <string>block_2_attribute_1_value_1</string>
                </values>
            </subblock>
            <subblock>
                <values>
                    <string>block_3_attribute_1_value_1</string>
                </values>
                <subblocks>
                    <subblock>
                        <values>
                            <string>block_3_1_attribute_1_value_1</string>
                        </values>
                    </subblock>
                </subblocks>
            </subblock>
            <subblock>
                <values>
                    <string>block_3_attribute_1_value_2</string>
                </values>
                <subblocks>
                    <subblock>
                        <values>
                            <string>block_3_1_attribute_1_value_2</string>
                        </values>
                    </subblock>
                </subblocks>
            </subblock>
            <subblock>
                <values>
                    <string>block_3_attribute_1_value_3</string>
                </values>
                <subblocks>
                    <!-- empty subblock -->
                    <subblock/>  
                </subblocks>
            </subblock>
        </subblocks>
    </profiles>
</body>

Example output:

{
    "Block 1": {
        "block_1_attribute_1_name": "block_1_attribute_1_value_1",
        "Block 2": [
            {
                "block_2_attribute_1_name": "block_2_attribute_1_value_1"
            }
        ],
        "Block 3": [
            {
                "block_3_attribute_1_name": "block_3_attribute_1_value_1",
                "Block 3_1": [
                    {
                        "block_3_1_attribute_1_name": "block_3_1_attribute_1_value_1"
                    }
                ]
            },
            {
                "block_3_attribute_1_name": "block_3_attribute_1_value_2",
                "Block 3_1": [
                    {
                        "block_3_1_attribute_1_name": "block_3_1_attribute_1_value_2"
                    }
                ]
            },
            {
                "block_3_attribute_1_name": "block_3_attribute_1_value_3",
                "Block 3_1": []
            }
        ]
    }
}

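So far I have written code that gives me the schema. It works, but the result is not nested (it does not necessarily have to be). I do not know where to start with the rest.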

def get_schema(root):
    """Retrieves the schema from the XML root element.

    Args:
        root (Element): The root element of the XML string.

    Returns:
        list: A list of tuples representing the schema. Each tuple contains
            the name of a given schema block and a list of attribute names
            associated with that element.
    """
    schema = []
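    # Each <name> is immediately followed by its sibling <attributes> element.
    for name in root.findall(".//name"):
        # getnext() is lxml-specific; xml.etree.ElementTree does not provide it.
        attribute_names = [elem.text for elem in name.getnext().findall(".//string")]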
        schema.append((name.text, attribute_names))
    return schema

1 answer

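You can use the xmltodict package (install it with pip install xmltodict) to easily convert an XML string into a Python dictionary and then modify its structure as needed. As a second step, if required, you can convert the modified dictionary into a JSON string.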

There are other packages, such as lxml, that can parse XML strings. Their ability to handle non-standard formats may differ.
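
As a rough sketch of the parser-based route, the standard library's xml.etree.ElementTree can walk the schema and the values in parallel. The XML does not say which schema block a given <subblock> belongs to, so this sketch guesses from its shape: the same number of values as attributes, and nested <subblocks> present only when the schema block has sub-blocks. That matching rule, the helper names (texts, children, matches, build) and the shortened sample are assumptions made for illustration, not something the format guarantees.

```python
import json
import xml.etree.ElementTree as ET

# Shortened sample with the same shape as the XML in the question.
XML = """
<body>
  <schema>
    <name>Block 1</name>
    <attributes><string>block_1_attribute_1_name</string></attributes>
    <subblocks>
      <block>
        <name>Block 2</name>
        <attributes><string>block_2_attribute_1_name</string></attributes>
      </block>
      <block>
        <name>Block 3</name>
        <attributes><string>block_3_attribute_1_name</string></attributes>
        <subblocks>
          <block>
            <name>Block 3_1</name>
            <attributes><string>block_3_1_attribute_1_name</string></attributes>
          </block>
        </subblocks>
      </block>
    </subblocks>
  </schema>
  <profiles>
    <values><string>block_1_attribute_1_value_1</string></values>
    <subblocks>
      <subblock>
        <values><string>block_2_attribute_1_value_1</string></values>
      </subblock>
      <subblock>
        <values><string>block_3_attribute_1_value_1</string></values>
        <subblocks>
          <subblock>
            <values><string>block_3_1_attribute_1_value_1</string></values>
          </subblock>
        </subblocks>
      </subblock>
      <subblock>
        <values><string>block_3_attribute_1_value_3</string></values>
        <subblocks><subblock/></subblocks>
      </subblock>
    </subblocks>
  </profiles>
</body>
"""

def texts(parent, tag):
    """Text of every <string> directly under parent/<tag> (empty if absent)."""
    node = parent.find(tag)
    return [] if node is None else [s.text for s in node.findall("string")]

def children(parent, tag):
    """All <tag> elements directly under parent/<subblocks> (empty if absent)."""
    node = parent.find("subblocks")
    return [] if node is None else node.findall(tag)

def matches(block, values):
    """Assumed heuristic: a value block fits a schema block of the same shape."""
    same_width = len(texts(block, "attributes")) == len(texts(values, "values"))
    same_depth = bool(children(block, "block")) == (values.find("subblocks") is not None)
    return same_width and same_depth

def build(block, values):
    """Pair attribute names with values, then recurse into the sub-blocks."""
    result = dict(zip(texts(block, "attributes"), texts(values, "values")))
    sub_schemas = children(block, "block")
    for sub in sub_schemas:
        result[sub.findtext("name")] = []   # stays empty if nothing matches
    for sub_values in children(values, "subblock"):
        if len(sub_values) == 0:            # empty <subblock/>
            continue
        for sub in sub_schemas:
            if matches(sub, sub_values):
                result[sub.findtext("name")].append(build(sub, sub_values))
                break
    return result

root = ET.fromstring(XML)
schema, profiles = root.find("schema"), root.find("profiles")
result = {schema.findtext("name"): build(schema, profiles)}
print(json.dumps(result, indent=4))
```

If two sibling blocks in the schema have the same shape, this heuristic cannot tell them apart, and some other rule (for example, position) would be needed.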

Here is an example with xmltodict:

import json
from pprint import pprint

import xmltodict

s = """
<body>
    <schema>
        <name>Block 1</name>
        <attributes>
            <string>block_1_attribute_1_name</string>
        </attributes>
        <subblocks>
            <block>
                <name>Block 2</name>
                <attributes>
                    <string>block_2_attribute_1_name</string>
                </attributes>
            </block>
            <block>
                <name>Block 3</name>
                <attributes>
                    <string>block_3_attribute_1_name</string>
                </attributes>
                <subblocks>
                    <block>
                        <name>Block 3_1</name>
                        <attributes>
                            <string>block_3_1_attribute_1_name</string>
                        </attributes>
                    </block>
                </subblocks>
            </block>
        </subblocks>
    </schema>
    <profiles>
        <values>
            <string>block_1_attribute_1_value_1</string>
        </values>
        <subblocks>
            <subblock>
                <values>
                    <string>block_2_attribute_1_value_1</string>
                </values>
            </subblock>
            <subblock>
                <values>
                    <string>block_3_attribute_1_value_1</string>
                </values>
                <subblocks>
                    <subblock>
                        <values>
                            <string>block_3_1_attribute_1_value_1</string>
                        </values>
                    </subblock>
                </subblocks>
            </subblock>
            <subblock>
                <values>
                    <string>block_3_attribute_1_value_2</string>
                </values>
                <subblocks>
                    <subblock>
                        <values>
                            <string>block_3_1_attribute_1_value_2</string>
                        </values>
                    </subblock>
                </subblocks>
            </subblock>
            <subblock>
                <values>
                    <string>block_3_attribute_1_value_3</string>
                </values>
                <subblocks>
                    <!-- empty subblock -->
                    <subblock/>  
                </subblocks>
            </subblock>
        </subblocks>
    </profiles>
</body>
"""

d = xmltodict.parse(s)
schema = d['body']['schema']
profiles = d['body']['profiles']
schema_json = json.dumps(schema)
profiles_json = json.dumps(profiles)

pprint(schema)
print('='*120)
pprint(schema_json)

print('#'*120)

pprint(profiles)
print('='*120)
pprint(profiles_json)

Output:

{'attributes': {'string': 'block_1_attribute_1_name'},
 'name': 'Block 1',
 'subblocks': {'block': [{'attributes': {'string': 'block_2_attribute_1_name'},
                          'name': 'Block 2'},
                         {'attributes': {'string': 'block_3_attribute_1_name'},
                          'name': 'Block 3',
                          'subblocks': {'block': {'attributes': {'string': 'block_3_1_attribute_1_name'},
                                                  'name': 'Block 3_1'}}}]}}
========================================================================================================================
('{"name": "Block 1", "attributes": {"string": "block_1_attribute_1_name"}, '
 '"subblocks": {"block": [{"name": "Block 2", "attributes": {"string": '
 '"block_2_attribute_1_name"}}, {"name": "Block 3", "attributes": {"string": '
 '"block_3_attribute_1_name"}, "subblocks": {"block": {"name": "Block 3_1", '
 '"attributes": {"string": "block_3_1_attribute_1_name"}}}}]}}')
########################################################################################################################
{'subblocks': {'subblock': [{'values': {'string': 'block_2_attribute_1_value_1'}},
                            {'subblocks': {'subblock': {'values': {'string': 'block_3_1_attribute_1_value_1'}}},
                             'values': {'string': 'block_3_attribute_1_value_1'}},
                            {'subblocks': {'subblock': {'values': {'string': 'block_3_1_attribute_1_value_2'}}},
                             'values': {'string': 'block_3_attribute_1_value_2'}},
                            {'subblocks': {'subblock': None},
                             'values': {'string': 'block_3_attribute_1_value_3'}}]},
 'values': {'string': 'block_1_attribute_1_value_1'}}
========================================================================================================================
('{"values": {"string": "block_1_attribute_1_value_1"}, "subblocks": '
 '{"subblock": [{"values": {"string": "block_2_attribute_1_value_1"}}, '
 '{"values": {"string": "block_3_attribute_1_value_1"}, "subblocks": '
 '{"subblock": {"values": {"string": "block_3_1_attribute_1_value_1"}}}}, '
 '{"values": {"string": "block_3_attribute_1_value_2"}, "subblocks": '
 '{"subblock": {"values": {"string": "block_3_1_attribute_1_value_2"}}}}, '
 '{"values": {"string": "block_3_attribute_1_value_3"}, "subblocks": '
 '{"subblock": null}}]}}')
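
Note that in the output above the same tag appears in three shapes: a single child is a dict, several children are a list, and the empty <subblock/> is None. Before modifying the structure it can help to normalise these into lists. The following is a minimal sketch of that step; the helper names as_list and normalize are made up for illustration, and the profiles literal is a shortened copy of the parsed values shown above.

```python
def as_list(node):
    """Wrap the three shapes seen in the output (None, dict/str, list) in a list."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]

def normalize(values):
    """Return a copy of a parsed values node with uniform, list-based children."""
    out = {"values": as_list((values.get("values") or {}).get("string"))}
    if "subblocks" in values:
        subblocks = (values["subblocks"] or {}).get("subblock")
        # Empty <subblock/> entries are parsed as None and are skipped here.
        out["subblocks"] = [normalize(child) for child in as_list(subblocks) if child]
    return out

# Shortened copy of the parsed profiles dictionary printed above.
profiles = {
    "values": {"string": "block_1_attribute_1_value_1"},
    "subblocks": {"subblock": [
        {"values": {"string": "block_2_attribute_1_value_1"}},
        {"values": {"string": "block_3_attribute_1_value_1"},
         "subblocks": {"subblock": {"values": {"string": "block_3_1_attribute_1_value_1"}}}},
        {"values": {"string": "block_3_attribute_1_value_3"},
         "subblocks": {"subblock": None}},
    ]},
}

normalized = normalize(profiles)
print(normalized)
```

After this step every "values" and "subblocks" entry is a list, so an empty sub-block becomes an empty list, and the values can be paired with the attribute names from the schema without special cases.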