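I want to convert incoming XML data to JSON so that I can process it more efficiently in Python. The XML uses a non-standard format in which the schema is defined above the section that holds the corresponding values (example below).
I can read the schema correctly, but I am having trouble reproducing the correct nesting of the tags in the values section of the XML.
Notes:
I would like to solve this generically (avoiding hard-coded rules such as "the subblocks in Block 3"), so that the solution can adapt to minor changes in structure or naming conventions.
Example XML: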
<body>
  <schema>
    <name>Block 1</name>
    <attributes>
      <string>block_1_attribute_1_name</string>
    </attributes>
    <subblocks>
      <block>
        <name>Block 2</name>
        <attributes>
          <string>block_2_attribute_1_name</string>
        </attributes>
      </block>
      <block>
        <name>Block 3</name>
        <attributes>
          <string>block_3_attribute_1_name</string>
        </attributes>
        <subblocks>
          <block>
            <name>Block 3_1</name>
            <attributes>
              <string>block_3_1_attribute_1_name</string>
            </attributes>
          </block>
        </subblocks>
      </block>
    </subblocks>
  </schema>
  <profiles>
    <values>
      <string>block_1_attribute_1_value_1</string>
    </values>
    <subblocks>
      <subblock>
        <values>
          <string>block_2_attribute_1_value_1</string>
        </values>
      </subblock>
      <subblock>
        <values>
          <string>block_3_attribute_1_value_1</string>
        </values>
        <subblocks>
          <subblock>
            <values>
              <string>block_3_1_attribute_1_value_1</string>
            </values>
          </subblock>
        </subblocks>
      </subblock>
      <subblock>
        <values>
          <string>block_3_attribute_1_value_2</string>
        </values>
        <subblocks>
          <subblock>
            <values>
              <string>block_3_1_attribute_1_value_2</string>
            </values>
          </subblock>
        </subblocks>
      </subblock>
      <subblock>
        <values>
          <string>block_3_attribute_1_value_3</string>
        </values>
        <subblocks>
          <!-- empty subblock -->
          <subblock/>
        </subblocks>
      </subblock>
    </subblocks>
  </profiles>
</body>
Example output:
{
  "Block 1": {
    "block_1_attribute_1_name": "block_1_attribute_1_value_1",
    "Block 2": [
      {
        "block_2_attribute_1_name": "block_2_attribute_1_value_1"
      }
    ],
    "Block 3": [
      {
        "block_3_attribute_1_name": "block_3_attribute_1_value_1",
        "Block 3_1": [
          {
            "block_3_1_attribute_1_name": "block_3_1_attribute_1_value_1"
          }
        ]
      },
      {
        "block_3_attribute_1_name": "block_3_attribute_1_value_2",
        "Block 3_1": [
          {
            "block_3_1_attribute_1_name": "block_3_1_attribute_1_value_2"
          }
        ]
      },
      {
        "block_3_attribute_1_name": "block_3_attribute_1_value_3",
        "Block 3_1": []
      }
    ]
  }
}
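So far I have written the code below, which gives me the schema. It works, but the result is flat rather than nested (it does not necessarily have to be nested). I don't know where to start with the rest.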
def get_schema(root):
    """Retrieves the schema from the XML root element.

    Args:
        root (Element): The root element of the XML string.

    Returns:
        list: A list of tuples representing the schema. Each tuple contains
            the name of a given schema block and a list of attribute names
            associated with that element.
    """
    schema = []
    # ".//name" matches every <name> at any depth, so the result is a flat list.
    for name in root.findall(".//name"):
        # getnext() is lxml-specific: it returns the sibling that follows <name>,
        # which is the <attributes> element in this format.
        attribute_names = [elem.text for elem in name.getnext().findall(".//string")]
        schema.append((name.text, attribute_names))
    return schema
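You can use the xmltodict package (install it with pip install xmltodict) to convert an XML string into a Python dictionary and then modify its structure where necessary. As a second step, you can convert the modified dictionary to a JSON string if needed.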
There are also other packages, such as lxml, for parsing XML strings. They may differ in how well they handle non-standard formats.
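As one such alternative, the standard library's xml.etree.ElementTree can read the schema into a nested structure. The following is only a sketch: read_block is a hypothetical helper name, the XML is a shortened copy of the schema part of the sample, and it assumes every block uses the name/attributes/subblocks layout shown in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the schema part of the sample XML.
XML_TEXT = """
<body>
  <schema>
    <name>Block 1</name>
    <attributes><string>block_1_attribute_1_name</string></attributes>
    <subblocks>
      <block>
        <name>Block 2</name>
        <attributes><string>block_2_attribute_1_name</string></attributes>
      </block>
      <block>
        <name>Block 3</name>
        <attributes><string>block_3_attribute_1_name</string></attributes>
        <subblocks>
          <block>
            <name>Block 3_1</name>
            <attributes><string>block_3_1_attribute_1_name</string></attributes>
          </block>
        </subblocks>
      </block>
    </subblocks>
  </schema>
</body>
"""


def read_block(elem):
    """Hypothetical helper: turn a <schema> or <block> element into a nested dict."""
    return {
        "name": elem.findtext("name"),
        # Relative paths follow exactly the child steps given, unlike ".//string",
        # which searches the whole subtree and flattens the nesting.
        "attributes": [s.text for s in elem.findall("attributes/string")],
        "subblocks": [read_block(b) for b in elem.findall("subblocks/block")],
    }


root = ET.fromstring(XML_TEXT)
nested_schema = read_block(root.find("schema"))
print(nested_schema["subblocks"][1]["subblocks"][0]["name"])
```

Because the recursion follows subblocks/block at each level, a block nested one level deeper would be picked up without any change to the code.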
Here is an example with xmltodict:
import json
from pprint import pprint
import xmltodict
s = """
<body>
<schema>
<name>Block 1</name>
<attributes>
<string>block_1_attribute_1_name</string>
</attributes>
<subblocks>
<block>
<name>Block 2</name>
<attributes>
<string>block_2_attribute_1_name</string>
</attributes>
</block>
<block>
<name>Block 3</name>
<attributes>
<string>block_3_attribute_1_name</string>
</attributes>
<subblocks>
<block>
<name>Block 3_1</name>
<attributes>
<string>block_3_1_attribute_1_name</string>
</attributes>
</block>
</subblocks>
</block>
</subblocks>
</schema>
<profiles>
<values>
<string>block_1_attribute_1_value_1</string>
</values>
<subblocks>
<subblock>
<values>
<string>block_2_attribute_1_value_1</string>
</values>
</subblock>
<subblock>
<values>
<string>block_3_attribute_1_value_1</string>
</values>
<subblocks>
<subblock>
<values>
<string>block_3_1_attribute_1_value_1</string>
</values>
</subblock>
</subblocks>
</subblock>
<subblock>
<values>
<string>block_3_attribute_1_value_2</string>
</values>
<subblocks>
<subblock>
<values>
<string>block_3_1_attribute_1_value_2</string>
</values>
</subblock>
</subblocks>
</subblock>
<subblock>
<values>
<string>block_3_attribute_1_value_3</string>
</values>
<subblocks>
<!-- empty subblock -->
<subblock/>
</subblocks>
</subblock>
</subblocks>
</profiles>
</body>
"""
d = xmltodict.parse(s)
schema = d['body']['schema']
profiles = d['body']['profiles']
schema_json = json.dumps(schema)
profiles_json = json.dumps(profiles)
pprint(schema)
print('='*120)
pprint(schema_json)
print('#'*120)
pprint(profiles)
print('='*120)
pprint(profiles_json)
Output:
{'attributes': {'string': 'block_1_attribute_1_name'},
'name': 'Block 1',
'subblocks': {'block': [{'attributes': {'string': 'block_2_attribute_1_name'},
'name': 'Block 2'},
{'attributes': {'string': 'block_3_attribute_1_name'},
'name': 'Block 3',
'subblocks': {'block': {'attributes': {'string': 'block_3_1_attribute_1_name'},
'name': 'Block 3_1'}}}]}}
========================================================================================================================
('{"name": "Block 1", "attributes": {"string": "block_1_attribute_1_name"}, '
'"subblocks": {"block": [{"name": "Block 2", "attributes": {"string": '
'"block_2_attribute_1_name"}}, {"name": "Block 3", "attributes": {"string": '
'"block_3_attribute_1_name"}, "subblocks": {"block": {"name": "Block 3_1", '
'"attributes": {"string": "block_3_1_attribute_1_name"}}}}]}}')
########################################################################################################################
{'subblocks': {'subblock': [{'values': {'string': 'block_2_attribute_1_value_1'}},
{'subblocks': {'subblock': {'values': {'string': 'block_3_1_attribute_1_value_1'}}},
'values': {'string': 'block_3_attribute_1_value_1'}},
{'subblocks': {'subblock': {'values': {'string': 'block_3_1_attribute_1_value_2'}}},
'values': {'string': 'block_3_attribute_1_value_2'}},
{'subblocks': {'subblock': None},
'values': {'string': 'block_3_attribute_1_value_3'}}]},
'values': {'string': 'block_1_attribute_1_value_1'}}
========================================================================================================================
('{"values": {"string": "block_1_attribute_1_value_1"}, "subblocks": '
'{"subblock": [{"values": {"string": "block_2_attribute_1_value_1"}}, '
'{"values": {"string": "block_3_attribute_1_value_1"}, "subblocks": '
'{"subblock": {"values": {"string": "block_3_1_attribute_1_value_1"}}}}, '
'{"values": {"string": "block_3_attribute_1_value_2"}, "subblocks": '
'{"subblock": {"values": {"string": "block_3_1_attribute_1_value_2"}}}}, '
'{"values": {"string": "block_3_attribute_1_value_3"}, "subblocks": '
'{"subblock": null}}]}}')
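The printed dictionaries still keep the schema and the values apart. One possible way to do the "modify its structure" step is sketched below. It is not a definitive implementation: as_list, children, matches and merge are hypothetical helper names, and the literal dictionaries are copied from the output above so that the snippet runs without xmltodict. The sample XML does not say which schema block a given subblock belongs to, so the sketch assumes an entry belongs to the first sibling schema block with the same number of attributes that also agrees on whether a subblocks element is present. Real data may need a different rule.

```python
import json


def as_list(value):
    """Normalize xmltodict output: None -> [], a single item -> [item]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def children(node, outer, inner):
    """Return the entries under node[outer][inner] as a list (possibly empty)."""
    return as_list((node.get(outer) or {}).get(inner))


def matches(block, profile):
    """ASSUMPTION: an entry fits a schema block when the number of values equals
    the number of attributes and both agree on having a 'subblocks' key."""
    n_names = len(children(block, "attributes", "string"))
    n_values = len(children(profile, "values", "string"))
    return n_names == n_values and ("subblocks" in block) == ("subblocks" in profile)


def merge(block, profile):
    """Pair attribute names with values, then recurse into the subblocks."""
    result = dict(zip(children(block, "attributes", "string"),
                      children(profile, "values", "string")))
    sub_blocks = children(block, "subblocks", "block")
    for sub_block in sub_blocks:
        result[sub_block["name"]] = []  # every schema block gets a list, even if empty
    for entry in children(profile, "subblocks", "subblock"):
        if entry is None:  # an empty <subblock/> inside a longer list
            continue
        # Attach the entry to the first schema block whose shape fits.
        target = next((b for b in sub_blocks if matches(b, entry)), None)
        if target is not None:
            result[target["name"]].append(merge(target, entry))
    return result


# Dictionaries shaped like the xmltodict output printed above.
schema = {
    "name": "Block 1",
    "attributes": {"string": "block_1_attribute_1_name"},
    "subblocks": {"block": [
        {"name": "Block 2", "attributes": {"string": "block_2_attribute_1_name"}},
        {"name": "Block 3", "attributes": {"string": "block_3_attribute_1_name"},
         "subblocks": {"block": {"name": "Block 3_1",
                                 "attributes": {"string": "block_3_1_attribute_1_name"}}}},
    ]},
}
profiles = {
    "values": {"string": "block_1_attribute_1_value_1"},
    "subblocks": {"subblock": [
        {"values": {"string": "block_2_attribute_1_value_1"}},
        {"values": {"string": "block_3_attribute_1_value_1"},
         "subblocks": {"subblock": {"values": {"string": "block_3_1_attribute_1_value_1"}}}},
        {"values": {"string": "block_3_attribute_1_value_2"},
         "subblocks": {"subblock": {"values": {"string": "block_3_1_attribute_1_value_2"}}}},
        {"values": {"string": "block_3_attribute_1_value_3"},
         "subblocks": {"subblock": None}},
    ]},
}

merged = {schema["name"]: merge(schema, profiles)}
print(json.dumps(merged, indent=2))
```

The as_list helper is there because xmltodict returns a plain dict for a single child, a list for repeated children and None for an empty element, so the same tag can arrive in three different shapes.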