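I know there are already several questions about sorting XML, but none of them seems to fit my case. I have the following XML file, an excerpt of the data schema of an Esri file geodatabase: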
import xml.etree.ElementTree as ET
from operator import attrgetter
data = """<esri:Workspace xmlns:esri='http://www.esri.com/schemas/ArcGIS/10.8' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xs='http://www.w3.org/2001/XMLSchema'>
<WorkspaceDefinition xsi:type='esri:WorkspaceDefinition'>
<WorkspaceType>esriLocalDatabaseWorkspace</WorkspaceType>
<Version/>
<Domains xsi:type='esri:ArrayOfDomain'/>
<Sequences xsi:type='esri:ArrayOfSequence'/>
<DatasetDefinitions xsi:type='esri:ArrayOfDataElement'>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureDataset'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
</DatasetDefinitions>
</WorkspaceDefinition>
<WorkspaceData xsi:type='esri:WorkspaceData'/>
</esri:Workspace>"""
root_1 = ET.fromstring(data)
I want to sort it by tag and by DataElement type, so that it is ordered like this:
WorkspaceData {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:WorkspaceData'}
WorkspaceDefinition {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:WorkspaceDefinition'}
    DatasetDefinitions {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDataElement'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureDataset'}
    Domains {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDomain'}
    Sequences {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfSequence'}
    Version {}
    WorkspaceType {}
So far I have managed to sort by tag, but how do I sort by DataElement type? Here is my code so far:
root_1[:] = sorted(root_1, key=attrgetter("tag"))  # WorkspaceData, WorkspaceDefinition
for node in root_1.findall("*"):  # DatasetDefinitions, Domains, Sequences, Version, WorkspaceType
    node[:] = sorted(node, key=attrgetter("tag"))
    print(node)
    for subnode in node.findall("*"):  # DataElement, Domain
        subnode[:] = sorted(subnode, key=attrgetter("tag"))
        #subnode[:] = sorted(subnode, key=subnode.get['xsi:type']) # not working!
        print("\t", subnode.tag, subnode.attrib)
        for subsubnode in subnode.findall("*"):
            print("\t\t", subsubnode.tag, subsubnode.attrib)
            subsubnode[:] = sorted(subsubnode, key=attrgetter("tag"))
IIUC, you can slightly change the key= parameter of sorted():
import xml.etree.ElementTree as ET
from operator import attrgetter
data = """<esri:Workspace xmlns:esri='http://www.esri.com/schemas/ArcGIS/10.8' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xs='http://www.w3.org/2001/XMLSchema'>
<WorkspaceDefinition xsi:type='esri:WorkspaceDefinition'>
<WorkspaceType>esriLocalDatabaseWorkspace</WorkspaceType>
<Version/>
<Domains xsi:type='esri:ArrayOfDomain'/>
<Sequences xsi:type='esri:ArrayOfSequence'/>
<DatasetDefinitions xsi:type='esri:ArrayOfDataElement'>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureDataset'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
<DataElement xsi:type='esri:DEFeatureClass'/>
</DatasetDefinitions>
</WorkspaceDefinition>
<WorkspaceData xsi:type='esri:WorkspaceData'/>
</esri:Workspace>"""
root_1 = ET.fromstring(data)
root_1[:] = sorted(root_1, key=attrgetter("tag"))  # WorkspaceData, WorkspaceDefinition
for node in root_1.findall(
    "*"
):  # DatasetDefinitions, Domains, Sequences, Version, WorkspaceType
    node[:] = sorted(node, key=attrgetter("tag"))
    print(node)
    for subnode in node.findall("*"):  # DataElement, Domain
        subnode[:] = sorted(
            subnode,
            key=lambda node: (  # <--- change key= here
                node.tag,
                node.get("{http://www.w3.org/2001/XMLSchema-instance}type"),
            ),
        )
        print("\t", subnode.tag, subnode.attrib)
        for subsubnode in subnode.findall("*"):
            print("\t\t", subsubnode.tag, subsubnode.attrib)
            subsubnode[:] = sorted(
                subsubnode,
                key=attrgetter("tag"),
            )
Prints:
<Element 'WorkspaceData' at 0x7f5ff630bec0>
<Element 'WorkspaceDefinition' at 0x7f5ff6316610>
    DatasetDefinitions {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDataElement'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
        DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureDataset'}
    Domains {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDomain'}
    Sequences {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfSequence'}
    Version {}
    WorkspaceType {}
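
If the nesting depth is not fixed, the same key can be applied recursively instead of writing one loop per level. The following is only a sketch of that idea, not part of the code above: the helper name sort_children, the constant XSI_TYPE and the small sample document are all made up for illustration. It relies on ElementTree storing xsi:type under the expanded name {http://www.w3.org/2001/XMLSchema-instance}type, which is why looking up 'xsi:type' directly finds nothing, and it falls back to an empty string when the attribute is missing so that the key never compares None with a str.

```python
import xml.etree.ElementTree as ET

# Expanded ("Clark notation") name under which ElementTree stores xsi:type.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def sort_children(element):
    """Hypothetical helper: sort children by (tag, xsi:type) at every level, in place."""
    # A missing xsi:type becomes "" so the tuples stay comparable.
    element[:] = sorted(
        element,
        key=lambda child: (child.tag, child.get(XSI_TYPE, "")),
    )
    for child in element:
        sort_children(child)


# Small made-up document, only to exercise the helper.
sample = """<root xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <DataElement xsi:type='esri:DEFeatureDataset'/>
  <Version/>
  <DataElement xsi:type='esri:DEFeatureClass'/>
</root>"""

root = ET.fromstring(sample)
sort_children(root)
for child in root:
    print(child.tag, child.attrib)
```

With the data from the question, calling sort_children(root_1) should replace the nested loops, although it no longer prints the tree while sorting.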