I am trying to load data from many XML files using etree.ElementTree.
The data in the xml looks like this sample:
<mydata>
<Record>
<Field name="ParaA">2927695</Field>
<Field name="Index"/>
<Field name="ParaB">D:\path\data</Field>
<Field name="ParaC">116.4</Field>
<Field name="ParaD">1583.4</Field>
<Field name="ParaBE">12.0</Field>
<Row>
<Field name="Para1">1</Field>
<Field name="Para2">D5A</Field>
<Field name="Para3">1586.0</Field>
</Row>
<Row>
<Field name="Para1">2</Field>
<Field name="Para2">D4A</Field>
<Field name="Para3">118.0</Field>
<Field name="Para4">12.0</Field>
</Row>
</Record>
</mydata>
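The number of Row elements is dynamic. I want a df with the columns ParaA | Para1 | Para2.
I have read the documentation but cannot work it out. I think the problem is that the Row elements are nested inside the Record.
What I need is a df like the one below: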
ParaA    Para1  Para2  Para3   Para4
2927695  1      D5A    1586.0
2927695  2      D4A    118.0   12.0
The df produced by my code is empty:
import pandas as pd
import numpy as np
import xml.etree.ElementTree as ET

xml_file = r'.\Test.xml'
tree = ET.parse(xml_file)
root = tree.getroot()
data = []
for record in root.findall('record'):
    record_data = {}
    for elem in record:
        record_data[elem.tag] = elem.text
    data.append(record_data)
df = pd.DataFrame(data)
print(df)
I hope someone can help.
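This works for the data you provided. Note that ElementTree tag matching is case-sensitive, so root.findall('record') finds nothing; the tag is 'Record'. The ParaA value is remembered from the Record's Field elements and copied into every Row: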
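# root is the parsed tree root from the question's code (root = tree.getroot())
data = []                                   # one dict per <Row>, across all records
for record in root.findall('Record'):       # tag names are case-sensitive
    paraa = None                            # reset for each record
    for elem in record:
        match elem.tag:                     # match/case requires Python 3.10+
            case 'Field':
                if elem.attrib['name'] == 'ParaA':
                    paraa = elem.text
            case 'Row':
                # start each row with the parent record's ParaA
                row = {'ParaA': paraa}
                for field in elem.findall('Field'):
                    row[field.attrib['name']] = field.text
                data.append(row)

df = pd.DataFrame(data)
print(df)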
df output:
     ParaA Para1 Para2   Para3 Para4
0  2927695     1   D5A  1586.0   NaN
1  2927695     2   D4A   118.0  12.0
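
Since the question mentions many XML files and the loop above relies on match/case, here is a rough alternative sketch that uses ElementTree's attribute predicates instead and also runs on older Python versions. The names rows_from_root, collect_rows and SAMPLE are made up for illustration, and SAMPLE is a trimmed copy of the data from the question. It assumes every file has the same Record/Row/Field layout as the sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (illustrative only).
SAMPLE = """<mydata>
<Record>
<Field name="ParaA">2927695</Field>
<Field name="Index"/>
<Row>
<Field name="Para1">1</Field>
<Field name="Para2">D5A</Field>
<Field name="Para3">1586.0</Field>
</Row>
<Row>
<Field name="Para1">2</Field>
<Field name="Para2">D4A</Field>
<Field name="Para3">118.0</Field>
<Field name="Para4">12.0</Field>
</Row>
</Record>
</mydata>"""


def rows_from_root(root):
    """Return one dict per <Row>, each carrying its parent Record's ParaA."""
    rows = []
    for record in root.findall("Record"):  # tag names are case-sensitive
        # Text of the direct-child Field whose name attribute is "ParaA"
        # (None if the record has no such field).
        para_a = record.findtext("Field[@name='ParaA']")
        for row_elem in record.findall("Row"):
            row = {"ParaA": para_a}
            for field in row_elem.findall("Field"):
                row[field.get("name")] = field.text
            rows.append(row)
    return rows


def collect_rows(paths):
    """Parse several XML files and return all of their rows in one list."""
    rows = []
    for path in paths:
        rows.extend(rows_from_root(ET.parse(path).getroot()))
    return rows


rows = rows_from_root(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Passing the list to pd.DataFrame(rows) should give the same frame as above, with NaN wherever a Row lacks a field. For several files, something like pd.DataFrame(collect_rows(list_of_paths)) would stack them; all values stay strings unless you convert them afterwards.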