Load XML files into a df (multi-level)


I am trying to load data from many XML files using etree.ElementTree.


The data in the xml looks like this sample:

<mydata>
<Record>
   <Field name="ParaA">2927695</Field>
   <Field name="Index"/>
   <Field name="ParaB">D:\path\data</Field>
   <Field name="ParaC">116.4</Field>
   <Field name="ParaD">1583.4</Field>
   <Field name="ParaBE">12.0</Field>
   <Row>
      <Field name="Para1">1</Field>
      <Field name="Para2">D5A</Field>
      <Field name="Para3">1586.0</Field>
   </Row>
   <Row>
      <Field name="Para1">2</Field>
      <Field name="Para2">D4A</Field>
      <Field name="Para3">118.0</Field>
      <Field name="Para4">12.0</Field>
   </Row>
</Record>
</mydata>

The number of Row elements is dynamic. I want a df with the columns ParaA | Para1 | Para2

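I have read the documentation but could not work it out. I think the problem is that the Row elements are nested inside the Record.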

What I need is a df like the one below:

ParaA    Para1 Para2 Para3   Para4
2927695  1     D5A   1586.0
2927695  2     D4A   118.0   12.0

The df produced by my code is empty:

import pandas as pd
import numpy as np
import xml.etree.ElementTree as ET


xml_file = r'.\Test.xml'

tree = ET.parse(xml_file)
root = tree.getroot()


data = []


for record in root.findall('record'):
    record_data = {}
    for elem in record:
        record_data[elem.tag] = elem.text
    data.append(record_data)

df = pd.DataFrame(data)

print(df)

Hope someone can help.

python elementtree
1 Answer

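This works for the data you provided. Your df is empty because tag names are case-sensitive: root.findall('record') matches nothing, since the element is named Record. Each Row also needs ParaA copied in from its parent Record: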

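import pandas as pd
import xml.etree.ElementTree as ET

root = ET.parse(r'.\Test.xml').getroot()

data = []  # one dict per <Row>, collected across all records
for record in root.findall('Record'):  # tag names are case-sensitive
    paraa = None  # reset so a value never leaks in from a previous record
    for elem in record:
        match elem.tag:  # match/case requires Python 3.10+
            case 'Field':
                # remember the record-level ParaA for the rows that follow
                if elem.attrib['name'] == 'ParaA':
                    paraa = elem.text
            case 'Row':
                row = {'ParaA': paraa}
                for field in elem.findall('Field'):
                    row[field.attrib['name']] = field.text
                data.append(row)

df = pd.DataFrame(data)
print(df)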

df output:

     ParaA Para1 Para2   Para3 Para4
0  2927695     1   D5A  1586.0   NaN
1  2927695     2   D4A   118.0  12.0
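
If match/case is not available (it needs Python 3.10 or later), the same flattening can be sketched with find and findall only. This is a minimal sketch, not the only way to do it: the helper name rows_from_root and the SAMPLE string (a shortened copy of the XML from the question) are made up for illustration, and it assumes the record-level key is a direct Field child of Record.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened copy of the sample from the question (assumption: same layout).
SAMPLE = """<mydata>
<Record>
   <Field name="ParaA">2927695</Field>
   <Field name="Index"/>
   <Row>
      <Field name="Para1">1</Field>
      <Field name="Para2">D5A</Field>
      <Field name="Para3">1586.0</Field>
   </Row>
   <Row>
      <Field name="Para1">2</Field>
      <Field name="Para2">D4A</Field>
      <Field name="Para3">118.0</Field>
      <Field name="Para4">12.0</Field>
   </Row>
</Record>
</mydata>"""


def rows_from_root(root, key="ParaA"):
    """Return one dict per <Row>, each carrying the parent record's `key` field."""
    rows = []
    for record in root.findall("Record"):
        # Without a leading './/', find() only looks at direct children,
        # so Fields nested inside a Row are not picked up here.
        key_elem = record.find(f"Field[@name='{key}']")
        key_value = key_elem.text if key_elem is not None else None
        for row_elem in record.findall("Row"):
            row = {key: key_value}
            for field in row_elem.findall("Field"):
                row[field.get("name")] = field.text
            rows.append(row)
    return rows


root = ET.fromstring(SAMPLE)
print(len(root.findall("record")))  # prints 0: tag names are case-sensitive
df = pd.DataFrame(rows_from_root(root))
print(df)
```

For many files, the lists returned for each ET.parse(path).getroot() could be concatenated before the DataFrame is built. All values arrive as strings, so numeric columns would still need a conversion such as pd.to_numeric if numbers are required.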