Converting nested XML content to CSV using Python's xml ElementTree


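I'm very new to Python, so please bear with me. When I try to convert the XML content to a list of dictionaries, I get output, but not the output I expect, even after many attempts.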

XML content:

<project>
<data>
    <row>
        <respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
        <timestamp>10-06-16 11:30</timestamp>
        <product>1</product>
        <replica>1</replica>
        <seqnr>1</seqnr>
        <session>1</session>
        <column>
            <question>Q1</question>
            <answer>a1</answer>
        </column>
        <column>
            <question>Q2</question>
            <answer>a2</answer>
        </column>
    </row>
    <row>
        <respondent>w42h3fot34m7s6x</respondent>
        <timestamp>10-06-16 11:30</timestamp>
        <product>1</product>
        <replica>1</replica>
        <seqnr>1</seqnr>
        <session>1</session>
        <column>
            <question>Q3</question>
            <answer>a3</answer>
        </column>
        <column>
            <question>Q4</question>
            <answer>a4</answer>
        </column>
        <column>
            <question>Q5</question>
            <answer>a5</answer>
        </column>
    </row>
</data>
</project>

The code I used:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse('xml_file.xml')   # load the XML from a file
root = tree.getroot()
data_list = []

for item in root.find('./data'):    # iterate over every row node
  data = {}              # dictionary to store the content of each row
  for child in item:
    data[child.tag] = child.text   # add item to dictionary

#-----------------this for loop over the subchildren is not working as expected in my case
    for subchild in child:
      data[subchild.tag] = subchild.text
      data_list.append(data)
print(data_list)

headers = {k for d in data_list for k in d.keys()} # headers for csv
with open(csv_file, 'w') as f:    # csv_file is the output path
    writer = csv.DictWriter(f, fieldnames=headers)    # create a DictWriter object
    writer.writeheader()    # write headers to csv
    writer.writerows(data_list)

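The output in data_list carries only the last question's information into each dictionary. I think the problem is in the sub-child for loop, but I don't understand how to append the dictionaries to the list.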

[{
  'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
  'timestamp': '10-06-16 11:30',
  'product': '1',
  'replica': '1',
  'seqnr': '1',
  'session': '1',
  'column': '\n  ',
  'question': 'Q2',
  'answer': 'a2'
},
{
  'respondent': 'w42h3fot34m7s6x',
  'timestamp': '10-06-16 11:30',
  'product': '1',
  'replica': '1',
  'seqnr': '1',
  'session': '1',
  'column': '\n ',
  'question': 'Q2',
  'answer': 'a2'
}.......
]

I expect the output below. I have tried a lot, but I can't work out how to iterate over the column tags.

[{
    'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
    'timestamp': '10-06-16 11:30',
    'product': '1',
    'replica': '1',
    'seqnr': '1',
    'session': '1',
    'question': 'Q1',
    'answer': 'a1'
  },
  {
    'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
    'timestamp': '10-06-16 11:30',
    'product': '1',
    'replica': '1',
    'seqnr': '1',
    'session': '1',
    'question': 'Q2',
    'answer': 'a2'
  },
  {
    'respondent': 'w42h3fot34m7s6x',
    'timestamp': '10-06-16 11:30',
    'product': '1',
    'replica': '1',
    'seqnr': '1',
    'session': '1',
    'question': 'Q3',
    'answer': 'a3'
  },
  {
    'respondent': 'w42h3fot34m7s6x',
    'timestamp': '10-06-16 11:30',
    'product': '1',
    'replica': '1',
    'seqnr': '1',
    'session': '1',
    'question': 'Q4',
    'answer': 'a4'
  },
  {
    'respondent': 'w42h3fot34m7s6x',
    'timestamp': '10-06-16 11:30',
    'product': '1',
    'replica': '1',
    'seqnr': '1',
    'session': '1',
    'question': 'Q5',
    'answer': 'a5'
  }
]

I have looked at many Stack Overflow questions about XML trees, but none of them helped.

Thanks for any help or suggestions.


python xml csv dictionary xml-parsing
1 answer

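I had trouble understanding what this code is supposed to do, because it uses abstract variable names such as item, child, and subchild, which makes the code hard to reason about. I'm not that clever, so I renamed the variables to row, tag, and column to make it easier for me to see what the code does. (In my book, even row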
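As an illustration only, and not the answerer's original code, here is a minimal sketch of how the loop might look with the row, tag, and column names. It assumes every question/answer pair sits inside a <column> element and that the other children of <row> are shared by all of that row's questions. The names XML_TEXT, rows_to_records, shared, record, and field are hypothetical, the XML is a trimmed copy of the sample from the question, and io.StringIO stands in for the output file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch runs on its own.
XML_TEXT = """
<project>
<data>
    <row>
        <respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
        <timestamp>10-06-16 11:30</timestamp>
        <column>
            <question>Q1</question>
            <answer>a1</answer>
        </column>
        <column>
            <question>Q2</question>
            <answer>a2</answer>
        </column>
    </row>
    <row>
        <respondent>w42h3fot34m7s6x</respondent>
        <timestamp>10-06-16 11:30</timestamp>
        <column>
            <question>Q3</question>
            <answer>a3</answer>
        </column>
    </row>
</data>
</project>
"""


def rows_to_records(root):
    """Return one dict per <column>, repeating the fields of its parent <row>."""
    records = []
    for row in root.findall('./data/row'):
        # Fields shared by every question in this row (everything except <column>).
        shared = {tag.tag: tag.text for tag in row if tag.tag != 'column'}
        for column in row.findall('column'):
            record = dict(shared)  # fresh copy, so records never share one dict
            for field in column:
                record[field.tag] = field.text
            records.append(record)
    return records


root = ET.fromstring(XML_TEXT)
data_list = rows_to_records(root)

# Collect headers in first-seen order; a set would give an unpredictable column order.
headers = list(dict.fromkeys(key for record in data_list for key in record))

buffer = io.StringIO()  # stands in for open(csv_file, 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=headers)
writer.writeheader()
writer.writerows(data_list)
print(buffer.getvalue())
```

The key difference from the code in the question is that a new dictionary is built for each <column> and appended once, after both question and answer have been filled in. Appending the same data dictionary inside the inner loop puts several references to one object into the list, so every entry ends up showing the last question and answer written to it.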
