Converting nested XML content to CSV using Python's XML ElementTree

Question

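I am new to Python, so please bear with me. When I try to convert XML content into a list of dictionaries, I get output, but it is not what I expect, even after many attempts.

XML content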

<project>
<data>
    <row>
        <respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
        <timestamp>10-06-16 11:30</timestamp>
        <product>1</product>
        <replica>1</replica>
        <seqnr>1</seqnr>
        <session>1</session>
        <column>
            <question>Q1</question>
            <answer>a1</answer>
        </column>
        <column>
            <question>Q2</question>
            <answer>a2</answer>
        </column>
    </row>
    <row>
        <respondent>w42h3fot34m7s6x</respondent>
        <timestamp>10-06-16 11:30</timestamp>
        <product>1</product>
        <replica>1</replica>
        <seqnr>1</seqnr>
        <session>1</session>
        <column>
            <question>Q3</question>
            <answer>a3</answer>
        </column>
        <column>
            <question>Q4</question>
            <answer>a4</answer>
        </column>
        <column>
            <question>Q5</question>
            <answer>a5</answer>
        </column>
    </row>
</data>
</project>
The code I used:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse('xml_file.xml')   # load the XML from a file
root = tree.getroot()
data_list = []

for item in root.find('./data'):    # iterate over the row nodes under data
  data = {}              # dictionary to store the content of each row
  for child in item:
    data[child.tag] = child.text   # add item to dictionary

#-----------------for loop with subchild is not working as expected in my case
    for subchild in child:
      data[subchild.tag] = subchild.text
      data_list.append(data)
print(data_list)

headers = {k for d in data_list for k in d.keys()} # headers for csv
with open('csv_file.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=headers)    # creating a DictWriter object
    writer.writeheader()    # write headers to csv
    writer.writerows(data_list)
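The data_list output puts only the last question's information into each dictionary in the list. I think the problem is in the subchild for loop, but I don't understand how to append the dictionaries to the list.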

[{ 'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'column': '\n ', 'question': 'Q2', 'answer': 'a2' }, { 'respondent': 'w42h3fot34m7s6x', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'column': '\n ', 'question': 'Q2', 'answer': 'a2' }....... ]

I expect the output below. I have tried a lot, but I cannot work out how to iterate over the column tags.

[{ 'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'question': 'Q1', 'answer': 'a1' }, { 'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'question': 'Q2', 'answer': 'a2' }, { 'respondent': 'w42h3fot34m7s6x', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'question': 'Q3', 'answer': 'a3' }, { 'respondent': 'w42h3fot34m7s6x', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'question': 'Q4', 'answer': 'a4' }, { 'respondent': 'w42h3fot34m7s6x', 'timestamp': '10-06-16 11:30', 'product': '1', 'replica': '1', 'seqnr': '1', 'session': '1', 'question': 'Q5', 'answer': 'a5' } ]

I have looked at many Stack Overflow questions about the XML tree, but they still have not helped me.

Thanks for any help or suggestions.

Answer 1

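I had trouble understanding what this code is supposed to do, because it uses abstract variable names such as item, child, and subchild, which make the code hard to reason about. I am not that clever, so I renamed the variables to row, tag, and column to make it easier for me to see what the code does. (In my book, even row ...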
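The rest of this answer is cut off, so what follows is only a minimal sketch of one way to get the expected output using the row, tag, and column names mentioned above. It is not necessarily the answerer's own code. The question's loop appends the same `data` dictionary on every pass, so every list entry ends up pointing at one object. The sketch copies the shared row fields into a fresh dictionary for each `column` instead. The names `XML_TEXT`, `flatten_rows`, `to_csv`, and `records` are hypothetical, and the XML is an abbreviated copy of the question's data so the block runs on its own.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Abbreviated copy of the question's XML (assumption: same structure, fewer fields).
XML_TEXT = """<project><data>
<row>
  <respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
  <timestamp>10-06-16 11:30</timestamp>
  <column><question>Q1</question><answer>a1</answer></column>
  <column><question>Q2</question><answer>a2</answer></column>
</row>
<row>
  <respondent>w42h3fot34m7s6x</respondent>
  <timestamp>10-06-16 11:30</timestamp>
  <column><question>Q3</question><answer>a3</answer></column>
</row>
</data></project>"""


def flatten_rows(root):
    """Return one dict per <column>, repeating the fields of its parent <row>."""
    records = []
    for row in root.find('./data'):
        # Fields shared by every <column> inside this <row>.
        shared = {tag.tag: tag.text for tag in row if tag.tag != 'column'}
        for column in row.findall('column'):
            record = dict(shared)  # fresh copy, so earlier records are not overwritten
            for field in column:
                record[field.tag] = field.text
            records.append(record)
    return records


def to_csv(records):
    """Serialise the records to CSV text, keeping first-seen column order."""
    headers = []
    for record in records:
        for key in record:
            if key not in headers:
                headers.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=headers, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = flatten_rows(ET.fromstring(XML_TEXT))
csv_text = to_csv(records)
print(csv_text)
```

To read from and write to real files, `ET.parse(path).getroot()` can replace `ET.fromstring(XML_TEXT)`, and the `io.StringIO` buffer can be swapped for a file opened with `newline=''`. Building the header list in first-seen order, rather than from a set, keeps the CSV columns in a predictable order.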
