Parsing data from an .xml file and saving it to a .tsv file in Python


I have a dataset that looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
<Review rid="1004293">
    <sentences>
        <sentence id="1004293:0">
            <text>Judging from previous posts this used to be a good place, but not any longer.</text>
            <Opinions>
                <Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
            </Opinions>
        </sentence>
        <sentence id="1004293:1">
            <text>The food here is rather good, but only if you like to wait for it.</text>
            <Opinions>
                <Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
                <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
            </Opinions>
        </sentence>
...

How can I parse the data in the .xml file into a .tsv file with rows in the following format:

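["negative", "Judging from previous posts this used to be a good place, but not any longer.", "RESTAURANT#GENERAL"]

["positive", "The food here is rather good, but only if you like to wait for it.", "FOOD#QUALITY"]

["negative", "The food here is rather good, but only if you like to wait for it.", "SERVICE#GENERAL"]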

Thanks!


python-3.x xml parsing xml-parsing
1 Answer
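You can use Python's built-in xml.etree.ElementTree module to get the output you want. The code below prints your lists; to produce a TSV file instead, replace the print call with code that writes each list as a tab-separated row.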

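The code opens sample.xml by a relative path, so the file must be in the current working directory (normally the directory you run the script from, which is usually the same directory as the script).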
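If you would rather have the script find sample.xml next to itself no matter where it is launched from, one option is to resolve the name against the script's own folder with pathlib. This is a minimal sketch; `path_next_to` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def path_next_to(script_file, name):
    """Resolve `name` against the folder containing `script_file`, not the CWD."""
    # resolve() makes the script path absolute (it does not need to exist).
    return Path(script_file).resolve().parent / name


# Usage inside a script (hypothetical helper):
#     file = path_next_to(__file__, 'sample.xml')
#     tree = ElementTree.parse(file)  # parse() accepts a Path object
```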

from xml.etree import ElementTree

file = 'sample.xml'
tree = ElementTree.parse(file)
root = tree.getroot()

for sentence in root.iter('sentence'):  # Loop over every sentence in the XML
    for opinion in sentence.iter('Opinion'):  # Loop over every Opinion of this sentence
        print([opinion.attrib['polarity'], sentence.find('text').text, opinion.attrib['category']])

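Output (for the two sentences shown in the question):

['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['positive', 'The food here is rather good, but only if you like to wait for it.', 'FOOD#QUALITY']
['negative', 'The food here is rather good, but only if you like to wait for it.', 'SERVICE#GENERAL']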
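To go from printed lists to an actual .tsv file, the print call in the loop can be swapped for a TSV writer. The sketch below assumes the standard-library csv module with `delimiter='\t'`; `extract_rows`, `write_tsv` and `SAMPLE_XML` are hypothetical names, and `SAMPLE_XML` is just the question's two sentences with closing tags added so it parses on its own (the real file continues where the question shows "..."). It uses `findtext()`/`get()` rather than `find().text`/`attrib[...]`, so a missing element or attribute gives None (written as an empty field) instead of raising.

```python
import csv
import os
import tempfile
from xml.etree import ElementTree

# The question's two sample sentences, closed off so the snippet is self-contained.
SAMPLE_XML = """<Reviews>
<Review rid="1004293">
    <sentences>
        <sentence id="1004293:0">
            <text>Judging from previous posts this used to be a good place, but not any longer.</text>
            <Opinions>
                <Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
            </Opinions>
        </sentence>
        <sentence id="1004293:1">
            <text>The food here is rather good, but only if you like to wait for it.</text>
            <Opinions>
                <Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
                <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
            </Opinions>
        </sentence>
    </sentences>
</Review>
</Reviews>"""


def extract_rows(root):
    """Return one [polarity, sentence text, category] list per <Opinion>."""
    rows = []
    for sentence in root.iter('sentence'):
        text = sentence.findtext('text')  # None if the sentence has no <text>
        for opinion in sentence.iter('Opinion'):
            rows.append([opinion.get('polarity'), text, opinion.get('category')])
    return rows


def write_tsv(rows, path):
    """Write each row as one tab-separated line."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f, delimiter='\t').writerows(rows)


rows = extract_rows(ElementTree.fromstring(SAMPLE_XML))
out_path = os.path.join(tempfile.mkdtemp(), 'sample.tsv')
write_tsv(rows, out_path)
```

For the real file, build the rows with `extract_rows(ElementTree.parse('sample.xml').getroot())` and pass any output path to `write_tsv`. The csv writer is used instead of a plain `'\t'.join(...)` so that a tab or newline inside a sentence is quoted rather than silently breaking the row structure.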
© www.soinside.com 2019 - 2024. All rights reserved.