ParserError when opening a TSV file containing data parsed from XML in a pandas DataFrame (Python)

Question · Votes: 0 · Answers: 1

I have an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
<Review rid="1004293">
<sentences>
    <sentence id="1004293:0">
        <text>Judging from previous posts this used to be a good place, but not any longer.</text>
        <Opinions>
            <Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
        </Opinions>
    </sentence>
    <sentence id="1004293:1">
        <text>The food here is rather good, but only if you like to wait for it.</text>
        <Opinions>
            <Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
            <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
        </Opinions>
    </sentence>
...

I use this script:

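import csv
from xml.etree import ElementTree

file = 'sample.xml'
tree = ElementTree.parse(file)
root = tree.getroot()
with open('output.tsv', 'w', newline='') as tsvfile:
    # The post never defines `writer`; a tab-delimited csv.writer is assumed here.
    writer = csv.writer(tsvfile, delimiter='\t')
    for sentence in root.iter('sentence'):
        for opinion in sentence.iter('Opinion'):
            # One row per opinion: polarity, sentence text, category.
            writer.writerow([opinion.attrib['polarity'], sentence.find('text').text, opinion.attrib['category']])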

to get an output records.tsv file that looks like this:

['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['negative', 'We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude.', 'SERVICE#GENERAL']
['negative', 'They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.', 'SERVICE#GENERAL']
['negative', 'The food was lousy - too sweet or too salty and the portions tiny.', 'FOOD#QUALITY']
['negative', 'The food was lousy - too sweet or too salty and the portions tiny.', 'FOOD#STYLE_OPTIONS']

When I open the file everything looks fine, but when I try to load it into pandas I get the following error:

ParserError: Error tokenizing data. C error: Expected 5 fields in line 8, saw 6

How can I fix this? Thank you.

python-3.x xml pandas parsing output
1 Answer
0 votes
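This happens because of the commas inside the single-quoted text. By default pandas only treats the double quote (") as a quote character, so every one of those commas is read as a field separator and the rows end up with different numbers of fields.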
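To make the cause concrete, here is a minimal sketch that counts the fields in one line of the file. It uses Python's standard csv module as a stand-in for the pandas tokenizer (an assumption made for illustration; the two parsers are separate implementations that follow the same quoting idea), and the sample line is a shortened version of one row from the question.

```python
import csv
from io import StringIO

# Shortened version of one row from the question's file.
line = "'negative', 'We, there were four of us, arrived at noon.', 'SERVICE#GENERAL'\n"

# Default dialect: the quote character is ", so ' is ordinary text
# and every comma splits the line.
naive = next(csv.reader(StringIO(line)))

# Declaring ' as the quote character keeps the commas inside the sentence.
# skipinitialspace drops the blank that follows each separating comma.
quoted = next(csv.reader(StringIO(line), quotechar="'", skipinitialspace=True))

print(len(naive), len(quoted))  # prints: 5 3
```

A row whose sentence contains no extra commas would yield fewer fields under the default settings, and that mismatch between rows is what the "Expected N fields, saw M" message reports.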

You can read the file into a pandas DataFrame like this:

>>> from io import StringIO
>>> import pandas as pd
>>> # data is assumed to hold the contents of the file as a string
>>> df = pd.read_csv(StringIO(data), header=None, quotechar="'", quoting=1, delimiter=',', skipinitialspace=True)
>>> df
          0                                                  1                   2
0  negative  Judging from previous posts this used to be a ...  RESTAURANT#GENERAL
1  negative  We, there were four of us, arrived at noon - t...     SERVICE#GENERAL
2  negative  They never brought us complimentary noodles, i...     SERVICE#GENERAL
3  negative  The food was lousy - too sweet or too salty an...        FOOD#QUALITY
4  negative  The food was lousy - too sweet or too salty an...  FOOD#STYLE_OPTIONS
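An alternative that is not part of the answer above would be to write real tab-separated rows in the first place, so that commas in the text never act as separators. The sketch below assumes the rows are already available as Python lists (two rows copied from the question), uses an in-memory buffer in place of the output file, and picks the column names polarity, text and category purely for illustration.

```python
import csv
from io import StringIO

import pandas as pd

# Rows copied from the question; in the real script they would come
# from the XML loop.
rows = [
    ["negative",
     "Judging from previous posts this used to be a good place, but not any longer.",
     "RESTAURANT#GENERAL"],
    ["negative",
     "The food was lousy - too sweet or too salty and the portions tiny.",
     "FOOD#QUALITY"],
]

# StringIO stands in for open('output.tsv', 'w', newline='').
buffer = StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

# Read the tab-separated text back; the column names are hypothetical.
buffer.seek(0)
df = pd.read_csv(buffer, sep="\t", header=None,
                 names=["polarity", "text", "category"])
print(df.shape)
```

With a file on disk, the same call would take the file path instead of the buffer, still with sep="\t".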
