XML parsing: iterating and handling breaks

Question (0 votes, 1 answer)

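I am trying to parse this part of the XML and would like the code to run iteratively on its own, working out how many times it needs to loop.

In addition, a line item may or may not have a value for every column. If any of these tags or their text is missing, I want to fill the blank with None so that the values can later be mapped to the right column in the CSV conversion.

My XML to parse (the invoice line items are the part of interest):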

<Invoice>
    <DocumentSource>Supplier</DocumentSource>
    <DocumentType>Invoice</DocumentType>
    .
    .
    .

    <InvoiceLineItems>
        <LineItem>
            <InvoiceLineNum>1</InvoiceLineNum>
            <POLineNum>1</POLineNum>
            <Quantity>2</Quantity>
            <UOM>EA</UOM>
            <UnitPrice>50.00</UnitPrice>
            <LineAmount>100.00</LineAmount>
            <SalesTaxPercent>9.75</SalesTaxPercent>
            <SupplierPartNum />
            <ShortDescription>Marley &amp; Me</ShortDescription>
            <LongDescription>Marley &amp; Me</LongDescription>
            <DeliveryChargeCode/>
        </LineItem>
        <LineItem>
            <InvoiceLineNum>2</InvoiceLineNum>
            <LineAmount>-10.00</LineAmount>
        </LineItem>
    </InvoiceLineItems>
</Invoice> 

The output I am expecting needs to look something like ...

What I have right now is very basic, as shown below:

# Counting Line Items under an Invoice Line Items
for inv_line_items in root.findall('InvoiceLineItems'):
    # Element.getiterator() was removed in Python 3.9; iter() is the replacement.
    countX = sum([1 for entry in inv_line_items.iter('LineItem')])
    print(countX)

invoice_ln1 = []
invoice_ln2 = []

for i in range(0, countX):
    for z in root[18][i]:
        if i == 0:
            #invoice_hdr0.append(z.text)
            if z.tag == 'InvoiceLineNum':
                invoice_ln1.append(z.text)
            
            if z.tag == 'POLineNum':
                invoice_ln1.append(z.text)
                
            if z.tag == 'Quantity':
                invoice_ln1.append(z.text)
                
            if z.tag == 'UOM':
                invoice_ln1.append(z.text)
                
            if z.tag == 'UnitPrice':
                invoice_ln1.append(z.text)
                
            if z.tag == 'LineAmount':
                invoice_ln1.append(z.text)
                
            if z.tag == 'SalesTaxPercent':
                invoice_ln1.append(z.text)
                
            if z.tag == 'SupplierPartNum':
                invoice_ln1.append(z.text)
                
            if z.tag == 'ShortDescription':
                invoice_ln1.append(z.text)
                
            if z.tag == 'LongDescription':
                invoice_ln1.append(z.text)
                
            if z.tag == 'DeliveryChargeCode':
                invoice_ln1.append(z.text)
                
            print(invoice_ln1)

        else:
            #invoice_hdr1.append(z.text)
            if z.tag == 'InvoiceLineNum':
                invoice_ln2.append(z.text)

            if z.tag == 'POLineNum':
                invoice_ln2.append(z.text)

            if z.tag == 'Quantity':
                invoice_ln2.append(z.text)

            if z.tag == 'UOM':
                invoice_ln2.append(z.text)

            if z.tag == 'UnitPrice':
                invoice_ln2.append(z.text)

            if z.tag == 'LineAmount':
                invoice_ln2.append(z.text)

            if z.tag == 'SalesTaxPercent':
                invoice_ln2.append(z.text)

            if z.tag == 'SupplierPartNum':
                invoice_ln2.append(z.text)

            if z.tag == 'ShortDescription':
                invoice_ln2.append(z.text)
                
            if z.tag == 'LongDescription':
                invoice_ln2.append(z.text)

            if z.tag == 'DeliveryChargeCode':
                invoice_ln2.append(z.text)

            print(invoice_ln2)
python python-3.x xml-parsing iterator
1 Answer (0 votes)

import lxml.html
import pandas as pd

items = """[your xml above]"""

# lxml.html lowercases tag names, so the lookup keys are lowercase as well.
categories = ["invoicelinenum", "polinenum", "quantity", "uom", "unitprice",
              "lineamount", "salestaxpercent", "supplierpartnum",
              "shortdescription", "longdescription", "deliverychargecode"]
columns = ['ILI Line Num', 'ILI PO Line', 'ILI QTY', 'ILI UOM', 'ILI Unit Price',
           'ILI Line Amt', 'ILI Sales Tax %', 'ILI Supply', 'ShortDesc',
           'LongDesc', 'ChargeCode']

doc = lxml.html.fromstring(items)
invoices = doc.xpath('//InvoiceLineItems/LineItem'.lower())

def dict_to_list(d, keys):
    # credit: https://stackoverflow.com/a/58192327/9448090
    # Missing keys become None, which keeps every row aligned with `columns`.
    return [d.get(key, None) for key in keys]

all_inv = []
fin_dicts = []
fin_list = []

# Collect one {tag: text} dict per child element of each line item.
for invoice in invoices:
    line_tags = []
    for item in invoice:
        item_dict = {}
        item_dict[item.tag] = item.text
        line_tags.append(item_dict)
    all_inv.append(line_tags)

# Merge the per-tag dicts into a single dict per line item.
for inv in all_inv:
    temp_dict = {}
    for d in inv:
        temp_dict.update(d)
    fin_dicts.append(temp_dict)

# Turn each dict into a fixed-order list of values.
for fin_dict in fin_dicts:
    fin_list.append(dict_to_list(fin_dict, categories))

df = pd.DataFrame(fin_list, columns=columns)
df

This should give you the table you want.
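
If you would rather avoid the extra dependencies, the same idea can be sketched with only the standard library. This is a minimal sketch and not the code from the answer above: it assumes `InvoiceLineItems` is a direct child of the root `Invoice` element, as in the question's sample, and the names `FIELDS`, `parse_line_items` and `rows_to_csv` are made up for illustration. `Element.findtext` returns None when a tag is absent and an empty string when the tag is present but empty, so every row keeps the same column order.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the invoice from the question (header fields omitted).
XML = """<Invoice>
    <InvoiceLineItems>
        <LineItem>
            <InvoiceLineNum>1</InvoiceLineNum>
            <POLineNum>1</POLineNum>
            <Quantity>2</Quantity>
            <UOM>EA</UOM>
            <UnitPrice>50.00</UnitPrice>
            <LineAmount>100.00</LineAmount>
            <SalesTaxPercent>9.75</SalesTaxPercent>
            <SupplierPartNum />
            <ShortDescription>Marley &amp; Me</ShortDescription>
            <LongDescription>Marley &amp; Me</LongDescription>
            <DeliveryChargeCode/>
        </LineItem>
        <LineItem>
            <InvoiceLineNum>2</InvoiceLineNum>
            <LineAmount>-10.00</LineAmount>
        </LineItem>
    </InvoiceLineItems>
</Invoice>"""

# Column order for the output (hypothetical name, tags taken from the sample).
FIELDS = [
    "InvoiceLineNum", "POLineNum", "Quantity", "UOM", "UnitPrice",
    "LineAmount", "SalesTaxPercent", "SupplierPartNum",
    "ShortDescription", "LongDescription", "DeliveryChargeCode",
]


def parse_line_items(xml_text):
    """Return one list per LineItem, with None for tags that are missing."""
    root = ET.fromstring(xml_text)
    rows = []
    # findall() yields however many LineItem elements exist, so there is
    # no need to count them first or to index into the tree by position.
    for line_item in root.findall("InvoiceLineItems/LineItem"):
        rows.append([line_item.findtext(tag) for tag in FIELDS])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text; csv.writer writes None as an empty cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_line_items(XML)
print(rows_to_csv(rows))
```

Looking each tag up by name, instead of appending values in document order, is what keeps a short line item such as the second one from shifting its values into the wrong columns.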
