我正在尝试解析XML的这一部分,并希望它通过确定需要运行多少次而以迭代方式自行运行。
[此外,订单项可能具有或不具有每一列的所有值,并且我尝试用无填充这些空白,如果这些标签/文本中的任何标签/文本均不存在,请稍后将其映射到右列csv转换
要解析的我的XML(以粗体显示,发票订单项突出显示:]
<Invoice>
<DocumentSource>Supplier</DocumentSource>
<DocumentType>Invoice</DocumentType>
.
.
.
<InvoiceLineItems>
<LineItem>
<InvoiceLineNum>1</InvoiceLineNum>
<POLineNum>1</POLineNum>
<Quantity>2</Quantity>
<UOM>EA</UOM>
<UnitPrice>50.00</UnitPrice>
<LineAmount>100.00</LineAmount>
<SalesTaxPercent>9.75</SalesTaxPercent>
<SupplierPartNum />
<ShortDescription>Marley & Me</ShortDescription>
<LongDescription>Marley & Me</LongDescription>
<DeliveryChargeCode/>
</LineItem>
<LineItem>
<InvoiceLineNum>2</InvoiceLineNum>
<LineAmount>-10.00</LineAmount>
</LineItem>
</InvoiceLineItems>
</Invoice>
Output I am expecting need to look something like ...
我现在所拥有的非常基本,如下所示:
# Counting Line Items under an Invoice Line Items
for inv_line_items in root.findall('InvoiceLineItems'):
countX = sum([1 for entry in inv_line_items.getiterator('LineItem')])
print(countX)
invoice_ln1 = []
invoice_ln2 = []
for i in range(0, countX):
for z in root[18][i]:
if i == 0:
#invoice_hdr0.append(z.text)
if z.tag == 'InvoiceLineNum':
invoice_ln1.append(z.text)
if z.tag == 'POLineNum':
invoice_ln1.append(z.text)
if z.tag == 'Quantity':
invoice_ln1.append(z.text)
if z.tag == 'UOM':
invoice_ln1.append(z.text)
if z.tag == 'Unit_Price':
invoice_ln1.append(z.text)
if z.tag == 'LineAmount':
invoice_ln1.append(z.text)
if z.tag == 'SalesTaxPercent':
invoice_ln1.append(z.text)
if z.tag == 'SupplierPartNum':
invoice_ln1.append(z.text)
if z.tag == 'ShortDescription':
invoice_ln1.append(z.text)
if z.tag == 'LongDescription':
invoice_ln1.append(z.text)
if z.tag == 'DeliveryChargeCode':
invoice_ln1.append(z.text)
print(invoice_ln1)
else:
#invoice_hdr1.append(z.text)
if z.tag == 'InvoiceLineNum':
invoice_ln2.append(z.text)
if z.tag == 'POLineNum':
invoice_ln2.append(z.text)
if z.tag == 'Quantity':
invoice_ln2.append(z.text)
if z.tag == 'UOM':
invoice_ln2.append(z.text)
if z.tag == 'Unit_Price':
invoice_ln2.append(z.text)
if z.tag == 'LineAmount':
invoice_ln2.append(z.text)
if z.tag == 'SalesTaxPercent':
invoice_ln2.append(z.text)
if z.tag == 'SupplierPartNum':
invoice_ln2.append(z.text)
if z.tag == 'ShortDescription':
invoice_ln2.append(z.text)
if z.tag == 'LongDescription':
invoice_ln2.append(z.text)
if z.tag == 'DeliveryChargeCode':
invoice_ln2.append(z.text)
print(invoice_ln2)
items = """[your xml above]"""
import lxml.html
import pandas as pd
categories = ["invoicelinenum", "polinenum","quantity","uom","unitprice","lineamount","salestaxpercent","supplierpartnum","shortdescription",
"longdescription","deliverychargecode]"]
columns = ['ILI Line Num','ILI PO Line',
'ILI QTY', 'ILI UOM','ILI Unit Price','ILI Line Amt','ILI Sales Tax %',
'ILI Supply','ShortDesc','LongDesc','ChargeCode']
doc = lxml.html.fromstring(items)
invoices = doc.xpath('//InvoiceLineItems/LineItem'.lower())
def dict_to_list(d, keys):
return [d.get(key, None) for key in keys]
#credit: https://stackoverflow.com/a/58192327/9448090
all_inv = []
fin_dicts=[]
fin_list = []
for invoice in invoices:
items = []
for item in invoice:
item_dict = {}
item_dict[item.tag]= item.text
items.append(item_dict)
all_inv.append(items)
for inv in all_inv:
temp_dict={}
for d in inv:
temp_dict.update(d)
fin_dicts.append(temp_dict)
for dict in fin_dicts:
fin_list.append(dict_to_list(dict, categories))
df = pd.DataFrame(fin_list,columns=columns)
df
这应该会给您您想要的桌子。