Loading an XML file into a MySQL database using Python with column mapping

Question

I have written code that loads data into a MySQL table using the pymysql library. I am loading the data into the MySQL table as follows:

    import pymysql

    con = pymysql.connect(host=host, user=user, password=passwd, db=db, port=int(port), autocommit=True, local_infile=1)
    cursor = con.cursor()
    # In LOAD XML, ROWS IDENTIFIED BY must come before SET; string literals need quotes
    sql = ("LOAD XML INFILE '" + path + "' INTO TABLE " + ds_name + "." + table_name
           + " ROWS IDENTIFIED BY '<LoanInfo>'"
           + " SET dataset='" + ds_name + "', factor_date='" + factor_date + "', " + column_map)
    cursor.execute(sql)
    con.commit()

ds_name and factor_date do not come from the XML file, so I write them as static values for every row.

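I have a CSV/Excel file containing the mapping between the XML file's columns and the MySQL table's column names, for more than 100 columns. I read somewhere that column mappings can be added to the SQL query by reference, for example "SET ABC_AGE = @Age, UNIQUE_ID = @ID, BALANCE = @Money". I build the mapping list as follows: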

    ls = []
    for xml_col, sql_col in zip(map_df['XML Columns'], map_df['SQL Columns']):
        # Each entry has the form SQL_COLUMN=@XmlField, e.g. ABC_AGE=@Age
        ls.append(sql_col + "=@" + xml_col)
    column_map = ",".join(ls)
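For illustration, here is a minimal sketch of how such a mapping could be turned into a complete statement. As far as I understand the MySQL documentation, user variables referenced in SET also have to be listed in the field list in parentheses, and each variable name must match an XML field name prefixed with `@`. The function name `build_load_xml` and its parameters are hypothetical, and the static values are assumed to be already-quoted SQL literals.

```python
def build_load_xml(path, table, row_tag, mapping, static_values=None):
    """Build a LOAD XML statement string (sketch, hypothetical helper).

    mapping: iterable of (xml_field, sql_column) pairs.
    static_values: dict of sql_column -> already-quoted SQL literal.
    """
    pairs = list(mapping)
    # Field list of user variables, one per XML field: (@Age, @ID, ...)
    var_list = ", ".join("@" + xml for xml, _ in pairs)
    # Static assignments first, then one SQL_COLUMN = @XmlField per mapping row
    assignments = [col + " = " + lit for col, lit in (static_values or {}).items()]
    assignments += [sql + " = @" + xml for xml, sql in pairs]
    set_clause = ", ".join(assignments)
    # Clause order: ROWS IDENTIFIED BY, then the field list, then SET
    return (
        "LOAD XML INFILE '" + path + "' INTO TABLE " + table
        + " ROWS IDENTIFIED BY '<" + row_tag + ">'"
        + " (" + var_list + ") SET " + set_clause
    )


if __name__ == "__main__":
    print(build_load_xml(
        "/tmp/test.xml", "ds.loans", "LoanInfo",
        [("Age", "ABC_AGE"), ("ID", "UNIQUE_ID")],
        {"dataset": "'ds'"},
    ))
```

Because the statement is assembled by string concatenation, this only makes sense when the path, table name and mapping come from a trusted source.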

My question: is there a better way to load an XML file into MySQL using Python with a column mapping?

[Image: column map example]

mysql python-3.x xml-parsing
1 Answer

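I found a way to convert the XML file into a pandas DataFrame and then load it into the MySQL database with executemany. Here is the code that converts the XML into a DataFrame: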

    # Read the mapping file and convert the mapping to a dictionary
    import os
    import pandas as pd

    map_path = 'Mapping.xlsx'
    if os.path.isfile(map_path):
        map_df = pd.read_excel(map_path, sheet_name='Mapping')
        # Keys are SQL column names, values are XML attribute names
        mapping_dict = pd.Series(map_df['XML Columns'].values, index=map_df['SQL Columns']).to_dict()

    # Read the XML file
    import xml.etree.ElementTree as ET

    xml_path = 'test.xml'
    if os.path.isfile(xml_path):
        root = ET.parse(xml_path).getroot()

    # Read the XML elements one by one and store their attributes in a dictionary
    skip_cols = []  # SQL columns to skip; fill in as needed
    missing_col = set()
    xmldf_dict = {"df_dicts": []}
    for elem in root:
        df_dict = {}
        for k, v in mapping_dict.items():
            if k in skip_cols:
                continue
            try:
                df_dict[k] = elem.attrib[v]
            except KeyError:
                # Attribute absent from this element; remember the column
                missing_col.add(k)
        xmldf_dict["df_dicts"].append(df_dict)

    # Merge a frame of the missing columns with the XML DataFrame
    xml_df = pd.DataFrame(xmldf_dict["df_dicts"])
    # Only add columns that no element supplied, to avoid duplicate column names
    missing_col_df = pd.DataFrame(columns=sorted(missing_col - set(xml_df.columns)))
    final_df = pd.concat([xml_df, missing_col_df], axis=1)
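
The answer mentions executemany but does not show that step. Below is a minimal sketch of what it might look like, assuming a DB-API cursor such as the one returned by pymysql's `con.cursor()`. The helper name `insert_records`, its parameters and the table name in the demo are hypothetical. The `placeholder` argument is there only so that the demo can run against an in-memory SQLite database (which uses `?`) without a MySQL server; pymysql uses `%s`.

```python
import sqlite3


def insert_records(cursor, table, records, columns, placeholder="%s"):
    """Insert a list of dicts with one executemany call (sketch).

    records: list of dicts keyed by SQL column name; missing keys become NULL.
    columns: SQL column names, in the order they should be inserted.
    """
    col_list = ", ".join("`" + c + "`" for c in columns)
    marks = ", ".join([placeholder] * len(columns))
    sql = "INSERT INTO `" + table + "` (" + col_list + ") VALUES (" + marks + ")"
    # dict.get returns None for absent keys, which the driver sends as NULL
    rows = [tuple(rec.get(c) for c in columns) for rec in records]
    cursor.executemany(sql, rows)
    return sql


if __name__ == "__main__":
    # Stand-in database so the sketch runs without a MySQL server
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE loans (ABC_AGE TEXT, UNIQUE_ID TEXT, BALANCE TEXT)")
    demo = [
        {"ABC_AGE": "42", "UNIQUE_ID": "A1", "BALANCE": "1000"},
        {"ABC_AGE": "37", "UNIQUE_ID": "B2"},  # BALANCE missing
    ]
    insert_records(cur, "loans", demo, ["ABC_AGE", "UNIQUE_ID", "BALANCE"], placeholder="?")
    con.commit()
    print(cur.execute("SELECT * FROM loans").fetchall())
```

With this shape, the per-row dictionaries collected in `xmldf_dict["df_dicts"]` could be passed directly as `records`, with `list(mapping_dict)` as `columns`, since columns missing from a row are filled with None.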
