I have written code that uses the pymysql library to load data into a MySQL table. This is how I currently load the data:
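import pymysql
# host, user, passwd, db, port, path, ds_name, table_name, factor_date and column_map are defined elsewhere
con = pymysql.connect(host=host, user=user, password=passwd, db=db, port=int(port), autocommit=True, local_infile=1)
cursor = con.cursor()
# MySQL expects ROWS IDENTIFIED BY before the SET clause
sql = "LOAD XML INFILE '" + path + "' INTO TABLE " + ds_name + "." + table_name + " ROWS IDENTIFIED BY '<LoanInfo>' SET dataset=" + ds_name + ", factor_date=" + factor_date + "," + column_map
cursor.execute(sql)
con.commit()  # commit belongs to the connection; redundant with autocommit=True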
ds_name and factor_date do not come from the XML file, so I write them as static values into every row.
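A minimal sketch of how the static values could be kept out of the string concatenation. It assumes that pymysql fills `%s` placeholders on the client side, so values can be passed as parameters to `cursor.execute`, while identifiers still have to be quoted by hand. The helper name `build_load_xml` and the sample values are hypothetical.

```python
def build_load_xml(path, ds_name, table_name, factor_date, row_tag="<LoanInfo>"):
    """Return (sql, params) for a LOAD XML statement with two static columns.

    Hypothetical helper: identifiers are back-quoted, values are left as
    %s placeholders for the driver to escape.
    """
    def quote_ident(name):
        # Escape backticks inside an identifier, then wrap it in backticks.
        return "`" + name.replace("`", "``") + "`"

    table = quote_ident(ds_name) + "." + quote_ident(table_name)
    # Clause order: ROWS IDENTIFIED BY comes before SET.
    sql = (
        "LOAD XML INFILE %s INTO TABLE " + table +
        " ROWS IDENTIFIED BY %s"
        " SET dataset = %s, factor_date = %s"
    )
    params = (path, row_tag, ds_name, factor_date)
    return sql, params


sql, params = build_load_xml("/tmp/loans.xml", "ds1", "loans", "2020-01-31")
print(sql)
print(params)
# With an open connection (not run here): cursor.execute(sql, params)
```

The mapped columns from the mapping file would still have to be appended to the SET clause; this sketch only covers the two static columns.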
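I have a CSV/Excel file that maps the XML file's columns to the MySQL table's column names, for more than 100 columns. I read somewhere that the column mapping can be added to the SQL query by referencing user variables, for example "SET ABC_AGE = @Age, UNIQUE_ID = @ID, BALANCE = @Money". I build the mapping list like this: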
ls = []
# each SET item has the form <SQL column>=@<XML column>, as in ABC_AGE=@Age
for xml_col, sql_col in zip(map_df['XML Columns'], map_df['SQL Columns']):
    ls.append(sql_col + "=@" + xml_col)
column_map = ",".join(ls)
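A sketch of the mapping step on sample data. It assumes the rule from the MySQL documentation for LOAD XML that a user variable such as `@Age` has to appear in the field list before SET, with a name matching the XML field, for it to be populated. The sample pairs and the function name `build_column_clauses` are made up for illustration.

```python
def build_column_clauses(pairs):
    """Build the field list and the SET items for LOAD XML.

    pairs: iterable of (xml_name, sql_column) tuples (illustrative input).
    """
    var_list = []
    set_items = []
    for xml_name, sql_column in pairs:
        var_list.append("@" + xml_name)                 # variable named after the XML field
        set_items.append(sql_column + "=@" + xml_name)  # table column receives the variable
    return "(" + ", ".join(var_list) + ")", ", ".join(set_items)


pairs = [("Age", "ABC_AGE"), ("ID", "UNIQUE_ID"), ("Money", "BALANCE")]
field_list, column_map = build_column_clauses(pairs)
print(field_list)   # (@Age, @ID, @Money)
print(column_map)   # ABC_AGE=@Age, UNIQUE_ID=@ID, BALANCE=@Money
```

In the statement, the field list would go after ROWS IDENTIFIED BY and before SET.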
My question: is there a better way to load an XML file into MySQL from Python using such a mapping?
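I found a way to convert the XML file into a pandas DataFrame and then load it into the MySQL database with executemany. Here is the code that converts the XML into a DataFrame: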
# Read the mapping file and convert the mapping to a dictionary
import os
import pandas as pd
map_path = 'Mapping.xlsx'
if os.path.isfile(map_path):
    map_df = pd.read_excel(map_path, sheet_name='Mapping')  # the keyword is sheet_name, not worksheet
    # keys are SQL column names, values are XML attribute names
    mapping_dict = pd.Series(map_df['XML Columns'].values, index=map_df['SQL Columns']).to_dict()
# Read the XML file
import xml.etree.ElementTree as ET
xml_path = 'test.xml'
if os.path.isfile(xml_path):
    root = ET.parse(xml_path).getroot()
# Read the XML elements one by one and store their attributes in a dictionary
skip_cols = []  # placeholder: list of columns to skip
missing_col = set()
xmldf_dict = {"df_dicts": []}
for elem in root:
    df_dict = {}
    for k, v in mapping_dict.items():
        if k in skip_cols:
            continue
        try:
            df_dict[k] = elem.attrib[v]
        except KeyError:
            missing_col.add(k)
    xmldf_dict["df_dicts"].append(df_dict)
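The loop above can be tried out without a file by parsing an inline document. This sketch uses `dict.get` on the element attributes instead of try/except, so every row carries the same keys and an absent attribute becomes None. The sample XML, the mapping, and the name `rows_from_xml` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<Loans>
  <LoanInfo ID="1" Age="30" Money="100.5"/>
  <LoanInfo ID="2" Age="41"/>
</Loans>
"""

# SQL column -> XML attribute, the same direction as mapping_dict in the post
mapping = {"UNIQUE_ID": "ID", "ABC_AGE": "Age", "BALANCE": "Money"}


def rows_from_xml(xml_text, mapping, skip=()):
    """Return (rows, missing): one dict per child element, plus the set of
    SQL columns that were absent in at least one element."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    missing = set()
    for elem in root:
        row = {}
        for sql_col, xml_attr in mapping.items():
            if sql_col in skip:
                continue
            value = elem.attrib.get(xml_attr)  # None when the attribute is absent
            if value is None:
                missing.add(sql_col)
            row[sql_col] = value
        rows.append(row)
    return rows, missing


rows, missing = rows_from_xml(SAMPLE_XML, mapping)
print(rows)
print(missing)
```

Because each row already has every mapped key, a DataFrame built from `rows` would not need a separate frame for the missing columns.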
# Merge the missing-columns DataFrame with the XML DataFrame
missing_col_df = pd.DataFrame(columns=sorted(missing_col))  # pass a list; a set has no column order
xml_df = pd.DataFrame(xmldf_dict["df_dicts"])
final_df = pd.concat([xml_df, missing_col_df], axis=1)
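For the executemany step mentioned earlier, the rows have to be turned into one parameterised INSERT statement plus a list of tuples. A possible sketch, working from plain dictionaries (for a DataFrame, `final_df.to_dict("records")` would give a similar list, though NaN values would likely need converting to None first). The table name, the sample rows, and the helper `build_insert` are hypothetical.

```python
def build_insert(table, rows):
    """Return (sql, params) in the shape cursor.executemany expects."""
    # Collect the union of keys, keeping first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    col_sql = ", ".join("`" + c + "`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `" + table + "` (" + col_sql + ") VALUES (" + placeholders + ")"
    # Keys absent from a row become None, which the driver sends as NULL.
    params = [tuple(row.get(c) for c in columns) for row in rows]
    return sql, params


rows = [
    {"UNIQUE_ID": "1", "ABC_AGE": "30", "BALANCE": "100.5"},
    {"UNIQUE_ID": "2", "ABC_AGE": "41"},
]
sql, params = build_insert("loans", rows)
print(sql)
print(params)
# With an open pymysql connection (not run here):
#     with con.cursor() as cursor:
#         cursor.executemany(sql, params)
#     con.commit()
```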