I am using xlrd and xlwt to write Python code that compares two Excel sheets and writes the output to a third sheet. For example:
Sheet 1
nativeEMSName
HR_MEWT_XX5906_TR_I_HR10001
HR_LOHN_5811X_T01_C_X_HO55001
HR_PHKL_XX6541_TR_I_HR10001
HR_RWRI_XX3608_TR_I_HR10001
HR_KTHL_XX6382_AR_I_HR50001
ABC
HR_KURU_XX3714_TR_I_HR10001
HR_RWRI_XX1142_TR_I_HR10001
HR_SAHU_SAHUW_B01_C_X_EX10001
HR_KTHL_XX3622_TR_I_HR10001
Sheet 2
nativeEMSName id
HR_KURU_XX3714_TR_I_HR10001 66
HR_PHKL_XX6541_TR_I_HR10001 999
HR_MEWT_XX5906_TR_I_HR10001 2
HR_KTHL_XX6382_AR_I_HR50001 7777
HR_KTHL_XX3622_TR_I_HR10001 4
HR_SAHU_SAHUW_B01_C_X_EX10001 3
HR_LOHN_5811X_T01_C_X_HO55001 111
HR_RWRI_XX1142_TR_I_HR10001 55
HR_RWRI_XX3608_TR_I_HR10001 888
I look up each nativeEMSName from sheet1 in sheet2 and write the nativeEMSName and its corresponding ID to sheet3. The following code does this:
    colns = 0  # column holding nativeEMSName in sheet1
    colnd = 0  # column holding nativeEMSName in sheet2
    for rowsr in range(sheet1.nrows):
        test = sheet1.cell(rowsr, colns).value
        for rowdr in range(sheet2.nrows):
            test1 = sheet2.cell(rowdr, colnd).value
            if test == test1:
                ID = sheet2.cell(rowdr, colnd + 1).value
                sheet3.write(rowsr, colns, ID)
                sheet3.write(rowsr, colnd + 1, test1)
                wb.save('test.xls')
                break
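The challenge is that when both sheets have around 30k rows, the code takes far too long to run. I want to reduce the execution time. Any help optimizing this code, or a different approach that produces the output in the least time, would be greatly appreciated.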
Your lookup is O(m×n), where m is the number of rows in sheet1 and n is the number of rows in sheet2.
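To make that cost concrete, here is a minimal sketch that counts the equality checks a nested-loop search performs, even with the early break. The function name and the synthetic NAME_i data are made up for illustration and are not part of the original code.

```python
def count_nested_comparisons(names, lookup_rows):
    """Count the equality checks a nested-loop search makes (with early break)."""
    comparisons = 0
    for name in names:
        for key, _value in lookup_rows:
            comparisons += 1
            if name == key:
                break  # stop scanning once the name is found
    return comparisons


# Synthetic data: 300 names, with the lookup table stored in reverse order.
names = [f"NAME_{i}" for i in range(300)]
lookup_rows = [(f"NAME_{i}", i) for i in reversed(range(300))]

nested = count_nested_comparisons(names, lookup_rows)
hashed = len(names)  # a dict needs one lookup per name

print(nested, hashed)
```

With only 300 rows the nested search already makes 45150 comparisons against 300 dictionary lookups, and the gap grows quadratically with the row count.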
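I am assuming there are no duplicates. I think the following approach would work better. It also makes errors easier to spot, because you have the data in a map.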
1. Read sheet1 into a list
    sheet1_list = ['HR_MEWT_XX5906_TR_I_HR10001',
                   'HR_LOHN_5811X_T01_C_X_HO55001',
                   'HR_PHKL_XX6541_TR_I_HR10001',
                   'HR_RWRI_XX3608_TR_I_HR10001']  # and so on
2. Read sheet2 (the lookup table) into a map
    sheet2_map = {
        'HR_KURU_XX3714_TR_I_HR10001': '66',
        'HR_PHKL_XX6541_TR_I_HR10001': '999',
        'HR_MEWT_XX5906_TR_I_HR10001': '2',
        'HR_KTHL_XX6382_AR_I_HR50001': '7777',
        # and so on
    }
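Steps 1 and 2 might look like the following with xlrd. This is a sketch under a few assumptions: names are in column 0 and IDs in column 1, row 0 is a header, the two sheets are the first and second sheets of one workbook, and the helper names (read_names, read_lookup, load_sheets) are hypothetical. The helpers rely only on sheet.nrows and sheet.cell(row, col).value, the same calls used in the question.

```python
def read_names(sheet, col=0, first_row=1):
    """Return the values of one column as a list, skipping header rows."""
    return [sheet.cell(row, col).value for row in range(first_row, sheet.nrows)]


def read_lookup(sheet, key_col=0, value_col=1, first_row=1):
    """Return {key: value} built from two columns of a sheet.

    If a key occurs more than once, the last occurrence wins.
    """
    lookup = {}
    for row in range(first_row, sheet.nrows):
        lookup[sheet.cell(row, key_col).value] = sheet.cell(row, value_col).value
    return lookup


def load_sheets(path):
    """Open a workbook (hypothetical path) and return (sheet1_list, sheet2_map)."""
    import xlrd  # imported here so the helpers above work without xlrd installed

    book = xlrd.open_workbook(path)
    sheet1 = book.sheet_by_index(0)  # assumption: first sheet holds the names
    sheet2 = book.sheet_by_index(1)  # assumption: second sheet is the lookup table
    return read_names(sheet1), read_lookup(sheet2)
```

Note that xlrd returns numeric cells as floats, so an ID typed as 66 in the sheet would come back as 66.0 rather than the string '66' shown in the map above.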
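3. Loop over the list and look up each ID. Each individual lookup is O(1) on average, so the total time is O(n), where n is the number of entries in sheet1, which reduces the running time.
    sheet3_map = {}
    for key in sheet1_list:
        if key in sheet2_map:  # skip names that have no entry in sheet2
            print(key, sheet2_map[key])  # check by printing
            sheet3_map[key] = sheet2_map[key]  # inserts key: id, e.g. {'HR_MEWT_XX5906_TR_I_HR10001': '2'}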
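Sheet 1 in the question contains a name (ABC) that has no row in Sheet 2, and indexing the map with a missing key raises KeyError. One way to handle this, sketched here with a hypothetical helper match_ids and a small excerpt of the sample data, is to collect the unmatched names separately so they can be reviewed.

```python
def match_ids(names, lookup):
    """Split names into (matched, unmatched) using a dict lookup.

    matched is a list of (name, id) pairs in the order of `names`;
    unmatched is a list of names with no entry in `lookup`.
    """
    matched = []
    unmatched = []
    for name in names:
        if name in lookup:  # average O(1) membership test
            matched.append((name, lookup[name]))
        else:
            unmatched.append(name)
    return matched, unmatched


# Excerpt of the sample data from the question.
sheet1_list = ["HR_MEWT_XX5906_TR_I_HR10001", "ABC", "HR_KURU_XX3714_TR_I_HR10001"]
sheet2_map = {
    "HR_KURU_XX3714_TR_I_HR10001": "66",
    "HR_MEWT_XX5906_TR_I_HR10001": "2",
}

matched, unmatched = match_ids(sheet1_list, sheet2_map)
print(matched)
print(unmatched)
```

Keeping matched as a list of pairs preserves the sheet1 order and would also tolerate repeated names in sheet1, which a dict keyed by name would collapse.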
4. convert_map_to_excel(sheet3_map): there are libraries for this, but it is easy to write the xls yourself with xlwt. https://xlwt.readthedocs.io/en/latest/
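convert_map_to_excel is not a library function, so here is a minimal sketch of what it could look like with xlwt. The sheet name, header labels, and the write_rows helper are assumptions for illustration; the file name test.xls is taken from the question. The main difference from the question's code is that the workbook is saved once, after all rows are written, rather than inside the loop.

```python
def write_rows(sheet, rows, header=("nativeEMSName", "id")):
    """Write a header and (name, id) rows to any object with write(row, col, value)."""
    for col, label in enumerate(header):
        sheet.write(0, col, label)
    for row_index, (name, id_value) in enumerate(rows, start=1):
        sheet.write(row_index, 0, name)
        sheet.write(row_index, 1, id_value)


def convert_map_to_excel(mapping, path="test.xls", sheet_name="Sheet3"):
    """Write a {name: id} dict to an .xls file, saving a single time at the end."""
    import xlwt  # imported here so write_rows can be used without xlwt installed

    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet(sheet_name)
    write_rows(sheet, mapping.items())
    workbook.save(path)  # one save for the whole file
```

This writes the name in column 0 and the ID in column 1 on consecutive rows. The question's code wrote the ID first and kept each result on its sheet1 row number, so adjust the column order and row index if that layout matters.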