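I am writing a script to validate XML (the code for each validation rule lives in the validation_rules module), and it takes too long to run. I therefore decided to use the multiprocessing module to process the DataFrame in parallel. However, when I run my code, I get the following error:
_pickle.PicklingError: Can't pickle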
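For context, multiprocessing sends the worker function and every task to the child processes with pickle, so anything handed to pool.map has to be picklable. The following minimal sketch shows which kinds of objects fail; the helper name is_picklable is made up for illustration and is not part of any library.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Plain data such as a tuple of a string and a number can be pickled.
print(is_picklable(("rule_name", 3)))
# A lambda has no importable name, so pickle rejects it.
print(is_picklable(lambda row: row))
# Generator objects cannot be pickled at all.
print(is_picklable(i for i in range(3)))
```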
This is the code I tried:
def validate_xml(df_results):
    # Create a Pool of processes
    pool = Pool(cpu_count())
    # Map the function to the rows in parallel
    result_list = pool.map(apply_validation_function (df_results.itertuples(index=False), df_results))
    # Combine the results into a dataframe
    df_results[['status', 'comments']] = pd.DataFrame(result_list)
    return df_results

def apply_validation_function(row, df_results):
    function_name = str(row['function'])
    if isinstance(function_name, str) and function_name != 'nan':
        try:
            function = getattr(validation_rules, function_name)
            result = function(df_results, row.name)
            return pd.Series({'status': result[0], 'comments': result[1]})
        except Exception as e:
            return pd.Series({'status': 'Error', 'comments': f'Error: {e}'})
    else:
        return pd.Series({'status': '', 'comments': ''})
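One direction that might avoid the error, as a minimal sketch rather than a confirmed fix: pass the function object itself to pool.map instead of calling it, keep the worker at module level, and send only plain picklable data as tasks. Everything below is an assumption made for illustration: a list of dicts stands in for the DataFrame, the RULES dict and rule_not_empty stand in for the validation_rules module, and validate_rows is a made-up name.

```python
from multiprocessing import Pool, cpu_count


# Hypothetical stand-in for one rule in the validation_rules module.
def rule_not_empty(rows, index):
    value = rows[index]["value"]
    return ("OK", "") if value else ("Failed", "value is empty")


# Hypothetical registry; with a real module this would be getattr(validation_rules, name).
RULES = {"rule_not_empty": rule_not_empty}


def apply_validation_function(task):
    """Module-level worker: takes one (rows, index) task and returns a plain tuple."""
    rows, index = task
    function_name = rows[index].get("function")
    if not isinstance(function_name, str) or function_name in ("", "nan"):
        return ("", "")
    try:
        function = RULES[function_name]
        status, comments = function(rows, index)
        return (status, comments)
    except Exception as e:
        return ("Error", f"Error: {e}")


def validate_rows(rows, processes=None):
    """Run every row's rule in a process pool and return (status, comments) tuples."""
    # Each task is a tuple of plain data, which pickle can handle.
    tasks = [(rows, index) for index in range(len(rows))]
    with Pool(processes or cpu_count()) as pool:
        # Pass the function object; do not call it here.
        return pool.map(apply_validation_function, tasks)
```

With a DataFrame, `df_results.to_dict("records")` would give a list of dicts like `rows`, and `pd.DataFrame(result_list, columns=["status", "comments"])` would turn the returned tuples back into columns. Sending the full `rows` list with every task mirrors the original `function(df_results, row.name)` signature but costs extra pickling. On platforms that start workers by spawning, the call to `validate_rows` should sit under an `if __name__ == "__main__":` guard.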
Before validation, df_results has the following columns:
After validation, it has the same columns plus: