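I am modifying a working Flask application that has been serving an ML model so that it can serve predictions from a new, updated model.
The model is an sklearn Pipeline object that I trained and serialized with the pickle library. It includes several column transformer steps, imputation, encoding, and finally a prediction step. Because of how some of the custom column transformers are written, the output of every intermediate step must be a pandas DataFrame rather than an array, and that is how the model was trained and serialized.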
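Here is where it gets strange: the same model.predict(test_record) call works when it runs at module level, but fails when it runs inside the /predict route.
My question is why the prediction fails only inside the route, and how to fix it.
Here is my basic code: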
import logging
import os
import pickle

import pandas as pd
from flask import Flask, make_response

# Module-level logger used by the except blocks below
log = logging.getLogger(__name__)

# Define the app
app = Flask(__name__)

# Load the trained model (the file mode belongs to open(), not pickle.load())
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Load the test record
test_record = pd.read_csv("test_record.csv")

# Make a prediction using the model and test record. This step works.
try:
    model.predict(test_record)
except Exception as e:
    log.debug(str(e), stack_info=True)


@app.route('/predict')
def predict():
    # Make a prediction using the model and test record. This step *doesn't* work.
    try:
        prediction = model.predict(test_record)
    except Exception as e:
        log.debug(str(e), stack_info=True)
        # Return early so `prediction` is never referenced when it was not set
        return make_response('Prediction failed: ' + str(e), 500)
    return make_response('Test Record Prediction: ' + str(prediction), 200)


# Start the Flask app
if __name__ == '__main__':
    # .get() avoids a KeyError when ENV is not set
    if os.environ.get('ENV') in {'local', 'local_w_db', 'DEV'}:
        app.run(debug=True)
    else:
        app.run()
Environment specs:
python==3.11.7
Flask==3.0.1
scikit-learn==1.3.2
scikit-learn-intelex==2023.2.1
scipy==1.11.4
pandas==2.1.4
category_encoders==2.6.3
werkzeug==3.0.1
joblib==1.2.0
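What I have already tried:
Setting the sklearn transform output to always be pandas by running these lines at the start of my Flask application code, which made no difference: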
import sklearn
sklearn.set_config(transform_output="pandas")
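This is the stack trace logged with the exception in the prediction step. It appears to point to a problem in the thread configuration rather than in the sklearn settings: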
[2024-01-30 08:58:44,666] DEBUG [app.predict:109] - Specifying the columns using strings is only supported for pandas DataFrames
Stack (most recent call last):
File "c:\Users\user\.vscode\extensions\ms-python.python-2023.22.1\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydev_bundle\pydev_monkey.py", line 1118, in __call__
ret = self.original_func(*self.args, **self.kwargs)
File "..\miniforge3\envs\APP_ENV\Lib\threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
File "..\miniforge3\envs\APP_ENV\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "..\miniforge3\envs\APP_ENV\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "..\miniforge3\envs\APP_ENV\Lib\socketserver.py", line 691, in process_request_thread
self.finish_request(request, client_address)
File "..\miniforge3\envs\APP_ENV\Lib\socketserver.py", line 361, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "..\miniforge3\envs\APP_ENV\Lib\socketserver.py", line 755, in __init__
self.handle()
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\werkzeug\serving.py", line 390, in handle
super().handle()
File "..\miniforge3\envs\APP_ENV\Lib\http\server.py", line 436, in handle
self.handle_one_request()
File "..\miniforge3\envs\APP_ENV\Lib\http\server.py", line 424, in handle_one_request
method()
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\werkzeug\serving.py", line 362, in run_wsgi
execute(self.server.app)
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\werkzeug\serving.py", line 323, in execute
application_iter = app(environ, start_response)
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\flask\app.py", line 1488, in __call__
return self.wsgi_app(environ, start_response)
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\flask\app.py", line 1463, in wsgi_app
response = self.full_dispatch_request()
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\flask\app.py", line 870, in full_dispatch_request
rv = self.dispatch_request()
File "..\miniforge3\envs\APP_ENV\Lib\site-packages\flask\app.py", line 855, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "C:\Users\user\app_directory\app.py", line 109, in predict
log.debug(str(e), stack_info=True)
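A side note on the log output above: stack_info=True records the call stack that led to the logging call, not the traceback of the caught exception, which is why the trace ends at log.debug in predict rather than somewhere inside sklearn. Below is a minimal sketch of the difference using only the standard library. The function failing_step and the logger name trace_demo are hypothetical stand-ins, not part of the original app.

```python
import io
import logging

# Capture log output in memory so the two variants can be compared.
stream = io.StringIO()
log = logging.getLogger("trace_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(logging.StreamHandler(stream))


def failing_step():
    # Hypothetical stand-in for the failing model.predict() call.
    raise ValueError(
        "Specifying the columns using strings is only supported for pandas DataFrames"
    )


try:
    failing_step()
except Exception as e:
    # stack_info=True: frames leading to this log call ("Stack (most recent call last)").
    log.debug(str(e), stack_info=True)
    with_stack = stream.getvalue()

    # Reset the buffer before logging the second variant.
    stream.seek(0)
    stream.truncate(0)

    # exc_info=True: the traceback of the exception itself, including failing_step.
    log.debug(str(e), exc_info=True)
    with_traceback = stream.getvalue()
```

In the route, replacing stack_info=True with exc_info=True (or calling log.exception inside the except block) should show which transformer step raised the error.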
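This happens because sklearn.set_config(transform_output="pandas") is thread-local: it only affects the thread that calls it and does not carry over to other threads, such as the threads Flask uses to handle requests. The call at module level therefore configures the main thread only, while the route runs in a request thread that still uses sklearn's default output. To fix this, set the configuration explicitly in the request context by calling sklearn.set_config(transform_output="pandas") at the start of the predict() route. This ensures the pipeline uses the pandas output configuration while the request is being handled.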