Machine learning prediction project: "ValueError" after submitting the prediction form in a Flask application on localhost

Question

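What am I doing?

I am working on a machine learning project that predicts electric vehicle prices in different US states. My goal is to strengthen my practical skills. I have completed every step of the project: one-hot encoding, training the model, and running the Flask application on localhost. On localhost, I filled in the form with the following values and clicked the submit button: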

County: Jefferson
City: PORT TOWNSEND
ZIP Code: 98368
Model Year: 2012
Make: NISSAN
Model: LEAF
Electric Vehicle Type: Battery Electric Vehicle (BEV)
CAFV Eligibility: Clean Alternative Fuel Vehicle Eligible
Legislative District: 24

What problem am I facing?

After submitting the form, I get this error:

ValueError
ValueError: Found unknown categories ['98368'] in column 2 during transform

Traceback (most recent call last)
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\flask\app.py", line 1498, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\flask\app.py", line 1476, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "G:\Machine_Learning_Projects\austin\electric_vehicle_price_prediction_2\app\routes.py", line 38, in predict
    price = predict_price(features)
  File "G:\Machine_Learning_Projects\austin\electric_vehicle_price_prediction_2\app\model.py", line 29, in predict_price
    transformed_features = encoder.transform(features_df)
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\sklearn\utils\_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\sklearn\preprocessing\_encoders.py", line 1027, in transform
    X_int, X_mask = self._transform(
  File "C:\Users\austin.conda\envs\electric_vehicle_price_prediction_2\lib\site-packages\sklearn\preprocessing\_encoders.py", line 200, in _transform
    raise ValueError(msg)
ValueError: Found unknown categories ['98368'] in column 2 during transform

What have I tried?

I tried using the following code.

Code of the routes.py file inside the app folder:

from flask import render_template, request, jsonify
from app import app
from app.model import predict_price
from jinja2 import Environment, FileSystemLoader, PackageLoader, select_autoescape

@app.route('/')
def index():
    env = Environment(
        loader=PackageLoader("app"),
        autoescape=select_autoescape()
    )
    template = env.get_template("index.html")
    return render_template(template)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.form.to_dict()

    # Convert the form data into the correct format for prediction
    features = [
        data['county'],
        data['city'],
        data['zip_code'],
        data['model_year'],
        data['make'],
        data['model'],
        data['ev_type'],
        data['cafv_eligibility'],
        data['legislative_district']
    ]
    
    # Get the prediction result
    price = predict_price(features)
    
    return jsonify({'predicted_price': price})

Code of the model.py file inside the app folder:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
import joblib
from flask import Flask, render_template
from jinja2 import Environment, FileSystemLoader, PackageLoader, select_autoescape

env = Environment(
    loader=PackageLoader("app"),
    autoescape=select_autoescape()
)

model = joblib.load('model/ev_price_model.pkl')

def predict_price(features):
    encoder = joblib.load('model/encoder.pkl')  # Load encoder if needed
    
    features_df = pd.DataFrame([features], columns=['County', 'City', 'ZIP Code', 'Model Year', 'Make', 'Model', 'Electric Vehicle Type', 'Clean Alternative Fuel Vehicle (CAFV) Eligibility', 'Legislative District'])
    
    # Apply encoding, scaling, etc., if necessary
    transformed_features = encoder.transform(features_df)
    
    # Make the prediction
    price = model.predict(transformed_features)
    
    return price[0]  # Assuming it returns a single value

What is the link to my GitHub repository?

Here is the link to my repository:

https://github.com/SteveAustin583/electric-vehicle-price-prediction

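What am I expecting?

I expect to get the prediction result without any problems, since I have already performed one-hot encoding.

Can you help me fix this problem? Thanks in advance.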

Answer 1

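I'm from Stack Overflow. I have already posted an answer elsewhere, so I can't post another one there. The encoder is applied to the entire dataset, every column at once, which may give incorrect results.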
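One possible cause of this particular error, offered as an assumption rather than something confirmed from the repository: values from `request.form` are always strings, so if the encoder was fitted on ZIP codes stored as integers, the string '98368' is not among its known categories. A minimal sketch with made-up training rows:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: ZIP codes stored as integers, as pandas
# typically parses them from a CSV file.
train = pd.DataFrame({
    "City": ["PORT TOWNSEND", "SEATTLE"],
    "ZIP Code": [98368, 98101],
})

encoder = OneHotEncoder()  # default handle_unknown="error"
encoder.fit(train)

# Form data arrives as strings, so the ZIP code is "98368", not 98368.
form_row = pd.DataFrame([["PORT TOWNSEND", "98368"]], columns=train.columns)

try:
    encoder.transform(form_row)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Casting the field back to the training dtype lets the encoder match it.
form_row["ZIP Code"] = form_row["ZIP Code"].astype(int)
encoded = encoder.transform(form_row).toarray()
```

If this is what happens in the project, casting each form field to the dtype used during training would be needed in addition to any change of encoder settings.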

I think it is better to define separate transformers for the numerical and categorical columns:

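from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numerical_cols and categorical_cols are lists of column names in X
num_transformer = StandardScaler()
# handle_unknown='ignore' encodes categories unseen during fit as all zeros
cat_transformer = OneHotEncoder(handle_unknown='ignore')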

# create a preprocessor using columntransformer from sklearn
preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_transformer, numerical_cols),
        ('cat', cat_transformer, categorical_cols),
    ]
)

# combine into single pipeline
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(random_state=42))
])

Then fit the pipeline:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
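To illustrate what `handle_unknown='ignore'` changes, here is a small sketch on made-up data (the `Make` values are only examples): a category that was not present during `fit` is encoded as a row of zeros instead of raising `ValueError`.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up training rows, only for illustration.
train = pd.DataFrame({"Make": ["NISSAN", "TESLA"]})

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

# A category seen during fit gets its usual one-hot column.
known = encoder.transform(pd.DataFrame({"Make": ["NISSAN"]})).toarray()

# A category not seen during fit becomes all zeros instead of an error.
unseen = encoder.transform(pd.DataFrame({"Make": ["RIVIAN"]})).toarray()
```

The trade-off is that the model then predicts without any information from that feature, so predictions for unseen categories may be less reliable.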

Modify the inference code in the same way. Let me know whether this helps solve your problem.
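A rough sketch of what the modified inference could look like, assuming the whole fitted pipeline is saved with joblib and loaded once in `model.py`. The training rows, prices, column split, and the file name `ev_price_pipeline.pkl` are all made up for illustration, and only a subset of the form fields is used; ZIP codes are treated as string categories here so they match what the form sends.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; adjust to the real dataset.
numerical_cols = ["Model Year"]
categorical_cols = ["City", "ZIP Code", "Make", "Model"]

# Made-up rows and prices, only to make the sketch runnable.
X = pd.DataFrame({
    "City": ["PORT TOWNSEND", "SEATTLE", "SEATTLE", "OLYMPIA"],
    "ZIP Code": ["98368", "98101", "98101", "98501"],
    "Model Year": [2012, 2020, 2018, 2015],
    "Make": ["NISSAN", "TESLA", "NISSAN", "TESLA"],
    "Model": ["LEAF", "MODEL 3", "LEAF", "MODEL S"],
})
y = [9000.0, 35000.0, 15000.0, 30000.0]

pipeline = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("num", StandardScaler(), numerical_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])),
    ("regressor", RandomForestRegressor(n_estimators=10, random_state=42)),
])
pipeline.fit(X, y)

# Save the whole pipeline so the encoder and the model stay in sync.
model_path = Path(tempfile.mkdtemp()) / "ev_price_pipeline.pkl"  # hypothetical name
joblib.dump(pipeline, model_path)

# In the Flask app this load would happen once, at import time.
loaded_pipeline = joblib.load(model_path)


def predict_price(form_data):
    """Predict a price from a dict of form fields (all values are strings)."""
    row = pd.DataFrame([{
        "City": form_data["city"],
        "ZIP Code": form_data["zip_code"],           # kept as a string category
        "Model Year": int(form_data["model_year"]),  # cast to the training dtype
        "Make": form_data["make"],
        "Model": form_data["model"],
    }])
    # Return a plain Python float for the JSON response.
    return float(loaded_pipeline.predict(row)[0])


price = predict_price({
    "city": "TACOMA",      # city and ZIP code not seen during fitting
    "zip_code": "98402",
    "model_year": "2012",
    "make": "NISSAN",
    "model": "LEAF",
})
```

Because the pipeline applies the preprocessing itself, `predict_price` no longer needs a separate `encoder.pkl`, and the route can pass `request.form.to_dict()` straight to it.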
