XGBoost model on Google AI Platform expects float values instead of accepting categorical values and converting them

Question (votes: 1, answers: 1)

I am trying to run a simple XGBoost prediction on Google Cloud, following this example: https://cloud.google.com/ml-engine/docs/scikit/getting-predictions-xgboost#get_online_predictions

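The model builds fine, but when I run a prediction with the sample input JSON, it fails with the error "Could not initialize DMatrix from inputs: could not convert string to float:", as shown in the screenshot below. I understand this happens because the test input contains strings. I expected the Google machine learning model to carry the information needed to convert categorical values to floats. I cannot expect my users to submit online prediction requests with float values.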

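According to the tutorial, this should work without converting the categorical values to floats myself. Please advise. I have attached a GIF with more details. Thanks.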

[Screenshot of the prediction error; image not included]

import json
import numpy as np
import os
import pandas as pd
import pickle
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder

# these are the column labels from the census data files
COLUMNS = (
    'age',
    'workclass',
    'fnlwgt',
    'education',
    'education-num',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
    'native-country',
    'income-level'
)

# categorical columns contain data that need to be turned into numerical
# values before being used by XGBoost
CATEGORICAL_COLUMNS = (
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country'
)

# load training set
with open('./census_data/adult.data', 'r') as train_data:
    raw_training_data = pd.read_csv(train_data, header=None, names=COLUMNS)
# remove column we are trying to predict ('income-level') from features list
train_features = raw_training_data.drop('income-level', axis=1)
# create training labels list
train_labels = (raw_training_data['income-level'] == ' >50K')


# load test set
with open('./census_data/adult.test', 'r') as test_data:
    raw_testing_data = pd.read_csv(test_data, names=COLUMNS, skiprows=1)
# remove column we are trying to predict ('income-level') from features list
test_features = raw_testing_data.drop('income-level', axis=1)
# create test labels list
test_labels = (raw_testing_data['income-level'] == ' >50K.')

# convert data in categorical columns to numerical values
encoders = {col:LabelEncoder() for col in CATEGORICAL_COLUMNS}
for col in CATEGORICAL_COLUMNS:
    train_features[col] = encoders[col].fit_transform(train_features[col])
# reuse the encoders fitted on the training set so the codes stay consistent
for col in CATEGORICAL_COLUMNS:
    test_features[col] = encoders[col].transform(test_features[col])

# load data into DMatrix object
dtrain = xgb.DMatrix(train_features, train_labels)
dtest = xgb.DMatrix(test_features)

# train XGBoost model
bst = xgb.train({}, dtrain, 20)
bst.save_model('./model.bst')


google-cloud-platform scikit-learn xgboost
1 answer

0 votes

You can use pandas to convert the categorical strings into integer codes for the model input, for example for workclass:

# df is the DataFrame holding the raw string columns
df['workclass_cat'] = df['workclass'].astype('category')
df['workclass_cat'] = df['workclass_cat'].cat.codes
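
One caveat with this approach: `astype('category')` derives its codes from whatever values are present in the frame, so a prediction request with a single row would always be encoded as 0. The error in the question suggests that the service hands the request instances to DMatrix unchanged, so the encoding most likely has to be applied on the client before the request is sent. Below is a minimal sketch of fixing the category order at training time and reusing it for requests. The `train` and `request` frames, the `categories` dict, and the `encode` helper are hypothetical names made up for illustration; they are not part of the tutorial or of any Google API.

```python
import pandas as pd

# Hypothetical stand-in for the training features from the question.
train = pd.DataFrame(
    {"workclass": [" Private", " State-gov", " Private", " Self-emp-inc"]}
)

# Record the category order seen at training time (sorted, which is also
# how LabelEncoder orders its classes).
categories = {"workclass": sorted(train["workclass"].unique())}


def encode(df, categories):
    """Map string columns to the integer codes fixed at training time."""
    out = df.copy()
    for col, cats in categories.items():
        dtype = pd.CategoricalDtype(categories=cats)
        # Values that were not seen during training get the code -1.
        out[col] = out[col].astype(dtype).cat.codes
    return out


train_encoded = encode(train, categories)

# A single-row prediction request still receives the training-time code.
request = pd.DataFrame({"workclass": [" State-gov"]})
request_encoded = encode(request, categories)
print(request_encoded["workclass"].tolist())
```

The `categories` dict would need to be stored alongside the model (for example as JSON) so that whatever code builds the prediction request can apply the same mapping.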