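I am trying to run a simple XGBoost prediction on Google Cloud using this simple example: https://cloud.google.com/ml-engine/docs/scikit/getting-predictions-xgboost#get_online_predictions
The model builds fine, but when I try to run a prediction with the sample input JSON, it fails with the error "Could not initialize DMatrix from inputs: could not convert string to float:", as shown in the screenshot below. I understand this happens because the test input contains strings; I was hoping the Google machine learning model would hold the information needed to convert categorical values to floats. I cannot expect my users to submit online prediction requests with float values. According to the tutorial, it should work without converting the categorical values to floats. Please advise; I have attached a GIF with more details. Thanks.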
import json
import numpy as np
import os
import pandas as pd
import pickle
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
# these are the column labels from the census data files
COLUMNS = (
    'age',
    'workclass',
    'fnlwgt',
    'education',
    'education-num',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
    'native-country',
    'income-level'
)
# categorical columns contain data that need to be turned into numerical
# values before being used by XGBoost
CATEGORICAL_COLUMNS = (
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country'
)
# load training set
with open('./census_data/adult.data', 'r') as train_data:
    raw_training_data = pd.read_csv(train_data, header=None, names=COLUMNS)
# remove column we are trying to predict ('income-level') from features list
train_features = raw_training_data.drop('income-level', axis=1)
# create training labels list
train_labels = (raw_training_data['income-level'] == ' >50K')
# load test set
with open('./census_data/adult.test', 'r') as test_data:
    raw_testing_data = pd.read_csv(test_data, names=COLUMNS, skiprows=1)
# remove column we are trying to predict ('income-level') from features list
test_features = raw_testing_data.drop('income-level', axis=1)
# create test labels list
test_labels = (raw_testing_data['income-level'] == ' >50K.')
# convert data in categorical columns to numerical values
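encoders = {col: LabelEncoder() for col in CATEGORICAL_COLUMNS}
for col in CATEGORICAL_COLUMNS:
    train_features[col] = encoders[col].fit_transform(train_features[col])
# reuse the encoders fitted on the training data (transform, not fit_transform)
# so that each string maps to the same code in both sets
for col in CATEGORICAL_COLUMNS:
    test_features[col] = encoders[col].transform(test_features[col])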
# load data into DMatrix object
dtrain = xgb.DMatrix(train_features, train_labels)
dtest = xgb.DMatrix(test_features)
# train XGBoost model
bst = xgb.train({}, dtrain, 20)
bst.save_model('./model.bst')
You can use pandas to convert the categorical strings into codes for the model input, for example for workclass:
df['workclass_cat'] = df['workclass'].astype('category')
df['workclass_cat'] = df['workclass_cat'].cat.codes
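
One caveat with category codes: the integer assigned to each string depends on which categories are present in the data being encoded, so the codes computed at prediction time have to match the ones used in training. Below is a minimal sketch of one way to keep them in sync, in plain Python: build a string-to-code mapping from the training values once, store it next to the model, and apply it to each instance before sending the request. The helper names (`build_mapping`, `encode_instance`), the shortened two-column sample, and the choice to raise on unseen categories are assumptions for illustration; they are not part of the tutorial or of the prediction service.

```python
import json


def build_mapping(values):
    """Map each distinct string to an integer code, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def encode_instance(instance, mappings, columns):
    """Return one instance as a list of numbers, encoding categorical columns."""
    encoded = []
    for column, value in zip(columns, instance):
        if column in mappings:
            if value not in mappings[column]:
                raise ValueError(f"unseen category {value!r} in column {column!r}")
            encoded.append(mappings[column][value])
        else:
            # numeric columns pass through unchanged
            encoded.append(value)
    return encoded


# Hypothetical, shortened example: two feature columns, one of them categorical.
columns = ["age", "workclass"]
training_workclass = [" Private", " State-gov", " Private", " Self-emp-inc"]
mappings = {"workclass": build_mapping(training_workclass)}

# Serialize the mappings so the client that builds requests can reuse them.
mappings_json = json.dumps(mappings)
restored = json.loads(mappings_json)

instance = [25, " Private"]
encoded = encode_instance(instance, restored, columns)
print(encoded)
```

The encoded list contains only numbers, so it can be sent in place of the raw string instance. Raising on an unseen category is a deliberate choice in this sketch: silently assigning a fallback code would give the model an input it never saw in training.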