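I am writing a function that will be optimized later, so I cannot pass the data to it as an argument. The function's arguments are limited to the parameters being optimized.
I need to get the data into the function some other way, and I would like to know how to do that with a global variable or a class. At the moment I read "dtrain" inside the function, which is not right, because I have to edit the function every time the data changes.
If I wrote the function directly in the script it would work fine, but I am writing it as a module that gets imported into the script later.
Here is my function: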
def bo_xgb_evaluate(learning_rate, subsample, colsample_bytree, gamma,
                    min_child_weight, max_depth, tweedie_variance_power):
    import numpy as np
    import xgboost as xgb
    xgbparams = {'eval_metric': 'tweedie-nloglik@' + str(np.round(tweedie_variance_power, 2)),
                 'objective': 'reg:tweedie',
                 'nthread': 4,
                 'learning_rate': learning_rate,
                 'max_depth': int(max_depth),
                 'subsample': max(min(subsample, 1), 0),
                 'colsample_bytree': max(min(colsample_bytree, 1), 0),
                 'gamma': gamma,
                 'min_child_weight': int(min_child_weight),
                 'seed': 1001}
    folds = 4
    print("\n Search parameters:\n %s" % (xgbparams))
    dtrain = xgb.DMatrix('/train.buffer')
    cv_result = xgb.cv(xgbparams,
                       dtrain,
                       num_boost_round=1000,
                       nfold=folds,
                       # stop the training when validation scores have not improved for 10 estimators
                       early_stopping_rounds=10,
                       metrics='tweedie-nloglik@' + str(np.round(tweedie_variance_power, 2)),
                       verbose_eval=5,
                       seed=1367)
    val_score = -1.0 * cv_result['test-tweedie-nloglik@' + str(np.round(tweedie_variance_power, 2)) + '-mean'].iloc[-1]
    train_score = -1.0 * cv_result['train-tweedie-nloglik@' + str(np.round(tweedie_variance_power, 2)) + '-mean'].iloc[-1]
    print('\n Stopped after %d iterations with train-deviance = %f val-deviance = %f ( diff = %f )' %
          (len(cv_result), train_score, val_score, (train_score - val_score)))
    return val_score
Perhaps break your function apart, so that you have one function that returns xgbparams, one that returns dtrain, and one that takes xgbparams and dtrain and does the computation.
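def get_params():
    # build and return the parameter dict
    return xgbparams

def get_dtrain():
    # load and return the training data
    return dtrain

def run_cv(xgbparams, dtrain):
    # run the thing...
    ...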
Splitting it up also gives you a better chance of figuring out which parts work.
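One way to make the split version fit an optimizer that only supplies the tunable arguments is to bind the data up front. The following is a minimal sketch, not the asker's code: build_params, load_data, evaluate and the toy score are hypothetical stand-ins (no xgboost is used), and functools.partial is just one possible way to do the binding.

```python
from functools import partial


def build_params(learning_rate, max_depth):
    """Return the parameter dict (stand-in for building xgbparams)."""
    return {"learning_rate": learning_rate, "max_depth": int(max_depth)}


def load_data(path):
    """Hypothetical loader; a real version might return xgb.DMatrix(path)."""
    return {"path": path, "rows": [1.0, 2.0, 3.0]}


def evaluate(dtrain, learning_rate, max_depth):
    """Toy score standing in for the cross-validation call."""
    params = build_params(learning_rate, max_depth)
    return -params["learning_rate"] * sum(dtrain["rows"]) / params["max_depth"]


# Bind the data once; the result takes only the parameters being optimized.
dtrain = load_data("/train.buffer")
objective = partial(evaluate, dtrain)
score = objective(learning_rate=0.5, max_depth=2.9)
```

Because the data is bound in the calling script, the module that defines evaluate never needs to know where the data comes from.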
Otherwise, if you want to create a class, you can carry the data on self, as self.dtrain.
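import xgboost as xgb  # module-level import: a name imported in the class body is not visible as a bare name inside methods

class myclass(object):
    def __init__(self, data):
        # load the data once and keep it on the instance
        self.dtrain = xgb.DMatrix(data)
    # etc.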
Then, when you instantiate the class, just pass the data source in as the argument and the class loads it:
myclass('/train.buffer')
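To show how the class version gives the optimizer a function of the tunable arguments only, here is a minimal sketch. The names Evaluator and evaluate and the toy score are hypothetical, and a plain list stands in for the DMatrix so the example runs without xgboost.

```python
class Evaluator:
    """Holds the training data so the objective takes only tunable arguments."""

    def __init__(self, data):
        # A real version might store xgb.DMatrix(data) here instead.
        self.dtrain = data

    def evaluate(self, learning_rate, max_depth):
        # Toy score standing in for the cross-validation call.
        return -learning_rate * len(self.dtrain) / int(max_depth)


evaluator = Evaluator([1.0, 2.0, 3.0, 4.0])
objective = evaluator.evaluate  # bound method: self is already filled in
score = objective(learning_rate=0.5, max_depth=2.9)
```

The bound method objective can be handed to the optimizer in place of the original function, and swapping the data only means constructing a new Evaluator in the script.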