I'm trying to use the hyperparam_search function from DeepChem's GaussianProcessHyperparamOpt. I'm following what is done in the test script for the HyperparamOpt class (there is no test for the Gaussian process class).
After loading the training set etc., I define a dict of hyperparameters:
hps = {
    'layer_sizes': [1500],
    'weight_init_stddevs': [0.02],
    'bias_init_consts': [1.],
    'dropouts': [0.5],
    'penalty': 0.1,
    'penalty_type': 'l2',
    'batch_size': 50,
    'nb_epoch': 10,
    'learning_rate': 0.001
}
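For reference, the objects used below (tasks, train, valid, transformers, n_features) come from the usual MoleculeNet loading step. A minimal sketch, with the Delaney dataset and ECFP featurizer as placeholders rather than necessarily what I used:

import deepchem as dc

# Placeholder loading step; any MoleculeNet regression dataset works the same way.
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='ECFP')
train, valid, test = datasets
n_features = train.X.shape[1]  # 1024 for ECFP fingerprints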
Then I make a model_builder function (this is copy/pasted from the test script):
def model_builder(model_params, model_dir):
    return dc.models.MultitaskRegressor(
        len(tasks), n_features, model_dir=model_dir, **model_params)
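As a quick sanity check (arbitrary parameter values, just to confirm the builder constructs a model at all), it can be called by hand with a params dict and a model directory:

import tempfile

# Smoke test with a hypothetical subset of the hyperparameters above.
model = model_builder({'layer_sizes': [1500], 'dropouts': [0.5]}, tempfile.mkdtemp())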
Then I define the metric, which, according to the code, should be given as a list of length 1:
regression_metric = [dc.metrics.Metric(dc.metrics.r2_score)]
Then I call the hyperparam_search function, providing the required arguments:
optimizer = dc.hyper.GaussianProcessHyperparamOpt(model_builder)
best_hyper_params, best_performance, all_results = optimizer.hyperparam_search(
    hps,
    train,
    valid,
    transformers,
    regression_metric
)
and I get a KeyError:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-85-be984f768f5b> in <module>()
5 valid,
6 transformers,
----> 7 metric = regression_metric
8 )
9 # metric = regression_metric
1 frames
/usr/lib/python3.6/os.py in __getitem__(self, key)
667 except KeyError:
668 # raise KeyError with the original key value
--> 669 raise KeyError(key) from None
670 return self.decodevalue(value)
671
KeyError: 'DEEPCHEM_DATA_DIR'
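The traceback shows a bare os.environ['DEEPCHEM_DATA_DIR'] lookup failing somewhere inside the search, so one thing worth trying (a guess based on the traceback, not a confirmed fix) is to set that environment variable before calling hyperparam_search:

import os
import tempfile

# Point DEEPCHEM_DATA_DIR at any writable directory before running the search.
os.environ['DEEPCHEM_DATA_DIR'] = tempfile.mkdtemp()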
To prove that I'm not crazy, here is the code for that function; you can see that it takes the metric as an argument:
23 def hyperparam_search(
24 self,
25 params_dict,
26 train_dataset,
27 valid_dataset,
28 output_transformers,
29 metric,
30 direction=True,
31 n_features=1024,
32 n_tasks=1,
33 max_iter=20,
34 search_range=4,
35 hp_invalid_list=[
36 'seed', 'nb_epoch', 'penalty_type', 'dropouts', 'bypass_dropouts',
37 'n_pair_feat', 'fit_transformers', 'min_child_weight',
38 'max_delta_step', 'subsample', 'colsample_bylevel',
39 'colsample_bytree', 'reg_alpha', 'reg_lambda', 'scale_pos_weight',
40 'base_score'
41 ],
42 log_file='GPhypersearch.log'):
43 """Perform hyperparams search using a gaussian process assumption
44
45 params_dict include single-valued parameters being optimized,
46 which should only contain int, float and list of int(float)
47
48 parameters with names in hp_invalid_list will not be changed.
49
50 For Molnet models, self.model_class is model name in string,
51 params_dict = dc.molnet.preset_hyper_parameters.hps[self.model_class]
52
53 Parameters
54 ----------
55 params_dict: dict
56 dict including parameters and their initial values
57 parameters not suitable for optimization can be added to hp_invalid_list
58 train_dataset: dc.data.Dataset struct
59 dataset used for training
60 valid_dataset: dc.data.Dataset struct
61 dataset used for validation(optimization on valid scores)
62 output_transformers: list of dc.trans.Transformer
63 transformers for evaluation
64 metric: list of dc.metrics.Metric
65 metric used for evaluation
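As a side note, several keys in my hps dict appear in the default hp_invalid_list above, so per the docstring they would be held fixed rather than optimized. A quick cross-check:

# hp_invalid_list copied from the default in the signature above.
hp_invalid_list = [
    'seed', 'nb_epoch', 'penalty_type', 'dropouts', 'bypass_dropouts',
    'n_pair_feat', 'fit_transformers', 'min_child_weight',
    'max_delta_step', 'subsample', 'colsample_bylevel',
    'colsample_bytree', 'reg_alpha', 'reg_lambda', 'scale_pos_weight',
    'base_score'
]
fixed = [k for k in hps if k in hp_invalid_list]
tunable = [k for k in hps if k not in hp_invalid_list]
print(fixed)    # ['dropouts', 'penalty_type', 'nb_epoch']
print(tunable)  # ['layer_sizes', 'weight_init_stddevs', 'bias_init_consts',
                #  'penalty', 'batch_size', 'learning_rate']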
Here's the funky part: if I pass the metric as a bare object instead of as a list containing the object:
regression_metric = dc.metrics.Metric(dc.metrics.r2_score)
I don't get the KeyError, but it crashes when the function checks that the metric is a list of length 1:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-87-be984f768f5b> in <module>()
5 valid,
6 transformers,
----> 7 metric = regression_metric
8 )
9 # metric = regression_metric
/usr/local/lib/python3.7/site-packages/deepchem/hyper/gaussian_process.py in hyperparam_search(self, params_dict, train_dataset, valid_dataset, output_transformers, metric, direction, n_features, n_tasks, max_iter, search_range, hp_invalid_list, log_file)
89 """
90
---> 91 assert len(metric) == 1, 'Only use one metric'
92 hyper_parameters = params_dict
93 hp_list = list(hyper_parameters.keys())
TypeError: object of type 'Metric' has no len()
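In other words, the bare Metric object has no __len__, so it trips the assert, while the one-element list from before satisfies it:

m = dc.metrics.Metric(dc.metrics.r2_score)
len([m])   # 1, so `assert len(metric) == 1` passes for the list form
len(m)     # raises TypeError: object of type 'Metric' has no len()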
Any DeepChem users out there: if you have gotten this function to work, please give me a hint!
We did some back-and-forth debugging in the Gitter channel, and it turns out there are some underlying bugs causing this. Here are the issues documenting the fixes:
Thanks for helping to debug these! The next stable release of DeepChem should have these issues fixed.