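The script I'm running is meant to work with TF 2.0 and generates predictions for an NLP task from a pretrained BERT-Base model. I'm running it in a Google Colab notebook on a hosted Cloud TPU instance, with Python 3.7 and TF 2.1. The script runs without errors and produces predictions on a cloud GPU, but when I run it on the TPU (after enabling the TPU and pointing the script at the TPU's IP address), I get the following error output: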
2020-02-09 01:17:36.155906: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-02-09 01:17:36.156040: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-02-09 01:17:36.156061: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From tf_kaggle_test.py:188: The name tf.estimator.tpu.InputPipelineConfig is deprecated. Please use tf.compat.v1.estimator.tpu.InputPipelineConfig instead.
WARNING:tensorflow:From tf_kaggle_test.py:189: The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.
WARNING:tensorflow:From tf_kaggle_test.py:194: The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.
WARNING:tensorflow:From tf_kaggle_test.py:212: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7f55bb7600d0>) includes params argument, but params are not passed to Estimator.
FLAGS.predict_file data/simplified-nq-test.jsonl
***** Running predictions *****
Num orig examples = 346
Num split examples = 9409
Batch size = 8
Num split into 3 = 8
.
.
Num split into 187 = 1
output/eval.tf_record
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-02-09 01:18:52.589210: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:373] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:From /content/gdrive/My Drive/Capstone - Google QA/natural-questions-nlp-drive/tf2_0_baseline_w_bert.py:1112: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
2020-02-09 01:18:53.890592: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/tracking/util.py:1262: NameBasedSaverStatus.__init__ (from tensorflow.python.training.tracking.util) is deprecated and will be removed in a future version.
Instructions for updating:
Restoring a name-based tf.train.Saver checkpoint using the object-based restore API. This mode uses global names to match variables, and so is somewhat fragile. It also adds new restore ops to the graph each time it is called when graph building. Prefer re-encoding training checkpoints in the object-based format: run save() on the object-based saver (the same one this message is coming from) and use that checkpoint in the future.
WARNING:tensorflow:From /content/gdrive/My Drive/Capstone - Google QA/natural-questions-nlp-drive/tf2_0_baseline_w_bert.py:1057: The name tf.estimator.tpu.TPUEstimatorSpec is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimatorSpec instead.
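The warnings above are harmless and the script keeps running. Many of them are deprecation notices, because the original script was written for TF 1.0 and later converted to work with TF 2.0. The problem appears to be in the output below, where the tpu_estimator and error_handling scripts fail somewhere in the exception-handling process. I'm not sure what is meant when it points to AttributeError: 'NameError' object has no attribute 'op' and to name 'assignment_map' is not defined.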
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3075, in predict
rendezvous.record_error('prediction_loop', sys.exc_info())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 81, in record_error
if value and value.op and value.op.type == _CHECK_NUMERIC_OP_NAME:
AttributeError: 'NameError' object has no attribute 'op'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tf_kaggle_test.py", line 267, in <module>
predict_input_fn, yield_single_examples=True):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3078, in predict
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 143, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
yield_single_examples=yield_single_examples):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 626, in predict
features, None, ModeKeys.PREDICT, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3394, in _model_fn
scaffold = _get_scaffold(scaffold_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3749, in _get_scaffold
scaffold = scaffold_fn()
File "/content/gdrive/My Drive/Capstone - Google QA/natural-questions-nlp-drive/tf2_0_baseline_w_bert.py", line 994, in tpu_scaffold
tf.compat.v1.train.init_from_checkpoint(init_checkpoint, assignment_map)
NameError: name 'assignment_map' is not defined
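The first traceback (the AttributeError) looks like a side effect rather than the root cause. Going by the frames above, TPUEstimator.predict catches the failure, passes sys.exc_info() to rendezvous.record_error, and line 81 of error_handling.py then reads value.op. TensorFlow op errors (tf.errors.OpError) carry an op attribute, but a plain Python NameError does not. The later raise_errors call (tpu_estimator.py line 3078) then re-raises the stored original error, which would explain the "Reraising captured error" warning followed by the NameError. The sketch below is a simplified, hypothetical stand-in, not the real tensorflow_estimator code: the Rendezvous class, the predict function, the CHECK_NUMERIC_OP_NAME value, and the choice of a finally block are all assumptions. It reproduces the same exception chaining with plain Python.

```python
import sys

# Hypothetical constant; the real error_handling module defines its own value.
CHECK_NUMERIC_OP_NAME = "CheckNumerics"


class Rendezvous:
    """Simplified, hypothetical stand-in for the error rendezvous in
    tensorflow_estimator/python/estimator/tpu/error_handling.py."""

    def __init__(self):
        self._errors = {}

    def record_error(self, source, exc_info):
        _, value, _ = exc_info
        # Assumption: the original error is stored before the check below.
        self._errors[source] = exc_info
        # This assumes every error is a TF op error with an `.op` attribute.
        # A plain NameError has none, so this line raises AttributeError.
        if value and value.op and value.op.type == CHECK_NUMERIC_OP_NAME:
            return

    def raise_errors(self):
        # Re-raise the first stored error with its original traceback.
        for _, value, tb in self._errors.values():
            raise value.with_traceback(tb)


def predict(rendezvous):
    try:
        # Stand-in for the model_fn call that fails inside tpu_scaffold().
        raise NameError("name 'assignment_map' is not defined")
    except Exception:
        rendezvous.record_error("prediction_loop", sys.exc_info())
    finally:
        # Runs while the AttributeError propagates, so the re-raised
        # NameError gets the AttributeError as its __context__.
        rendezvous.raise_errors()


if __name__ == "__main__":
    try:
        predict(Rendezvous())
    except NameError as err:
        print("final error:", repr(err))
        print("context:", repr(err.__context__))
```

Read this way, the AttributeError is printed first only because it happened while the real error was being handled; the line to fix is the NameError at the bottom of the traceback.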
The notebook whose script I'm using (it works perfectly on GPU/CPU) is here: https://www.kaggle.com/abhinand05/bert-for-humans-tutorial-baseline/data#Code-Implementation-in-Tensorflow-2.0
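Is this related to something I need to change because I'm using Google Colab, or to some other change required to run on a TPU?
Could you please explain in more detail, ideally with a code snippet?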
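The last frame shows the NameError raised inside tpu_scaffold (line 994 of tf2_0_baseline_w_bert.py), which TPUEstimator calls through _get_scaffold. One plausible explanation, assuming the file follows the usual BERT pattern of defining tpu_scaffold as a nested function, is that assignment_map is never assigned in the enclosing function (for example, the line computing it was lost during the TF 1 to TF 2 conversion). Python looks up such a free name only when the nested function actually runs, so the model function can be built without error and fail only once the TPU code path calls scaffold_fn. The sketch below shows that behaviour; make_tpu_scaffold_broken, make_tpu_scaffold_fixed, fake_init_from_checkpoint, and the checkpoint path are hypothetical names used for illustration only. In the google-research/bert reference code the map is typically obtained from modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint) before tpu_scaffold is defined, but check how your copy of the file is supposed to produce it.

```python
def fake_init_from_checkpoint(ckpt, assignment_map):
    """Hypothetical stand-in for tf.compat.v1.train.init_from_checkpoint."""
    return ckpt, assignment_map


def make_tpu_scaffold_broken(init_checkpoint):
    # `assignment_map` is never assigned here, so inside the closure Python
    # treats it as a global name and looks it up only when tpu_scaffold() runs.
    def tpu_scaffold():
        return fake_init_from_checkpoint(init_checkpoint, assignment_map)  # noqa: F821

    return tpu_scaffold


def make_tpu_scaffold_fixed(init_checkpoint, assignment_map):
    # The map is bound before the closure is created, so a later call works.
    def tpu_scaffold():
        return fake_init_from_checkpoint(init_checkpoint, assignment_map)

    return tpu_scaffold


if __name__ == "__main__":
    scaffold_fn = make_tpu_scaffold_broken("bert_model.ckpt")  # building succeeds
    try:
        scaffold_fn()  # only the call fails, as when TPUEstimator invokes it
    except NameError as err:
        print("NameError:", err)

    fixed_fn = make_tpu_scaffold_fixed("bert_model.ckpt", {"bert/": "bert/"})
    print(fixed_fn())
```

Under this assumption the fix is not TPU- or Colab-specific: it is making sure assignment_map is defined in model_fn before tpu_scaffold is created. The error shows up only on TPU because that is the path where TPUEstimator calls scaffold_fn.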