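I'm a Google Colab Pro+ user. I'm trying to train a deep learning (DL) model, but during training I get the error "Transport endpoint is not connected". I was on Colab Pro before and hit the same error, so I upgraded to Pro+, but that didn't help either. The code runs fine for about 3 hours, and every time, after roughly 3-4 hours, I get this error. I tried using JavaScript to keep the runtime alive, and that part works fine. The full error message is: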
18% 54466/300000 [3:57:19<17:49:51, 3.83it/s]

Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 244, in run
    self._run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 275, in _run
    self._record_writer.write(data)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/record_writer.py", line 40, in write
    self._writer.write(header + header_crc + data + footer_crc)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 101, in write
    self._writable_file.append(
tensorflow.python.framework.errors_impl.FailedPreconditionError: exp/Cactus/paper_config/logs/events.out.tfevents.1691166619.f28574666dfe.12707.0; Transport endpoint is not connected

Traceback (most recent call last):
  File "/content/drive/MyDrive/Ub4D/exp_runner.py", line 998, in <module>
  File "/content/drive/MyDrive/Ub4D/exp_runner.py", line 257, in train
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 391, in add_scalar
    self._get_file_writer().add_summary(summary, global_step, walltime)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 113, in add_summary
    self.add_event(event, global_step, walltime)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 98, in add_event
    self.event_writer.add_event(event)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 117, in add_event
    self._async_writer.write(event.SerializeToString())
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 171, in write
    self._check_worker_status()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 212, in _check_worker_status
    raise exception
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 244, in run
    self._run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 275, in _run
    self._record_writer.write(data)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/record_writer.py", line 40, in write
    self._writer.write(header + header_crc + data + footer_crc)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 101, in write
    self._writable_file.append(
tensorflow.python.framework.errors_impl.FailedPreconditionError: exp/Cactus/paper_config/logs/events.out.tfevents.1691166619.f28574666dfe.12707.0; Transport endpoint is not connected
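Reading the tracebacks, the error is raised inside TensorBoard's background writer thread while it appends to the events file under exp/Cactus/paper_config/logs, which, judging by the exp_runner.py paths, sits on the mounted Google Drive. "Transport endpoint is not connected" is the Linux error text for ENOTCONN, which a FUSE mount such as Colab's Drive mount typically returns once the connection behind it has dropped. The following is a minimal, untested workaround sketch, not a confirmed fix: keep the live TensorBoard logs on the VM's local disk and copy them to Drive periodically, treating ENOTCONN as "skip and retry later" instead of a fatal error. The helper names mount_is_alive and sync_logs are hypothetical.

```python
import errno
import os
import shutil


def mount_is_alive(path: str) -> bool:
    """Return True if `path` can be listed, False if listing it fails with
    ENOTCONN ("Transport endpoint is not connected").

    Any other OSError (e.g. the path does not exist) is re-raised.
    """
    try:
        os.listdir(path)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise


def sync_logs(local_dir: str, drive_dir: str) -> bool:
    """Copy every file under `local_dir` into `drive_dir`, keeping the
    directory layout and overwriting files that already exist there.

    Returns True on success, or False if a file operation fails with
    ENOTCONN, so the caller (e.g. a training loop) can simply retry later.
    Other OSErrors are re-raised.
    """
    try:
        for root, _dirs, files in os.walk(local_dir):
            rel = os.path.relpath(root, local_dir)
            target = os.path.normpath(os.path.join(drive_dir, rel))
            os.makedirs(target, exist_ok=True)
            for name in files:
                # copy2 copies file contents and also preserves timestamps.
                shutil.copy2(os.path.join(root, name), os.path.join(target, name))
        return True
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise
```

In the training script, one might create the SummaryWriter with log_dir set to a local directory such as /content/tb_logs (a hypothetical path) and, every few thousand steps, call writer.flush() followed by sync_logs("/content/tb_logs", drive_log_dir), where drive_log_dir is the Drive folder the logs should end up in. A dropped Drive mount would then cost one skipped copy rather than the whole run; this does not explain why the mount drops after 3-4 hours.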