"Transport endpoint is not connected" in Colab Pro+


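I am a Google Colab Pro+ user. I am trying to train a DL model, but during training I get the error "Transport endpoint is not connected". I previously used the Pro tier and hit the same error, so I upgraded, but that did not help either. The code runs fine for about 3 hours, and every time, at around the 3-4 hour mark, the error appears. I also tried using JavaScript to keep the runtime alive, and that part works fine. The full error message is: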

Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
18% 54466/300000 [3:57:19<17:49:51, 3.83it/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/Ub4D/exp_runner.py", line 998, in <module>
  File "/content/drive/MyDrive/Ub4D/exp_runner.py", line 257, in train
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 391, in add_scalar
    self._get_file_writer().add_summary(summary, global_step, walltime)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 113, in add_summary
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 244, in run
    self.add_event(event, global_step, walltime)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 98, in add_event
    self._run()
    self.event_writer.add_event(event)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 117, in add_event
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 275, in _run
    self._record_writer.write(data)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/record_writer.py", line 40, in write
    self._async_writer.write(event.SerializeToString())
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 171, in write
    self._writer.write(header + header_crc + data + footer_crc)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 101, in write
    self._check_worker_status()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 212, in _check_worker_status
    self._writable_file.append(
    raise exception
tensorflow.python.framework.errors_impl.FailedPreconditionError: exp/Cactus/paper_config/logs/events.out.tfevents.1691166619.f28574666dfe.12707.0; Transport endpoint is not connected
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 244, in run
    self._run()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 275, in _run
    self._record_writer.write(data)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/summary/writer/record_writer.py", line 40, in write
    self._writer.write(header + header_crc + data + footer_crc)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 101, in write
    self._writable_file.append(
tensorflow.python.framework.errors_impl.FailedPreconditionError: exp/Cactus/paper_config/logs/events.out.tfevents.1691166619.f28574666dfe.12707.0; Transport endpoint is not connected
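The traceback fails inside the TensorBoard event writer while appending to a log file, and the script itself lives under /content/drive/MyDrive. "Transport endpoint is not connected" is the message that usually appears when a mounted filesystem (here, presumably the Google Drive mount) has dropped. One possible workaround, offered as a sketch and not a confirmed fix, is to write the event files to the VM's local disk and copy them to Drive periodically, so that a dropped mount costs one backup instead of the whole run. The helper name `sync_logs` and both directory arguments are hypothetical and do not come from the question.

```python
import os
import shutil


def sync_logs(local_dir, backup_dir):
    """Copy everything under local_dir into backup_dir.

    Returns True on success. Returns False when the copy raises OSError,
    which is what a dropped mount would be expected to produce, so that
    the caller can keep training instead of crashing.
    """
    try:
        # dirs_exist_ok=True (Python 3.8+) lets repeated syncs overwrite
        # the previous backup in place.
        shutil.copytree(local_dir, backup_dir, dirs_exist_ok=True)
        return True
    except OSError as err:  # shutil.Error is a subclass of OSError
        print(f"log sync skipped: {err}")
        return False
```

In a training loop, this would mean pointing the TensorBoard writer at a local directory (for example a folder under /content, which is an assumed path) and calling `sync_logs(local_dir, drive_dir)` every few thousand iterations. Whether this avoids the disconnect itself is not established by the question; it only keeps the logging thread off the Drive mount.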


tensorflow deep-learning google-colaboratory
