Nothing shows up in TensorBoard


I just finished training a model with PyTorch Lightning (2000 epochs). I assumed PL logs to TensorBoard automatically, but I'm not sure. This is what my training step returns:

log = {
    "total_reward": torch.tensor(self.total_reward).to(device),
    "reward": torch.tensor(reward).to(device),
    "train_loss": loss,
}
status = {
    "steps": torch.tensor(self.global_step).to(device),
    "total_reward": torch.tensor(self.total_reward).to(device),
}

return OrderedDict({"loss": loss, "log": log, "progress_bar": status})

This is the structure of my lightning_logs folder:

.
├── version_0
│   ├── checkpoints
│   │   └── epoch=2-step=191.ckpt
│   └── hparams.yaml
├── version_1
│   ├── checkpoints
│   │   └── epoch=2-step=191.ckpt
│   └── hparams.yaml
└── version_2
    ├── checkpoints
    │   └── epoch=2-step=191.ckpt
    └── hparams.yaml

6 directories, 6 files

Running TensorBoard:

tensorboard --logdir=lightning_logs
2022-02-21 19:41:13.915945: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-21 19:41:13.915968: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-21 19:41:15.602607: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-02-21 19:41:15.602639: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-21 19:41:15.602653: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (scrungus-pc): /proc/driver/nvidia/version does not exist

But when I open TensorBoard, I get:

No dashboards are active for the current data set. 

What am I doing wrong?

python tensorflow pytorch tensorboard pytorch-lightning
1 Answer

In PyTorch Lightning, you can use the self.log method to log metrics such as loss to TensorBoard (or any other logger). For example:

def training_step(self, batch, batch_idx):
    # Your training logic here
    loss = ...
    self.log('loss', loss)  # Logs the loss to TensorBoard
    return loss

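Each value you log with self.log automatically gets its own plot in the TensorBoard interface. By default, PyTorch Lightning uses TensorBoard as its logger, but you can change or customize this by passing the logger argument to the Trainer. For example: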

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# Example of using WandbLogger instead of TensorBoard
wandb_logger = WandbLogger(project="my-project")
trainer = Trainer(logger=wandb_logger)

With the default TensorBoard logger, you don't need any extra setup. The logged values (loss, accuracy, etc.) appear as separate charts in the TensorBoard interface.
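One way to confirm whether anything was logged at all: TensorBoard reads event files whose names contain "tfevents" (for example events.out.tfevents.*), and the folder listing in the question shows only checkpoints and hparams.yaml. That is consistent with the "No dashboards are active" message. A minimal sketch for checking a log directory, using only the standard library; the function names are hypothetical:

```python
from pathlib import Path


def find_event_files(logdir):
    """Return every TensorBoard event file found anywhere under `logdir`."""
    return sorted(Path(logdir).rglob("*tfevents*"))


def runs_without_events(logdir):
    """Return the names of immediate subdirectories (e.g. version_0) that
    contain no event file at any depth."""
    root = Path(logdir)
    return sorted(
        run.name
        for run in root.iterdir()
        if run.is_dir() and not any(run.rglob("*tfevents*"))
    )
```

Calling runs_without_events("lightning_logs") on the folder shown in the question would be expected to list all three version_* directories, since none of them contains an event file.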
