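Questions tagged pytorch

PyTorch is a deep learning framework that implements a dynamic computational graph, which lets you change how a neural network behaves at run time, and that performs reverse-mode automatic differentiation.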
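As a minimal sketch of what "dynamic graph" means in practice (the function `f` and the input values are made up for illustration), ordinary Python control flow decides which operations get recorded, and `backward()` differentiates whichever branch actually ran:

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # Plain Python branching: the graph is built while this code executes.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

# Positive input takes the x**2 branch, so the gradient is 2x.
x_pos = torch.tensor([1.0, 2.0], requires_grad=True)
f(x_pos).backward()
grad_pos = x_pos.grad.clone()

# Negative input takes the x**3 branch, so the gradient is 3x^2.
x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)
f(x_neg).backward()
grad_neg = x_neg.grad.clone()

print(grad_pos, grad_neg)
```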

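Unable to update a latent vector using a custom loss function in PyTorch

I am trying to implement this, without success. I am using a VAE model with both an encoder and a decoder. I am freezing the VAE decoder's weights, and ...

Answers 1 Votes 0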
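For the latent-vector question above, one possible setup can be sketched as follows. The excerpt is truncated, so everything here is an assumption: the small `decoder` stands in for the asker's trained VAE decoder, and `custom_loss` is a placeholder mean-squared error. The idea is that the latent vector is the only leaf tensor with `requires_grad=True` and the only thing handed to the optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a trained VAE decoder.
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
for p in decoder.parameters():
    p.requires_grad_(False)  # freeze the decoder weights

target = torch.randn(1, 16)

# The latent vector is a leaf tensor; it is the only thing being optimized.
z = torch.randn(1, 8, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)

def custom_loss(output, target):
    # Placeholder loss; any differentiable function of the output works.
    return ((output - target) ** 2).mean()

z_start = z.detach().clone()
initial_loss = custom_loss(decoder(z), target).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = custom_loss(decoder(z), target)
    loss.backward()   # gradients flow through the frozen decoder into z
    optimizer.step()
final_loss = custom_loss(decoder(z), target).item()
```

If `z` is re-created inside the loop, wrapped in `torch.no_grad()`, or detached before the decoder call, it will not receive gradients; those are common reasons a latent vector fails to update.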

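Training different stages of a model with different losses

I am trying to train a two-stage model end to end. However, I want to update different stages of the model with different losses. For example, suppose the end-to-end model consists of ...

Answers 1 Votes 0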
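One way the two-stage setup above might be arranged is sketched below. The excerpt does not show the actual model, so `stage1`, `stage2`, the targets, and the choice of MSE losses are all hypothetical. Each stage gets its own optimizer, and `detach()` keeps the second loss from reaching the first stage.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

stage1 = nn.Linear(4, 3)  # hypothetical first stage
stage2 = nn.Linear(3, 2)  # hypothetical second stage
opt1 = torch.optim.SGD(stage1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(stage2.parameters(), lr=0.1)

x = torch.randn(8, 4)
target1 = torch.randn(8, 3)
target2 = torch.randn(8, 2)

hidden = stage1(x)
loss1 = nn.functional.mse_loss(hidden, target1)

# detach() cuts the graph, so loss2 only produces gradients for stage2.
hidden_detached = hidden.detach()
out = stage2(hidden_detached)
loss2 = nn.functional.mse_loss(out, target2)

opt1.zero_grad()
opt2.zero_grad()
loss1.backward()  # fills gradients of stage1 only
loss2.backward()  # fills gradients of stage2 only
opt1.step()
opt2.step()
```

If the second loss should also influence the first stage, drop the `detach()` and decide explicitly which optimizer steps on which loss.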

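How to set up TF and Torch in a virtual environment with the same CUDA

I want to set up TensorFlow and PyTorch in one virtual environment that shares the same CUDA. However, I cannot find a CUDA version that supports both TensorFlow and PyTorch: for TensorFlow 2.10, I chose ...

Answers 1 Votes 0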

PyTorch in Azure Synapse causes problems

I have a notebook in Azure Synapse that uses these libraries: import pandas as pd import numpy as np from sqlalchemy import create_engine, text import sqlalchemy as sa from azure.core.credentials ...

Answers 1 Votes 0

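Trouble deploying a Flask-based machine learning model to a Vertex AI endpoint using a custom container

I am deploying a Flask application that serves a machine learning model with PyTorch (packaged as a Docker container) to a Vertex AI endpoint for online prediction. Although the Flask ...

Answers 1 Votes 0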

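Hello World for jaxtyping?

I cannot find any instructions or tutorials on getting started with jaxtyping. I tried the simplest possible program, but it fails to parse. I am using Python 3.11. I did not see anything on the GitHub jaxtyp...

Answers 1 Votes 0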

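Installing PyTorch with cuDNN support using uv on Windows 10

I want to install PyTorch on Windows 10 with uv. First I tested it with pip and it worked. I used: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 Then, I ...

Answers 1 Votes 0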

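Dimensionality reduction for sentence transformers

I need to compute embeddings for a large number of sentences (say 10K) during preprocessing, and at run time I have to compute the embedding vector for one sentence at a time (the user query), and then ...

Answers 1 Votes 0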
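The excerpt above is cut off before it says which reduction the asker wants, so the following is only one plausible reading: fit PCA once on the precomputed corpus embeddings, then reuse the same projection for each query at run time. Random tensors stand in for real sentence embeddings, and the sizes (1000 x 64 reduced to 16) are made up to keep the sketch fast.

```python
import torch

torch.manual_seed(0)

corpus = torch.randn(1000, 64)  # stand-in for precomputed sentence embeddings
k = 16                          # hypothetical target dimension

# Fit PCA once, offline: centre the data and take the top-k right singular vectors.
mean = corpus.mean(dim=0, keepdim=True)
_, _, vh = torch.linalg.svd(corpus - mean, full_matrices=False)
components = vh[:k].T           # shape (64, k); columns are principal directions

corpus_reduced = (corpus - mean) @ components  # shape (1000, k)

def reduce_query(embedding: torch.Tensor) -> torch.Tensor:
    # Apply the same centring and projection to a single query embedding.
    return (embedding - mean) @ components

query = torch.randn(64)               # stand-in for one query embedding
query_reduced = reduce_query(query)   # shape (1, k)
```

Only `mean` and `components` need to be stored for run time, so the per-query cost is one subtraction and one small matrix product.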

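PyTorch LSTM vs LSTMCell

What is the difference between LSTM and LSTMCell in PyTorch (currently version 1.1)? It seems that LSTMCell is a special case of LSTM (i.e. only one layer, unidirectional, no dropout). Then, ...

Answers 2 Votes 0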
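The relationship described in the question above can be checked numerically. This sketch (sizes chosen arbitrarily) copies the weights of a single-layer, unidirectional `nn.LSTM` into an `nn.LSTMCell` and steps the cell manually over the sequence; the outputs should agree up to floating-point tolerance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 5, 7, 4, 3

lstm = nn.LSTM(input_size, hidden_size)       # one layer, unidirectional, no dropout
cell = nn.LSTMCell(input_size, hidden_size)

# Copy the layer-0 parameters so both modules compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)   # (seq, batch, feature) layout
out_lstm, (h_n, c_n) = lstm(x)                # processes the whole sequence at once

# LSTMCell handles a single time step, so the loop over time is explicit.
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
steps = []
for t in range(seq_len):
    h, c = cell(x[t], (h, c))
    steps.append(h)
out_cell = torch.stack(steps)

print(torch.allclose(out_lstm, out_cell, atol=1e-5))
```

The explicit loop is what makes `LSTMCell` useful when each step needs custom logic, for example feeding the previous output back in as the next input.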

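Unable to train my UNET multi-class segmentation model [duplicate]

I tried to build my UNET from scratch in PyTorch. My model outputs nothing but black masks. I need to segment damage on cars, so I implemented a colormap. I am sure that 70% of things are ...

Answers 2 Votes 0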

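PyTorch complex matrix-vector multiplication is slow on CPU

I found that PyTorch is much slower than NumPy when doing complex-valued matrix-vector multiplication on CPU: Some notes: This holds for me across multiple systems. Memory is not an issue. Complex ...

Answers 1 Votes 0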

pytorch s3fd pga_attack, problem getting grad.data after loss.backward()

from detection.sfd import sfd_Detector def pgd_attack(model, input_data, eps=0.03, alpha=0.01, attack_steps=1, device='cpu'): ta = input_data.requires_grad_(True).to(device) perturbation = ...

Answers 1 Votes 0
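The question's code is truncated, so the following is a generic PGD sketch, not the asker's function: the toy linear `model`, the labels `y`, and the cross-entropy loss are assumptions. It requests the gradient on a leaf tensor at every step. One common cause of a missing `.grad` is calling `.to(device)` after `requires_grad_(True)`: when that call copies the data, the result is a non-leaf tensor and its `.grad` stays `None`.

```python
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, attack_steps=10):
    """Generic L-infinity PGD; x is assumed to lie in [0, 1]."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(attack_steps):
        x_adv.requires_grad_(True)  # x_adv is a leaf tensor at this point
        loss = nn.functional.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep a valid input range
    return x_adv.detach()

torch.manual_seed(0)
model = nn.Linear(6, 3)        # hypothetical toy classifier
x = torch.rand(4, 6)
y = torch.tensor([0, 1, 2, 0])
x_adv = pgd_attack(model, x, y)
```

Moving the input to the target device first and only then enabling gradients keeps the tensor a leaf, which is the ordering this sketch relies on.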

(Hyperparameter) tuning with Ray raises a pickling error

I am trying to tune my PyTorch model with Ray. When I run the tune.run function, it throws the error PicklingError: Can't pickle <cyfunction LocalFileSystem._reconstruct at 0x7f7d0c8ce4d0>: it's not the same object as pyarrow._fs.LocalFileSystem._reconstruct. The tuning code is mostly based on the PyTorch tutorial here. I had a problem importing from ray.air import Checkpoint as in the tutorial, so I switched it to from ray.train import Checkpoint.

This is the full error message:

    2023-12-30 20:12:57,961 WARNING tune_controller.py:743 -- Trial controller checkpointing failed: Can't pickle <cyfunction LocalFileSystem._reconstruct at 0x7f7d0c8ce4d0>: it's not the same object as pyarrow._fs.LocalFileSystem._reconstruct
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    File /usr/local/lib/python3.10/dist-packages/ray/tune/utils/serialization.py:19, in TuneFunctionEncoder.default(self, obj)
         18 try:
    ---> 19     return super(TuneFunctionEncoder, self).default(obj)
         20 except Exception:

    File /usr/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
        161 """Implement this method in a subclass such that it returns
        162 a serializable object for ``o``, or calls the base implementation
        163 (to raise a ``TypeError``).
       (...)
        177
        178 """
    --> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
        180                 f'is not JSON serializable')

    TypeError: Object of type StorageContext is not JSON serializable

    During handling of the above exception, another exception occurred:

    PicklingError                             Traceback (most recent call last)
    Cell In[97], line 16
         14 # ...
         15 data_dir = os.path.abspath(os.getcwd())
    ---> 16 result = tune.run(
         17     train_ray,
         18     resources_per_trial={"gpu": gpus_per_trial},
         19     num_samples=10,
         20     scheduler=scheduler,
         21     keep_checkpoints_num=10
         22 )
         23 best_trial = result.get_best_trial("loss", "min", "last")
         24 print(f"Best trial config: {best_trial.config}")

    File /usr/local/lib/python3.10/dist-packages/ray/tune/tune.py:1002, in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, storage_path, storage_filesystem, search_alg, scheduler, checkpoint_config, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, resume, reuse_actors, raise_on_failed_trial, callbacks, max_concurrent_trials, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, chdir_to_trial_dir, local_dir, _remote, _remote_string_queue, _entrypoint)
        991     pass
        992 else:
        993     logger.warning(
        994         "Tune detects GPUs, but no trials are using GPUs. "
        995         "To enable trials to use GPUs, wrap `train_func` with "
        996         "`tune.with_resources(train_func, resources_per_trial={'gpu': 1})` "
        997         "which allows Tune to expose 1 GPU to each trial. "
        998         "For Ray AIR Trainers, you can specify GPU resources "
        999         "through `ScalingConfig(use_gpu=True)`. "
       1000         "You can also override "
       1001         "`Trainable.default_resource_request` if using the "
    -> 1002         "Trainable API."
       1003     )
       1005 experiment_interrupted_event = _setup_signal_catching()
       1007 if progress_reporter and air_verbosity is not None:

    File /usr/local/lib/python3.10/dist-packages/ray/tune/execution/tune_controller.py:744, in TuneController.step(self)
        742 except Exception as e:
        743     logger.warning(f"Trial controller checkpointing failed: {str(e)}")
    --> 744     raise e
        746 self._iteration += 1
        748 with warn_if_slow("on_step_end"):

    File /usr/local/lib/python3.10/dist-packages/ray/tune/execution/tune_controller.py:741, in TuneController.step(self)
        739 # Maybe save experiment state
        740 try:
    --> 741     self.checkpoint()
        742 except Exception as e:
        743     logger.warning(f"Trial controller checkpointing failed: {str(e)}")

    File /usr/local/lib/python3.10/dist-packages/ray/tune/execution/tune_controller.py:478, in TuneController.checkpoint(self, force, wait)
        452 """Saves execution state to the local experiment path.
        453
        454 Overwrites the current session checkpoint, which starts when self
       (...)
        463
        464 """
        465 with warn_if_slow(
        466     "experiment_checkpoint",
        467     message="Checkpointing the experiment state took "
       (...)
        476     disable=self._checkpoint_manager.auto_checkpoint_enabled or force or wait,
        477 ):
    --> 478     self._checkpoint_manager.checkpoint(
        479         save_fn=self.save_to_dir, force=force, wait=wait
        480     )

    File /usr/local/lib/python3.10/dist-packages/ray/tune/execution/experiment_state.py:224, in _ExperimentCheckpointManager.checkpoint(self, save_fn, force, wait)
        218 # NOTE: This context manager is for Datasets captured in a trial config.
        219 # This is the case when *tuning over datasets*.
        220 # If the datasets have already been full executed, then serializing
        221 # block refs means that this checkpoint is not usable in a new Ray cluster.
        222 # This context will serialize the dataset execution plan instead, if available.
        223 with out_of_band_serialize_dataset():
    --> 224     save_fn()
        226 # Sync to cloud
        227 self.sync_up(force=force, wait=wait)

    File /usr/local/lib/python3.10/dist-packages/ray/tune/execution/tune_controller.py:355, in TuneController.save_to_dir(self)
        350 experiment_dir = self._storage.experiment_local_path
        352 # Get state from trial executor and runner
        353 runner_state = {
        354     # Trials
    --> 355     "trial_data": list(self._get_trial_checkpoints().values()),
        356     # Experiment data
        357     "runner_data": self.__getstate__(),
        358     # Metadata
        359     "stats": {
        360         "start_time": self._start_time,
        361         "timestamp": self._last_checkpoint_time,
        362     },
        363 }
        365 tmp_file_name = os.path.join(
        366     experiment_dir, f".tmp_experiment_state_{uuid.uuid4()}"
        367 )
        369 with open(tmp_file_name, "w") as f:

    File /usr/local/lib/python3.10/dist-packages/ray/tune/execution/tune_controller.py:803, in TuneController._get_trial_checkpoints(self)
        801 def _get_trial_checkpoints(self) -> Dict[str, str]:
        802     for trial in self._trials_to_cache:
    --> 803         self._trial_metadata[trial.trial_id] = trial.get_json_state()
        804     self._trials_to_cache.clear()
        805     return self._trial_metadata

    File /usr/local/lib/python3.10/dist-packages/ray/tune/experiment/trial.py:964, in Trial.get_json_state(self)
        962 state = self.__getstate__()
        963 state.pop("run_metadata", None)
    --> 964 self._state_json = json.dumps(state, indent=2, cls=TuneFunctionEncoder)
        966 runtime_metadata_json = self.run_metadata.get_json_state()
        968 return self._state_json, runtime_metadata_json

    File /usr/lib/python3.10/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
        232 if cls is None:
        233     cls = JSONEncoder
        234 return cls(
        235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        237     separators=separators, default=default, sort_keys=sort_keys,
    --> 238     **kw).encode(obj)

    File /usr/lib/python3.10/json/encoder.py:201, in JSONEncoder.encode(self, o)
        199     chunks = self.iterencode(o, _one_shot=True)
        200 if not isinstance(chunks, (list, tuple)):
    --> 201     chunks = list(chunks)
        202 return ''.join(chunks)

    File /usr/lib/python3.10/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
        429     yield from _iterencode_list(o, _current_indent_level)
        430 elif isinstance(o, dict):
    --> 431     yield from _iterencode_dict(o, _current_indent_level)
        432 else:
        433     if markers is not None:

    File /usr/lib/python3.10/json/encoder.py:405, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
        403     else:
        404         chunks = _iterencode(value, _current_indent_level)
    --> 405     yield from chunks
        406 if newline_indent is not None:
        407     _current_indent_level -= 1

    File /usr/lib/python3.10/json/encoder.py:438, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
        436         raise ValueError("Circular reference detected")
        437     markers[markerid] = o
    --> 438 o = _default(o)
        439 yield from _iterencode(o, _current_indent_level)
        440 if markers is not None:

    File /usr/local/lib/python3.10/dist-packages/ray/tune/utils/serialization.py:23, in TuneFunctionEncoder.default(self, obj)
         21 if log_once(f"tune_func_encode:{str(obj)}"):
         22     logger.debug("Unable to encode. Falling back to cloudpickle.")
    ---> 23 return self._to_cloudpickle(obj)

    File /usr/local/lib/python3.10/dist-packages/ray/tune/utils/serialization.py:28, in TuneFunctionEncoder._to_cloudpickle(self, obj)
         25 def _to_cloudpickle(self, obj):
         26     return {
         27         "_type": "CLOUDPICKLE_FALLBACK",
    ---> 28         "value": binary_to_hex(cloudpickle.dumps(obj)),
         29     }

    File /usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle_fast.py:88, in dumps(obj, protocol, buffer_callback)
         86 with io.BytesIO() as file:
         87     cp = CloudPickler(file, protocol=protocol, buffer_callback=buffer_callback)
    ---> 88     cp.dump(obj)
         89     return file.getvalue()

    File /usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle_fast.py:733, in CloudPickler.dump(self, obj)
        731 def dump(self, obj):
        732     try:
    --> 733         return Pickler.dump(self, obj)
        734     except RuntimeError as e:
        735         if "recursion" in e.args[0]:

    PicklingError: Can't pickle <cyfunction LocalFileSystem._reconstruct at 0x7f7d0c8ce4d0>: it's not the same object as pyarrow._fs.LocalFileSystem._reconstruct

This is my model:

    class TransformerClassifier(nn.Module):
        def __init__(self, num_features=24, num_classes=3, heads=8):
            super().__init__()
            self.conv_backbone = nn.Sequential(
                nn.Conv1d(num_features, 128, 10, stride=8, bias=False),
                nn.BatchNorm1d(128),
                nn.LeakyReLU(),
                nn.Conv1d(128, 256, 10, stride=8, bias=False),
                nn.BatchNorm1d(256),
                nn.LeakyReLU(),
            )
            self.transformer = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(256, nhead=heads, dim_feedforward=1024, batch_first=True),
                num_layers=6)
            self.classifier = nn.Linear(256, num_classes)

        def forward(self, x):
            x = x.float()
            # convolutional feature extractor
            x = self.conv_backbone(x)
            # transformer encoder
            x = torch.transpose(x, 1, 2)
            x = self.transformer(x)
            # linear classifier
            x = x.mean(1)
            x = self.classifier(x)
            return x

This is my code for tuning the model:

    from functools import partial
    import os
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.utils.data import random_split
    from ray import tune
    import ray.util
    import ray.air
    from ray.train import Checkpoint
    from ray.air import session
    from ray.tune.schedulers import ASHAScheduler

    def train_ray(config):
        use_weighting = True
        best_f1s = []
        class_f1s = []
        all_accs = []
        batch_size = int(config["batch_size"])
        tr_label = open('labelTrain.pickle', 'rb')
        train_label = pickle.load(tr_label)
        tr_pickle = open('train_d.pickle', "rb")
        train_d = pickle.load(tr_pickle)
        vl_label = open('labelVal.pickle', 'rb')
        val_label = pickle.load(vl_label)
        print(np.unique(val_label))
        vl_pickle = open('val_d.pickle', "rb")
        val_d = pickle.load(vl_pickle)
        t_label = open('labelTest.pickle', 'rb')
        test_label = pickle.load(t_label)
        t_pickle = open('test_d.pickle', "rb")
        test_d = pickle.load(t_pickle)
        dataset_train = torch.utils.data.TensorDataset(train_d, train_label)
        dataset_test = torch.utils.data.TensorDataset(test_d, test_label)
        dataset_val = torch.utils.data.TensorDataset(val_d, val_label)
        train_dataloader = torch.utils.data.DataLoader(dataset_train, batch_size)
        val_dataloader = torch.utils.data.DataLoader(dataset_val, batch_size)
        test_dataloader = torch.utils.data.DataLoader(dataset_test, batch_size)
        train_labels = train_label
        # Calculate Weights for Loss
        weights = None
        if use_weighting:
            x = torch.reciprocal(torch.bincount(torch.tensor(train_labels.long()))).float()**0.25
            x /= x.mean()
            weights = x
        # Create Model
        model = TransformerClassifier(num_features=24, num_classes=3, heads=config["heads"])
        checkpoint = session.get_checkpoint()
        if checkpoint:
            checkpoint_state = checkpoint.to_dict()
            start_epoch = checkpoint_state["epoch"]
            model.load_state_dict(checkpoint_state["model_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
        else:
            start_epoch = 0
        epoch = int(config["epoch"])
        lr = config["lr"]
        for epoch in range(start_epoch, 10):
            train_f1, train_acc, f1, loss, optimizer = train_optuna(model, epoch, train_dataloader, val_dataloader, test_dataloader, lr_decay=0.98, lr=lr, device="cuda", weights=weights)
            checkpoint_data = {
                "epoch": epoch,
                "net_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
            }
            checkpoint = Checkpoint.from_dict(checkpoint_data)
            session.report(
                {"loss": loss, "accuracy": train_acc, "f1_score": f1},
                checkpoint=checkpoint,
            )
        print("Finished Training")

    config = {
        "heads": tune.choice([8, 16]),
        "lr": tune.loguniform(1e-6, 1e-3),
        "batch_size": tune.choice([16, 32, 8])
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=20,
        grace_period=1,
        reduction_factor=2,
    )
    gpus_per_trial = 1
    data_dir = os.path.abspath(os.getcwd())
    result = tune.run(
        train_ray,
        resources_per_trial={"gpu": gpus_per_trial},
        num_samples=10,
        scheduler=scheduler,
        keep_checkpoints_num=8
    )

According to the PyTorch tutorial, I should use checkpoint_at_end=True. However, that also gives me an error. I am running the code in a Jupyter notebook inside a Docker container. The Docker container is attached to a volume.

Honestly, I do not understand the error message. I would appreciate any kind of help. I think the key problem is that the error message says StorageContext is not JSON serializable?

Answers 1 Votes 0

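CUDA, PyTorch, and YOLO integration problem

I installed Python 3.11 and then recompiled OpenCV with CUDA. I am also using CUDA for YOLO, but now when I run PyTorch with CUDA 12.4, I get a YOLO error: Unable to load system ...

Answers 0 Votes 0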

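Conditionally change pixel values in an RGB image, then drop the channels

I simply want to compare the pixels of an image: if a pixel is pink (R = 0.502, G = 0.0, B = 0.502), change it to black, otherwise change it to white. After that ...

Answers 1 Votes 0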
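The pixel test described above can be sketched with a broadcast comparison. The tiny 2 x 2 image, the channels-first (C, H, W) layout, and the tolerance of 1e-3 are assumptions made for illustration; the pink value comes from the question.

```python
import torch

# Pink as given in the question, shaped to broadcast over a (C, H, W) image.
pink = torch.tensor([0.502, 0.0, 0.502]).view(3, 1, 1)

# Hypothetical 2 x 2 white image with two pink pixels on the diagonal.
img = torch.ones(3, 2, 2)
img[:, 0, 0] = pink.view(3)
img[:, 1, 1] = pink.view(3)

# A pixel counts as pink only if all three channels match within the tolerance.
is_pink = ((img - pink).abs() < 1e-3).all(dim=0)  # shape (H, W), dtype bool

# Pink becomes black (0.0), everything else white (1.0); the channel axis is gone.
out = (~is_pink).float()
print(out)
```

Comparing with a tolerance instead of `==` avoids misses caused by rounding when the image was converted from 8-bit values (128 / 255 is about 0.502).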

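How to manually specify the checkpoint path in PyTorch Lightning

Currently I am using TensorBoardLogger for all my needs and it works perfectly, but I don't like the way it handles checkpoint naming. I would like to be able to specify the file name and folder ...

Answers 2 Votes 0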

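Mapping tensor elements through a dictionary in PyTorch

I have a tensor: t = torch.tensor([1, 2, 3]) and a mapping: mapping = {1: 0.2, 2: 1.2, 3: 3.0} I want to map the elements of t through the mapping, so the expected result would be torch.tensor([0.2, ...

Answers 1 Votes 0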
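Two ways the mapping above could be applied are sketched here, using the tensor and dictionary from the question. The lookup-table variant assumes the keys are small non-negative integers; the variable names are illustrative.

```python
import torch

t = torch.tensor([1, 2, 3])
mapping = {1: 0.2, 2: 1.2, 3: 3.0}

# Option 1: Python-level lookup. Simple, but it loops on the CPU.
mapped_loop = torch.tensor([mapping[k] for k in t.tolist()])

# Option 2: build a lookup table once, then index it with the tensor.
# Assumes non-negative integer keys; entries for unused keys stay 0.
keys = torch.tensor(list(mapping.keys()))
values = torch.tensor(list(mapping.values()))
lut = torch.zeros(int(keys.max()) + 1)
lut[keys] = values
mapped_lut = lut[t]

print(mapped_loop, mapped_lut)
```

The lookup table keeps the whole operation in tensor indexing, so it also works on GPU tensors and for large inputs without a Python loop.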

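In PyTorch, how do I parallelize (on the GPU) a set of boolean functions that are executed repeatedly?

I have a set of boolean functions that are independent and (presumably) can be executed in parallel. I want to call these same functions repeatedly. See the code below, where the output ...

Answers 1 Votes 0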

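Segmentation fault (core dumped) on RTX 3080

Hi, I just got an RTX 3080 and ran into a lot of version problems. One of them is a "Segmentation fault (core dumped)" message popping up, along with multiprocessing problems. Below is my device list: GPU: ...

Answers 1 Votes 0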

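Combining a boolean mask with slicing in a PyTorch tensor

I want to index a PyTorch tensor with both a boolean mask and ordinary indices. Something like this: i = 2 j = 0 mask = torch.randn(480, 360, 3) > 0 tensor = torch.zeros(480, 360, 4, 80) ten...

Answers 1 Votes 0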

© www.soinside.com 2019 - 2024. All rights reserved.