PyTorch + Ray Tune reporting ImplicitFunc is too large, no idea which reference is large


Similar to this question, Ray Tune is reporting:

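ValueError: The actor ImplicitFunc is too large (421 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.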

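I have no idea what is being captured in my scope. It reports this no matter what I change. I tried pulling a dozen different references out of the function and putting them into Ray's object store (ray.put() and ray.get()), but it made almost no difference. Even after pulling out the model definition, the train/test data, and the fold function, it is still 421 MiB. Which reference is >400 MiB?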

Model definition:

INPUT_DIM = tch_train.features.shape[1] - 1 #Removing an input feature because the sample weight is included with the input data
OUTPUT_DIM = tch_train.labels.shape[1]

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(INPUT_DIM, OUTPUT_DIM)

    def forward(self, input):
        
        output = F.softmax(F.relu(self.fc1(input)), dim=1)
        
        return output

Main function:

K_FOLDS = 5
loss_function = nn.CrossEntropyLoss(reduction='none')
kfold = KFold(n_splits=K_FOLDS, shuffle=True)

fold_indices = [(train_ids, test_ids) for train_ids, test_ids in kfold.split(tch_train)]
fold_indices_ref = ray.put(fold_indices)

tch_train_ref = ray.put(tch_train)

# This function is the "Main stuff" of the machine learning.
# This will be called by RayTune and will be expected to train a machine learning model and report the results.
def objective(config):
    optimizer = torch.optim.SGD(  # Tune the optimizer
        model.parameters(), lr=config["lr"], momentum=config["momentum"]
    )
    
    # Make a model for each fold.
    fold_models = []
    for fold in range(K_FOLDS):
        fold_models.append(Net().to("cuda"))

    # Epoch loop
    while True:
        fold_losses = []

        for fold in range(K_FOLDS):
            train_ids, test_ids = ray.get(fold_indices_ref)[fold]
            
            # Take Epoch sample from the 4/1 train/test fold chunks.
            train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids)
            test_subsampler = torch.utils.data.SubsetRandomSampler(test_ids)
            trainloader = torch.utils.data.DataLoader(ray.get(tch_train_ref), batch_size=config["batch_size"], sampler=train_subsampler)
            testloader = torch.utils.data.DataLoader(ray.get(tch_train_ref), batch_size=config["batch_size"], sampler=test_subsampler)
            
            # Iterate over the DataLoader for training data
            for i, data in enumerate(trainloader, 0):
                # Get inputs
                features, targets = data
                inputs = features[:,1:]
                sample_weights = features[:,0]
                
                # Zero the gradients
                optimizer.zero_grad()
                
                # Perform forward pass
                outputs = fold_models[fold](inputs)
                
                # Compute loss
                loss = loss_function(outputs, targets) * sample_weights
                
                # Perform backward pass
                loss.mean().backward()
                
                # Perform optimization
                optimizer.step()

            # Test on test fold
            fold_losses[fold] = 0.0
            with torch.no_grad():
                # Iterate over the test data and generate predictions
                for i, data in enumerate(testloader, 0):
                    # Get inputs
                    features, targets = data
                    inputs = features[:,1:]
                    sample_weights = features[:,0]
                    
                    # Generate outputs
                    outputs = net(inputs)

                    #Add test loss
                    fold_losses[fold] += (loss_function(outputs, targets) * sample_weights).sum() 
                
        # Report average fold losses
        train.report({"averaged_CEL": sum(fold_losses) / float(K_FOLDS)})  # Report to Tune

Tuner configuration:

search_space = {"lr": ray.tune.loguniform(1e-4, 1e-2), "momentum": ray.tune.uniform(0.1, 0.9)}
algo = OptunaSearch() 

tuner = ray.tune.Tuner(
    objective,
    tune_config=ray.tune.TuneConfig(
        metric="averaged_CEL",
        mode="min",
        search_alg=algo,
    ),
    run_config=ray.train.RunConfig(
        stop={"training_iteration": 5},
    ),
    param_space=search_space,
)
results = tuner.fit()
print("Best config is:", results.get_best_result().config)
python machine-learning pytorch cross-validation ray
1 Answer

I figured it out.

The culprit was this line:

optimizer = torch.optim.SGD(  # Tune the optimizer
    model.parameters(), lr=config["lr"], momentum=config["momentum"]
)

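This line never raised an error because model happened to be a different, valid variable defined further up in the file, and that variable also has a .parameters() function. It did not refer to the models I was training here (the Net instances in fold_models). Since objective referenced that unrelated global, it was presumably serialized together with the function, which would explain the 421 MiB.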
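For anyone hunting a similar accidental capture, here is a minimal sketch that uses only the standard library to list the globals a function refers to and roughly how large each one is when pickled. The helper name captured_global_sizes is hypothetical, and model, K_FOLDS and objective below are small stand-ins, not the code from the question. Ray serializes functions with cloudpickle, so the plain pickle sizes are only an approximation of what Ray would actually ship.

```python
import inspect
import pickle


def captured_global_sizes(func):
    """Return {name: pickled size in bytes} for the globals func refers to.

    Hypothetical helper: sizes come from plain pickle, so they only
    approximate what a cloudpickle-based serializer would produce.
    """
    closure_vars = inspect.getclosurevars(func)
    sizes = {}
    for name, value in closure_vars.globals.items():
        try:
            sizes[name] = len(pickle.dumps(value))
        except Exception:
            # Some objects (modules, locks, ...) cannot be pickled at all.
            sizes[name] = None
    return sizes


# Stand-ins for the situation in the question (assumed, not the real data).
model = bytearray(5_000_000)  # a large, unrelated global that is named "model"
K_FOLDS = 5


def objective(config):
    # Refers to the global "model" by accident, as in the question.
    return len(model) * config["lr"] * K_FOLDS


sizes = captured_global_sizes(objective)

# Print the referenced globals, largest first.
for name, size in sorted(sizes.items(), key=lambda kv: kv[1] or 0, reverse=True):
    print(name, size)
```

Running this on the real objective before handing it to the Tuner should put the oversized name at the top of the list. A name you did not expect to see there, like model in my case, is the one to look at.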
