Stochastic gradient descent for logistic regression always returns an Inf cost, and the weight vector never improves


I am trying to implement a logistic regression solver in MATLAB, finding the weights by stochastic gradient descent. I have run into a problem where my data seems to produce an infinite cost, and no matter what happens it never decreases...

Here is my gradient descent function:

function weightVector = logisticWeightsByGradientDescentStochastic(trueClass,features)
    %% This function attempts to converge on the best set of weights for an order-1 logistic regression
    %% Input:
    % trueClass - the training data's vector of true class values
    % features
    %% Output:
    % weightVector - vector of size n+1 (n is number of features)
    % corresponding to convergent weights
    
    %% Get Data Size
    dataSize = size(features);
    
    %% Initial pick for weightVector
    weightVector = zeros(dataSize(2)+1, 1) %create a zero vector equal to size of number of features plus 1
    
    %% Choose learning Rate
    learningRate = 0.0001;
    
    %% Initial Cost
    cost = logisticCost(weightVector, features, trueClass)
    
    
    %% Stochastic Gradient Descent
    costThresh = 0.05 %define cost threshold
    
    iterCount = 0;
    while(cost > costThresh)
        for m=1:dataSize(1) %for all samples
            
            %% test Statement
            curFeatures = transpose([1.0 features(m,:)])
            
            %% calculate Sigmoid predicted 
            predictedClass = evaluateSigmoid(weightVector , [1.0 features(m,:)] )

            %% test Statement
            truth = trueClass(m)
                        
            %% Calculate gradient for all features
            gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])

            %% Update the weight vector by subtracting the gradient from the old weight vector
            weightVector = weightVector - gradient 
            
            %% Re-evaluate Cost with new weight vector
            cost = logisticCost(weightVector, features, trueClass)
            
            if(cost < costThresh)
                break
            end
            iterCount = iterCount + 1
            
        end %for m
    end %while cost > 0.05
    
    weightVector
    iterCount
end

Here is my cost function:

function cost = logisticCost(weightVector, features, trueClass)
    %% Calculates the total cost of applying weightVector to all samples
    %% for a logistic regression model according to
    %% J(theta) = -(1/m) * sum[ trueClass*log(predictedClass) + (1 - trueClass)*log(1 - predictedClass) ]
    %% Input:
    % weightVector - vector of n+1 weights, where n is the number of features
    % features - matrix of features
    % trueClass - the training data's true class
    %% Output:
    % cost - the total cost
   
    dataSize = size(features); %get size of data
    
    errorSum = 0.0; %stores sum of errors
    for m = 1:dataSize(1) %for each row
        predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
        if trueClass(m) == 1
            errorSum = errorSum + log(predictedClass);
        else
            errorSum = errorSum + log(1 - predictedClass);
        end
    end
        
    cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get cost
end

Both of these look fine to me, and I cannot see why my cost function always returns an infinite value.
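For context on how a cost like this becomes infinite: once the weights drift far enough in one direction, the sigmoid saturates to exactly 0.0 or 1.0 in double precision, and the log of zero is -Inf. A minimal Python/NumPy sketch of that failure mode (the `sigmoid` here is an assumed equivalent of the poster's `evaluateSigmoid`):

```python
import numpy as np

def sigmoid(z):
    # Numerically naive sigmoid, mirroring a direct 1/(1+exp(-z)) implementation
    return 1.0 / (1.0 + np.exp(-z))

# A large activation saturates the sigmoid to exactly 1.0 in float64,
# because exp(-40) is far below machine epsilon relative to 1.
p = sigmoid(40.0)

# The log-loss term for a true label of 0 is then log(1 - p) = log(0) = -inf,
# which drives the total cost to Inf.
loss_term = np.log(1.0 - p)
```

A single saturated sample on the wrong side of the boundary is enough to make the summed cost infinite for the whole dataset.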

Here is my training data, where the first column is the class (1 or 0) and the next seven columns are the features I am regressing on.

matlab machine-learning regression logistic-regression
1 Answer (3 votes)

Your gradient has the wrong sign:

gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)]) 

It should be:

gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])

See Andrew Ng's notes for details: http://cs229.stanford.edu/notes/cs229-notes1.pdf

The gradient of the cost with respect to the j-th parameter is (where h(x) is the logistic function, y is the true label, and x is the feature vector):

    dJ(theta)/d(theta_j) = (h(x) - y) * x_j
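As a quick sanity check of that sign, here is a per-sample gradient in Python/NumPy (the names and the example numbers are illustrative, not from the original code):

```python
import numpy as np

def sample_gradient(weights, x_row, y):
    """Gradient of the per-sample log-loss: (h(x) - y) * x."""
    x = np.concatenate(([1.0], x_row))              # prepend the bias term
    h = 1.0 / (1.0 + np.exp(-np.dot(weights, x)))   # sigmoid prediction
    return (h - y) * x

# With zero weights the prediction is h = 0.5, so for y = 1 and
# features [2, -1] the gradient is (0.5 - 1) * [1, 2, -1] = [-0.5, -1.0, 0.5].
w = np.zeros(3)
g = sample_gradient(w, np.array([2.0, -1.0]), 1)
```

Subtracting this gradient increases the weights along the feature direction, pushing the prediction toward the true label of 1, which is what descent on the log-loss should do.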

Otherwise, by subtracting the negated gradient, you are performing gradient ascent on the cost. I believe that is why you end up with an infinite cost: the weights are pushed the wrong way until the sigmoid saturates and log(0) appears in the cost, and the loop can never escape.

The update rule itself should stay as:

weightVector = weightVector - gradient 
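Putting the corrected sign and the unchanged update rule together, here is a hedged Python/NumPy sketch of the whole stochastic loop on a tiny made-up dataset (the learning rate, epoch count, and data are illustrative, not from the question):

```python
import numpy as np

def sgd_logistic(y, X, lr=0.1, epochs=200):
    """Stochastic gradient descent for logistic regression.

    y: (m,) labels in {0, 1}; X: (m, n) feature matrix.
    Returns the (n+1,) weight vector, bias first.
    """
    m, n = X.shape
    w = np.zeros(n + 1)                      # weights including bias
    Xb = np.hstack([np.ones((m, 1)), X])     # prepend a bias column of ones
    for _ in range(epochs):
        for i in range(m):                   # one update per sample
            h = 1.0 / (1.0 + np.exp(-Xb[i] @ w))   # sigmoid prediction
            w -= lr * (h - y[i]) * Xb[i]           # descent: note (h - y)
    return w

# Tiny separable example: the class is 1 when the single feature is positive.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w = sgd_logistic(y, X)
```

With the `(h - y)` factor, the cost decreases and the learned weights separate the two classes; flipping it back to `(y - h)` while still subtracting reproduces the divergence described in the question.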