I'm trying to implement a logistic regression solver in MATLAB, finding the weights by stochastic gradient descent. I've run into a problem where my data seems to produce an infinite cost, and no matter what happens, it never goes down...
Here is my gradient descent function:
function weightVector = logisticWeightsByGradientDescentStochastic(trueClass, features)
%% This function attempts to converge on the best set of weights for an order-1 logistic regression
%% Input:
%   trueClass - the training data's vector of true class values
%   features  - matrix of features
%% Output:
%   weightVector - vector of size n+1 (n is the number of features)
%                  corresponding to the convergent weights

    %% Get data size
    dataSize = size(features);

    %% Initial pick for weightVector
    weightVector = zeros(dataSize(2)+1, 1) %create a zero vector of size number of features plus 1

    %% Choose learning rate
    learningRate = 0.0001;

    %% Initial cost
    cost = logisticCost(weightVector, features, trueClass)

    %% Stochastic gradient descent
    costThresh = 0.05 %define cost threshold
    iterCount = 0;
    while(cost > costThresh)
        for m = 1:dataSize(1) %for all samples
            %% test statement
            curFeatures = transpose([1.0 features(m,:)])
            %% calculate sigmoid prediction
            predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)])
            %% test statement
            truth = trueClass(m)
            %% Calculate gradient for all features
            gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
            %% Update weight vector by subtracting the gradient from the old weight vector
            weightVector = weightVector - gradient
            %% Re-evaluate cost with the new weight vector
            cost = logisticCost(weightVector, features, trueClass)
            if(cost < costThresh)
                break
            end
            iterCount = iterCount + 1
        end %for m
    end %while cost > costThresh
    weightVector
    iterCount
end
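(evaluateSigmoid is not listed here; the calls above assume it computes the standard logistic function of the weighted feature sum. A minimal sketch consistent with those calls, where the body is an assumption rather than the actual implementation:)

function prediction = evaluateSigmoid(weightVector, featureRow)
%% Assumed helper: logistic function of the weighted sum.
%  featureRow is a 1 x (n+1) row vector (bias term prepended);
%  weightVector is an (n+1) x 1 column vector.
    prediction = 1 / (1 + exp(-featureRow * weightVector));
end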
Here is my cost function:
function cost = logisticCost(weightVector, features, trueClass)
%% Calculates the total cost of applying weightVector to all samples
%% for a logistic regression model according to
%% J(theta) = -(1/m) sum[ trueClass*log(predictedClass) + (1 - trueClass)*log(1 - predictedClass) ]
%% Input:
%   weightVector - vector of n+1 weights, where n is the number of features
%   features     - matrix of features
%   trueClass    - the training data's true class values
%% Output:
%   cost - the total cost

    dataSize = size(features); %get size of data
    errorSum = 0.0;            %stores the sum of errors
    for m = 1:dataSize(1)      %for each row
        predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
        if trueClass(m) == 1
            errorSum = errorSum + log(predictedClass);
        else
            errorSum = errorSum + log(1 - predictedClass);
        end
    end
    cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get the cost
end
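(For reference, the same cost can be computed without the loop; a minimal vectorized sketch, assuming trueClass is a column vector. Note that in MATLAB log(0) returns -Inf, so the cost becomes infinite as soon as the sigmoid saturates to exactly 0 or 1 for a misclassified sample:)

X = [ones(size(features,1),1) features];   % prepend the bias column
p = 1 ./ (1 + exp(-X * weightVector));     % element-wise sigmoid predictions
cost = -mean(trueClass .* log(p) + (1 - trueClass) .* log(1 - p));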
Both of these look fine to me, and I can't figure out why my cost function always returns infinity.
Here is my training data, where the first column is the class (1 or 0) and the next seven columns are the features I'm regressing on.
Your gradient has the wrong sign:
gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
It should be:
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])
See Andrew Ng's [notes] for details. [notes]: http://cs229.stanford.edu/notes/cs229-notes1.pdf
The gradient of the cost with respect to the j-th parameter is:

∂J(θ)/∂θ_j = (h(x) - y) · x_j

where h(x) is the logistic function, y is the true label, and x is the feature vector.
Otherwise, when you take the negative of that gradient, you are performing gradient ascent on the cost. I believe that is why you end up with an infinite cost: the cost only grows, so the while loop becomes an infinite loop you can never escape.
The update rule should still be:
weightVector = weightVector - gradient
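Putting it together, the corrected inner update would look like this (a sketch of just the changed lines, using the same variable names as your function):

x = transpose([1.0 features(m,:)]);                                % feature vector with bias term
predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]);
gradient = learningRate .* (predictedClass - trueClass(m)) .* x;   % note the flipped sign
weightVector = weightVector - gradient;                            % now this descends, not ascends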