Exercise 4: Neural Networks Learning


Two posts in one day, because both exercises deal with neural networks and their content is closely related.

Looking back at the Week 4 assignment: when we used a neural network for multi-class prediction at the end, Ng provided an already-trained Θ. The main content of this week is learning how to train a neural network ourselves and ultimately obtain Θ.

Forward Propagation: Cost Function

Reference formula:
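$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[ y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2
$$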

Here $K$ is the number of classes, and $y^{(i)}$ is the label of the $i$-th training example, recoded as a one-hot vector whose possible values are as follows:
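$$
y^{(i)} \in \left\{
\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix},
\begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix},
\dots,
\begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix}
\right\}
$$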

So each component $y_k^{(i)} \in \{0, 1\}$. Forward propagation means computing the activations layer by layer from left to right; the cost function is the logistic-regression cost summed over all $K$ output units and averaged over the $m$ training examples.

Backpropagation: Error Terms and Gradients

To find $\min_\Theta J(\Theta)$ with gradient descent, we need to compute the following two quantities:

  • $J(\Theta)$
  • $\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$

$J(\Theta)$ can be computed with forward propagation, while $\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$ can be computed with backpropagation. The backpropagation procedure is as follows.

$\delta_j^{(l)}$ denotes the error of the $j$-th unit in layer $l$, so for the output layer (layer 4 in this network):

  • $\delta_j^{(4)} = a_j^{(4)} - y_j$

The derivations of $\delta^{(3)}$ and $\delta^{(2)}$ are more involved, so the formulas are given here directly.
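Following the lecture convention for a four-layer network, with $g'$ the sigmoid gradient, $\mathbin{.*}$ element-wise multiplication, and the bias component dropped:

$$
\delta^{(3)} = \big(\Theta^{(3)}\big)^T \delta^{(4)} \mathbin{.*} g'\big(z^{(3)}\big), \qquad
\delta^{(2)} = \big(\Theta^{(2)}\big)^T \delta^{(3)} \mathbin{.*} g'\big(z^{(2)}\big)
$$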

Moreover, $\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = a_j^{(l)}\delta_i^{(l+1)}$ (ignoring regularization); the proof of this is also rather tedious.

Gradient Checking

When computing $\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$, it is easy to introduce bugs in the code. To verify that the implementation is correct, we can check the computed gradients during testing. The principle is:
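$$
\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) \approx \frac{J\big(\dots,\Theta_{i,j}^{(l)}+\epsilon,\dots\big) - J\big(\dots,\Theta_{i,j}^{(l)}-\epsilon,\dots\big)}{2\epsilon}
$$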

By taking $\epsilon$ to be a very small value (e.g. $10^{-4}$), we can approximate each partial derivative numerically; comparing these approximations against the gradients from backpropagation tells us whether the gradient code is correct.
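As a sketch, the numerical gradient can be computed by perturbing one parameter at a time in the unrolled parameter vector, along the lines of the course's computeNumericalGradient.m (the names and the value of e here are illustrative):

function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Approximates the gradient of J around theta
%   using central differences, one parameter at a time.
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    perturb(p) = e;
    loss1 = J(theta - perturb);   % J evaluated slightly below theta(p)
    loss2 = J(theta + perturb);   % J evaluated slightly above theta(p)
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
end

Comparing numgrad with the grad returned by nnCostFunction on a small test network, e.g. via norm(numgrad - grad) / norm(numgrad + grad), should give a very small value (on the order of $10^{-9}$) if backpropagation is implemented correctly.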

Assignment 1: Implement Forward Propagation

Since there are $K$ classes, the cost for each training example has to be summed over all $K$ output units:

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices. 
% 
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);
         
% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================

% Forward propagation: add the bias column to X, then compute the
% hidden-layer and output-layer activations
X = [ones(m,1) X];
a1 = X;
a2 = sigmoid(X * Theta1');
a2 = [ones(size(a2, 1), 1) a2];   % add bias unit to the hidden layer
a3 = sigmoid(a2 * Theta2');

% Accumulate the cost over all m examples and all K output units
for i = 1:m
    yi = zeros(num_labels, 1);
    yi(y(i),1) = 1;               % recode the label y(i) as a one-hot vector
    a3i = a3(i,:)';
    J = J + sum(-yi .* log(a3i) - (1 - yi) .* log(1 - a3i));
end

J = 1/m * J;

% Regularization: exclude the first column (bias weights) of each Theta
rTheta1 = Theta1(:,2:size(Theta1,2));
rTheta2 = Theta2(:,2:size(Theta2,2));

J = J + lambda/(2*m) * (sum(sum(rTheta1 .^ 2)) + sum(sum(rTheta2 .^ 2)));

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
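For context, a minimal sketch of how this cost function is then handed to an optimizer, roughly as ex4.m does it with the course-provided fmincg (the option values and lambda here are only illustrative):

% Wrap nnCostFunction in a handle that only takes the unrolled parameters
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);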

Assignment 2: Derivative of the Sigmoid

This is simply differentiating the sigmoid function: $g'(z) = g(z)\,(1 - g(z))$, where $g(z) = \frac{1}{1 + e^{-z}}$.
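A minimal sketch of sigmoidGradient.m (it reuses the sigmoid function from the earlier exercises and works element-wise, so z can be a scalar, vector, or matrix):

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT Returns the gradient of the sigmoid function evaluated at z
g = sigmoid(z) .* (1 - sigmoid(z));
end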

Assignment 3: Backpropagation

This is written in the same place as forward propagation, since what it essentially computes is the partial derivatives.

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices. 
% 
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);
         
% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Forward propagation (same as in Assignment 1)
X = [ones(m,1) X];
a1 = X;
a2 = sigmoid(X * Theta1');
a2 = [ones(size(a2, 1), 1) a2];   % add bias unit to the hidden layer
a3 = sigmoid(a2 * Theta2');

% Accumulators for the gradients (the Delta terms)
bdelta_2 = zeros(size(Theta2));
bdelta_1 = zeros(size(Theta1));

for i = 1:m
    yi = zeros(num_labels, 1);
    yi(y(i),1) = 1;               % one-hot label for example i
    a3i = a3(i,:)';
    a2i = a2(i,:)';
    a1i = a1(i,:)';

    % Output-layer error, then propagate it back to the hidden layer
    delta_3 = (a3i - yi);
    delta_2 = Theta2' * delta_3;
    delta_2 = delta_2(2:size(delta_2,1));            % drop the bias-unit error
    delta_2 = delta_2 .* sigmoidGradient(Theta1 * a1i);

    % Accumulate the gradient contribution of example i
    bdelta_2 = bdelta_2 + delta_3*(a2i)';
    bdelta_1 = bdelta_1 + delta_2*(a1i)';
end

% Strip the bias columns before regularizing (bias weights are not regularized)
rTheta1 = Theta1(:,2:end);
rTheta2 = Theta2(:,2:end);

Theta1_grad = 1/m * bdelta_1 + [zeros(size(Theta1, 1),1) lambda/m*rTheta1];
Theta2_grad = 1/m * bdelta_2 + [zeros(size(Theta2, 1),1) lambda/m*rTheta2];


% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

Summary

Machine learning may not be hard to understand conceptually, but writing the code requires a lot of care, especially in handling the bias units and in checking that matrix sizes match before every multiplication. These details all matter.

