Preface
Two posts in one day, because both of these exercises are about neural networks and the material is closely related.
Looking back at the Week 4 assignment, when we used a neural network for multi-class prediction at the end, Ng supplied an already-trained Θ. The main content of this week is learning how to train a neural network ourselves and arrive at that Θ.
Forward Propagation: Cost Function
Reference formula:
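For a network with $L$ layers and $K$ output units, the regularized cost function (as given in the lecture) is:

$$ J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right) + (1 - y_k^{(i)}) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2 $$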
Here $K$ is the number of classes and $y^{(i)}$ is the label of the $i$-th training example, recoded as a $K$-dimensional one-hot vector, so its possible values are $[1;0;\dots;0]$, $[0;1;\dots;0]$, ..., $[0;0;\dots;1]$.
Each component therefore satisfies $y_k^{(i)} \in \{0, 1\}$. Forward propagation means computing the activations layer by layer, from left to right, and the cost function is the logistic-regression cost summed over the $K$ output units and averaged over the training examples.
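In symbols, with $a^{(1)} = x$ and a bias unit $a_0^{(l)} = 1$ prepended to every layer, each layer is computed from the previous one as:

$$ z^{(l+1)} = \Theta^{(l)} a^{(l)}, \qquad a^{(l+1)} = g(z^{(l+1)}) $$

where $g$ is the sigmoid function and the activations of the last layer are the hypothesis $h_\Theta(x)$.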
Backpropagation: Errors and Gradients
To find $\min_\Theta J(\Theta)$ using gradient descent, we need to compute the following two quantities:
- $J(Θ)$
- $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ)$
$J(\Theta)$ can be obtained with forward propagation, while $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ)$ can be obtained with backpropagation. The backpropagation procedure is as follows.
$δ_j^{(l)}$ denotes the error of the $j$-th unit in $Layer_l$, so for the output layer (layer 4 in the lecture's example):
- $\delta_j^{(4)} = a_j^{(4)} - y_j$
The derivations of $δ_j^{(3)}$ and $δ_j^{(2)}$ are more involved, so the formulas are given directly here.
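For the 4-layer network in the lecture they are:

$$ \delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \,.\!*\, g'(z^{(3)}), \qquad \delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \,.\!*\, g'(z^{(2)}) $$

where $.*$ is element-wise multiplication and $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$; there is no $\delta^{(1)}$, since the input layer has no error.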
Furthermore, ignoring regularization, $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ) = a_j^{(l)}δ_i^{(l+1)} $; the proof is likewise tedious.
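Over the whole training set the per-example terms are accumulated into $\Delta_{i,j}^{(l)}$, and the regularized gradient becomes (the bias column $j = 0$ is not regularized):

$$ \frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = \frac{1}{m} \Delta_{i,j}^{(l)} + \frac{\lambda}{m} \Theta_{i,j}^{(l)} \quad (j \ge 1), \qquad \frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = \frac{1}{m} \Delta_{i,j}^{(l)} \quad (j = 0) $$

This is exactly what Theta1_grad and Theta2_grad compute in Assignment 3 below.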
Gradient Checking
When computing $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ) $, bugs can easily creep into the code, so to verify that the implementation is correct we can check the computed gradient against a numerical approximation. The principle is:
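For each parameter $\Theta_j$ in the unrolled parameter vector, use the two-sided difference:

$$ \frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon} $$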
With $\epsilon$ set to a very small value, this approximates the partial derivative; comparing the result against the gradient produced by backpropagation tells us whether our gradient code is correct.
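A minimal sketch of this check in Octave (the exercise ships a similar helper; computeNumericalGradient and the costFunc handle are just illustrative names here):

function numgrad = computeNumericalGradient(costFunc, theta)
%COMPUTENUMERICALGRADIENT Approximates the gradient of costFunc at theta
%   using the two-sided difference, one parameter at a time.
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;                               % the small epsilon
for p = 1:numel(theta)
perturb(p) = e;
loss1 = costFunc(theta - perturb);      % J with theta_p decreased by epsilon
loss2 = costFunc(theta + perturb);      % J with theta_p increased by epsilon
numgrad(p) = (loss2 - loss1) / (2 * e);
perturb(p) = 0;
end
end

If the backpropagation code is correct, numgrad and the analytically computed grad should agree to many decimal places; for example norm(numgrad - grad) / norm(numgrad + grad) should be on the order of 1e-9.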
Assignment 1: Implementing Forward Propagation
Since there are K classes, the cost for each training example has to be summed over all K output units (the gradient part is filled in later, in Assignment 3):
function [J grad] = nnCostFunction(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
X = [ones(m,1) X];                  % add the bias unit to the input layer
a1 = X;
a2 = sigmoid(a1 * Theta1');         % hidden-layer activations
a2 = [ones(size(a2, 1), 1) a2];     % add the bias unit to the hidden layer
a3 = sigmoid(a2 * Theta2');         % output-layer activations, h_theta(x)
for i = 1:m
yi = zeros(num_labels, 1);
yi(y(i),1) = 1;                     % expand the label y(i) into a one-hot vector
a3i = a3(i,:)';                     % the K output activations for example i
J = J + sum(-yi .* log(a3i) - (1 - yi) .* log(1 - a3i));
end
J = 1/m * J;                        % average over the m examples
rTheta1 = Theta1(:,2:end);          % Theta1 without the bias column
rTheta2 = Theta2(:,2:end);          % Theta2 without the bias column
J = J + lambda/(2*m) * (sum(sum(rTheta1 .^ 2)) + sum(sum(rTheta2 .^ 2)));   % regularization term
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
Assignment 2: The Derivative of the Sigmoid
This is just a matter of differentiating the sigmoid function.
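Since $g(z) = \frac{1}{1+e^{-z}}$, the derivative works out to $g'(z) = g(z)(1 - g(z))$. A minimal sketch of sigmoidGradient.m, assuming the sigmoid function provided with the exercise:

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function evaluated at z
%   (element-wise, so z may be a scalar, a vector or a matrix)
g = sigmoid(z) .* (1 - sigmoid(z));
end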
Assignment 3: Backpropagation
This is written in the same place as forward propagation (still nnCostFunction), because essentially we are computing the partial derivatives of the same cost.
function [J grad] = nnCostFunction(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
X = [ones(m,1) X];                  % add the bias unit to the input layer
a1 = X;
a2 = sigmoid(a1 * Theta1');         % hidden-layer activations
a2 = [ones(size(a2, 1), 1) a2];     % add the bias unit to the hidden layer
a3 = sigmoid(a2 * Theta2');         % output-layer activations, h_theta(x)
% (the cost J is computed here exactly as in Assignment 1, omitted for brevity)
bdelta_2 = zeros(size(Theta2));     % gradient accumulator Delta^(2)
bdelta_1 = zeros(size(Theta1));     % gradient accumulator Delta^(1)
for i = 1:m
yi = zeros(num_labels, 1);
yi(y(i),1) = 1;                     % expand the label y(i) into a one-hot vector
a3i = a3(i,:)';
a2i = a2(i,:)';
a1i = a1(i,:)';
delta_3 = a3i - yi;                 % output-layer error
delta_2 = Theta2' * delta_3;        % propagate the error back through Theta2
delta_2 = delta_2(2:end);           % drop the bias-unit component
delta_2 = delta_2 .* sigmoidGradient(Theta1 * a1i);   % multiply by g'(z2)
bdelta_2 = bdelta_2 + delta_3 * (a2i)';               % accumulate Delta^(2)
bdelta_1 = bdelta_1 + delta_2 * (a1i)';               % accumulate Delta^(1)
end
rTheta1 = Theta1(:,2:end);          % Theta1 without the bias column (as in Assignment 1)
rTheta2 = Theta2(:,2:end);          % Theta2 without the bias column
Theta1_grad = 1/m * bdelta_1 + [zeros(size(Theta1, 1),1) lambda/m*rTheta1];
Theta2_grad = 1/m * bdelta_2 + [zeros(size(Theta2, 1),1) lambda/m*rTheta2];
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
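With nnCostFunction returning both the cost and the unrolled gradient, training the network is just a matter of handing it to an optimizer. A rough sketch of how the course script drives it with the provided fmincg; the MaxIter and lambda values are illustrative, and initial_nn_params is assumed to come from random initialization:

options = optimset('MaxIter', 50);
lambda = 1;
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
num_labels, X, y, lambda);
% fmincg works like fminunc but copes better with a large number of parameters
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% Reshape the learned parameters back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));

The resulting Theta1 and Theta2 can then be fed to the predict function from Week 4 to measure the training accuracy.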
Summary
Machine learning may not be hard to understand, but writing the code requires great care: handling the bias units correctly and checking that matrix sizes match before multiplying are both easy places to slip up, and both really matter.