Preface
Two posts in one day, because both of these exercises are about neural networks and the material is closely related.
Looking back at the Week 4 assignment, when we used a neural network for multi-class prediction at the end, Ng supplied an already-trained Θ. The main content of this week is learning how to train a neural network ourselves and arrive at that Θ.
Forward Propagation: Cost Function
Reference formula:
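For a network with $L$ layers and $K$ output units, the regularized cost function (as given in the lecture) is:

$$ J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right) + (1 - y_k^{(i)}) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2 $$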
Here $K$ is the number of classes and $y^{(i)}$ is the label of the $i$-th training example, recoded as a $K$-dimensional one-hot vector, so its possible values are $[1;0;\dots;0]$, $[0;1;\dots;0]$, ..., $[0;0;\dots;1]$.
Each component therefore satisfies $y_k^{(i)} \in \{0, 1\}$. Forward propagation means computing the activations layer by layer, from left to right, and the cost function is the logistic-regression cost summed over the $K$ output units and averaged over the training examples.
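In symbols, with $a^{(1)} = x$ and a bias unit $a_0^{(l)} = 1$ prepended to every layer, each layer is computed from the previous one as:

$$ z^{(l+1)} = \Theta^{(l)} a^{(l)}, \qquad a^{(l+1)} = g(z^{(l+1)}) $$

where $g$ is the sigmoid function and the activations of the last layer are the hypothesis $h_\Theta(x)$.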
Backpropagation: Errors and Gradients
To find $\min_\Theta J(\Theta)$ using gradient descent, we need to compute the following two quantities:
- $J(Θ)$
- $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ)$
$J(\Theta)$ can be obtained with forward propagation, while $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ)$ can be obtained with backpropagation. The backpropagation procedure is as follows.
$δ_j^{(l)}$ denotes the error of the $j$-th unit in $Layer_l$, so for the output layer (layer 4 in the lecture's example):
- $\delta_j^{(4)} = a_j^{(4)} - y_j$
The derivations of $δ_j^{(3)}$ and $δ_j^{(2)}$ are more involved, so the formulas are given directly here.
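For the 4-layer network in the lecture they are:

$$ \delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \,.\!*\, g'(z^{(3)}), \qquad \delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \,.\!*\, g'(z^{(2)}) $$

where $.*$ is element-wise multiplication and $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$; there is no $\delta^{(1)}$, since the input layer has no error.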
Furthermore, ignoring regularization, $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ) = a_j^{(l)}δ_i^{(l+1)} $; the proof is likewise tedious.
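Over the whole training set the per-example terms are accumulated into $\Delta_{i,j}^{(l)}$, and the regularized gradient becomes (the bias column $j = 0$ is not regularized):

$$ \frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = \frac{1}{m} \Delta_{i,j}^{(l)} + \frac{\lambda}{m} \Theta_{i,j}^{(l)} \quad (j \ge 1), \qquad \frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = \frac{1}{m} \Delta_{i,j}^{(l)} \quad (j = 0) $$

This is exactly what Theta1_grad and Theta2_grad compute in Assignment 3 below.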
Gradient Checking
When computing $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ) $, bugs can easily creep into the code, so to verify that the implementation is correct we can check the computed gradient against a numerical approximation. The principle is:
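For each parameter $\Theta_j$ in the unrolled parameter vector, use the two-sided difference:

$$ \frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon} $$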
With $\epsilon$ set to a very small value, this approximates the partial derivative; comparing the result against the gradient produced by backpropagation tells us whether our gradient code is correct.
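A minimal sketch of this check in Octave (the exercise ships a similar helper; computeNumericalGradient and the costFunc handle are just illustrative names here):

function numgrad = computeNumericalGradient(costFunc, theta)
%COMPUTENUMERICALGRADIENT Approximates the gradient of costFunc at theta
%   using the two-sided difference, one parameter at a time.
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;                               % the small epsilon
for p = 1:numel(theta)
perturb(p) = e;
loss1 = costFunc(theta - perturb);      % J with theta_p decreased by epsilon
loss2 = costFunc(theta + perturb);      % J with theta_p increased by epsilon
numgrad(p) = (loss2 - loss1) / (2 * e);
perturb(p) = 0;
end
end

If the backpropagation code is correct, numgrad and the analytically computed grad should agree to many decimal places; for example norm(numgrad - grad) / norm(numgrad + grad) should be on the order of 1e-9.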
Assignment 1: Implementing Forward Propagation
Since there are K classes, the cost for each training example has to be summed over all K output units (the gradient part is filled in later, in Assignment 3):
function [J grad] = nnCostFunction(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
X = [ones(m,1) X];                  % add the bias unit to the input layer
a1 = X;
a2 = sigmoid(a1 * Theta1');         % hidden-layer activations
a2 = [ones(size(a2, 1), 1) a2];     % add the bias unit to the hidden layer
a3 = sigmoid(a2 * Theta2');         % output-layer activations, h_theta(x)
for i = 1:m
yi = zeros(num_labels, 1);
yi(y(i),1) = 1;                     % expand the label y(i) into a one-hot vector
a3i = a3(i,:)';                     % the K output activations for example i
J = J + sum(-yi .* log(a3i) - (1 - yi) .* log(1 - a3i));
end
J = 1/m * J;                        % average over the m examples
rTheta1 = Theta1(:,2:end);          % Theta1 without the bias column
rTheta2 = Theta2(:,2:end);          % Theta2 without the bias column
J = J + lambda/(2*m) * (sum(sum(rTheta1 .^ 2)) + sum(sum(rTheta2 .^ 2)));   % regularization term
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
Assignment 2: The Derivative of the Sigmoid
This is just a matter of differentiating the sigmoid function.
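Since $g(z) = \frac{1}{1+e^{-z}}$, the derivative works out to $g'(z) = g(z)(1 - g(z))$. A minimal sketch of sigmoidGradient.m, assuming the sigmoid function provided with the exercise:

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function evaluated at z
%   (element-wise, so z may be a scalar, a vector or a matrix)
g = sigmoid(z) .* (1 - sigmoid(z));
end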
Assignment 3: Backpropagation
This is written in the same place as forward propagation (still nnCostFunction), because essentially we are computing the partial derivatives of the same cost.
function [J grad] = nnCostFunction(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
X = [ones(m,1) X];                  % add the bias unit to the input layer
a1 = X;
a2 = sigmoid(a1 * Theta1');         % hidden-layer activations
a2 = [ones(size(a2, 1), 1) a2];     % add the bias unit to the hidden layer
a3 = sigmoid(a2 * Theta2');         % output-layer activations, h_theta(x)
% (the cost J is computed here exactly as in Assignment 1, omitted for brevity)
bdelta_2 = zeros(size(Theta2));     % gradient accumulator Delta^(2)
bdelta_1 = zeros(size(Theta1));     % gradient accumulator Delta^(1)
for i = 1:m
yi = zeros(num_labels, 1);
yi(y(i),1) = 1;                     % expand the label y(i) into a one-hot vector
a3i = a3(i,:)';
a2i = a2(i,:)';
a1i = a1(i,:)';
delta_3 = a3i - yi;                 % output-layer error
delta_2 = Theta2' * delta_3;        % propagate the error back through Theta2
delta_2 = delta_2(2:end);           % drop the bias-unit component
delta_2 = delta_2 .* sigmoidGradient(Theta1 * a1i);   % multiply by g'(z2)
bdelta_2 = bdelta_2 + delta_3 * (a2i)';               % accumulate Delta^(2)
bdelta_1 = bdelta_1 + delta_2 * (a1i)';               % accumulate Delta^(1)
end
rTheta1 = Theta1(:,2:end);          % Theta1 without the bias column (as in Assignment 1)
rTheta2 = Theta2(:,2:end);          % Theta2 without the bias column
Theta1_grad = 1/m * bdelta_1 + [zeros(size(Theta1, 1),1) lambda/m*rTheta1];
Theta2_grad = 1/m * bdelta_2 + [zeros(size(Theta2, 1),1) lambda/m*rTheta2];
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
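With nnCostFunction returning both the cost and the unrolled gradient, training the network is just a matter of handing it to an optimizer. A rough sketch of how the course script drives it with the provided fmincg; the MaxIter and lambda values are illustrative, and initial_nn_params is assumed to come from random initialization:

options = optimset('MaxIter', 50);
lambda = 1;
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
num_labels, X, y, lambda);
% fmincg works like fminunc but copes better with a large number of parameters
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% Reshape the learned parameters back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));

The resulting Theta1 and Theta2 can then be fed to the predict function from Week 4 to measure the training accuracy.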
Summary
Machine learning may not be hard to understand, but writing the code requires great care: handling the bias units correctly and checking that matrix sizes match before multiplying are both easy places to slip up, and both really matter.