Basic Concepts of Neural Networks
Forward Propagation

Forward propagation passes the input data through each layer of the network until the output layer produces a prediction. For a single linear layer it can be written as:

$$z = Wx + b$$

where $W$ is the weight matrix, $x$ is the input vector, and $b$ is the bias vector.
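As a minimal illustration of this formula (all numbers here are made up), a single-layer forward pass in NumPy:

import numpy as np

W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])        # weight matrix, shape (3, 2)
x = np.array([1.0, 2.0])          # input vector, shape (2,)
b = np.array([0.01, 0.02, 0.03])  # bias vector, shape (3,)

z = W @ x + b                     # forward pass: z = Wx + b
print(z)                          # [0.51 1.12 1.73]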
Activation Functions

Activation functions introduce non-linearity, which is what lets a neural network fit complex functions. Common activation functions include ReLU, Sigmoid, and Softmax. In this example we use Softmax as the output-layer activation:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
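A small standalone sketch of Softmax (with illustrative logits) shows that its outputs form a valid probability distribution:

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

z = np.array([2.0, 1.0, 0.1])  # example logits
probs = softmax(z)
print(probs)        # approx [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution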
Loss Function

The loss function measures the discrepancy between the model's predictions and the ground truth. For classification problems, a common choice is the cross-entropy loss:

$$L = -\sum_i y_i \log \hat{y}_i$$

where $y_i$ is the true label and $\hat{y}_i$ is the probability predicted by the model.
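A quick worked example with made-up numbers: if the true class is class 1 (one-hot label [0, 1, 0]) and the model assigns it probability 0.7, the sum collapses to a single term:

import numpy as np

y_true = np.array([0, 1, 0])        # one-hot true label
y_pred = np.array([0.2, 0.7, 0.1])  # predicted probabilities

loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # -log(0.7) ~= 0.357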
Gradient Descent

Gradient descent is an optimization algorithm that updates the model parameters so as to minimize the loss function. The update rule is:

$$\theta \leftarrow \theta - \eta \nabla_\theta L$$

where $\eta$ is the learning rate.
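To make the update rule concrete, here is a tiny hypothetical example: minimizing $f(\theta) = \theta^2$, whose gradient is $2\theta$, by repeated gradient descent steps:

theta = 4.0
eta = 0.1  # learning rate
for step in range(5):
    grad = 2 * theta
    theta = theta - eta * grad  # theta <- theta - eta * grad
    print(f'step {step + 1}: theta = {theta:.4f}')
# theta shrinks toward the minimum at 0: 3.2, 2.56, 2.048, ...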
Backpropagation

Backpropagation computes the gradients and is the core of neural network training. Using the chain rule, the gradient of the loss with respect to the output is propagated backwards through the network to obtain the gradient of each layer's parameters.

Mathematical Formulas
- Output layer: for Softmax combined with cross-entropy, the gradient of the loss with respect to the pre-activation output is (see the derivation after this list):

  $$\frac{\partial L}{\partial z} = \hat{y} - y$$

  The gradients of the loss with respect to the parameters are:

  $$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial z}\, x^{T}, \qquad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}$$

- Hidden layer: the gradient of the loss with respect to the activations is obtained with the chain rule:

  $$\frac{\partial L}{\partial h} = W^{T} \frac{\partial L}{\partial z}, \qquad \frac{\partial L}{\partial z_h} = \frac{\partial L}{\partial h} \odot \sigma'(z_h)$$

  where $\sigma'$ is the derivative of the ReLU function:

  $$\sigma'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \le 0 \end{cases}$$

  The gradients with respect to that layer's parameters then follow the same pattern as for the output layer:

  $$\frac{\partial L}{\partial W_h} = \frac{\partial L}{\partial z_h}\, x^{T}, \qquad \frac{\partial L}{\partial b_h} = \frac{\partial L}{\partial z_h}$$
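For completeness, the output-layer formula above follows from standard Softmax-plus-cross-entropy algebra (not specific to this tutorial): since $\frac{\partial \hat{y}_i}{\partial z_j} = \hat{y}_i(\delta_{ij} - \hat{y}_j)$,

$$\frac{\partial L}{\partial z_j} = -\sum_i y_i \frac{\partial \log \hat{y}_i}{\partial z_j} = -\sum_i y_i \left(\delta_{ij} - \hat{y}_j\right) = \hat{y}_j \sum_i y_i - y_j = \hat{y}_j - y_j,$$

using $\sum_i y_i = 1$ for one-hot labels. This is exactly why the code below can compute the gradient as probs - y_true.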
Code Implementation

Data Loading

First, define a function to load the data:
import os
import gzip
import numpy as np
import matplotlib.pyplot as plt

def get_data(inputs_file_path, labels_file_path, num_examples):
    # Images: skip the 16-byte IDX header, then reshape to (num_examples, 784)
    with gzip.open(inputs_file_path, 'rb') as inputfile:
        input_data = np.frombuffer(inputfile.read(), dtype=np.uint8, offset=16)
    input_data = input_data.reshape(num_examples, 28 * 28)
    input_data = input_data / 255.0  # scale pixel values to [0, 1]
    print(f'input_data shape: {input_data.shape}')
    # Labels: skip the 8-byte IDX header
    with gzip.open(labels_file_path, 'rb') as labelfile:
        label_data = np.frombuffer(labelfile.read(), dtype=np.uint8, offset=8)
    print(f'label_data shape: {label_data.shape}')
    return input_data, label_data
mnist_data_folder = './MNIST_data'
train_inputs, train_labels = get_data(
    os.path.join(mnist_data_folder, 'train-images-idx3-ubyte.gz'),
    os.path.join(mnist_data_folder, 'train-labels-idx1-ubyte.gz'),
    60000
)
test_inputs, test_labels = get_data(
    os.path.join(mnist_data_folder, 't10k-images-idx3-ubyte.gz'),
    os.path.join(mnist_data_folder, 't10k-labels-idx1-ubyte.gz'),
    10000
)
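As an optional sanity check (reusing the matplotlib import above), you can display one decoded sample and confirm the image matches its label:

plt.imshow(train_inputs[0].reshape(28, 28), cmap='gray')
plt.title(f'Label: {train_labels[0]}')
plt.show()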
Parameter Initialization

Initialize the weights and biases:

input_size = 784       # 28 x 28 pixels per image
num_classes = 10       # digits 0-9
learning_rate = 0.05

W = np.random.rand(num_classes, input_size) * 0.01  # small random initial weights
b = np.zeros((num_classes, 1))                      # biases initialized to zero
Forward Propagation

Define the forward propagation function:

def forward_pass(inputs, W, b):
    # inputs: (batch, 784); W: (10, 784); b: (10, 1)
    # Returns the logits z = xW^T + b^T with shape (batch, 10)
    return np.dot(inputs, W.T) + b.T
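A quick shape check (optional; dummy_batch is just an illustrative name for a slice of the training data) confirms that each image maps to a row of 10 logits:

dummy_batch = train_inputs[:32]           # shape (32, 784)
logits = forward_pass(dummy_batch, W, b)  # shape (32, 10)
print(logits.shape)                       # (32, 10)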
Loss and Gradient Computation

Define a function that computes the loss and the gradients:

def compute_loss_and_gradients(inputs, outputs, labels, W, b):
    num_samples = inputs.shape[0]
    y_true = np.eye(num_classes)[labels]  # one-hot encode the labels
    # Numerically stable Softmax: subtract the row-wise max before exponentiating
    exp_z = np.exp(outputs - np.max(outputs, axis=1, keepdims=True))
    probs = exp_z / np.sum(exp_z, axis=1, keepdims=True)
    # Cross-entropy loss; clip probabilities to avoid log(0)
    probs = np.clip(probs, 1e-10, 1.0)
    loss = -np.sum(y_true * np.log(probs)) / num_samples
    # Softmax + cross-entropy gradient: dL/dz = y_hat - y
    dL_dz = probs - y_true
    gradW = np.dot(dL_dz.T, inputs) / num_samples                 # shape (10, 784)
    gradB = np.sum(dL_dz.T, axis=1, keepdims=True) / num_samples  # shape (10, 1)
    return loss, gradW, gradB
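As an optional sketch, the analytic gradient can be verified against a central finite difference on a tiny batch; eps and the probed index (0, 0) are arbitrary choices for illustration:

eps = 1e-5
inputs, labels = train_inputs[:8], train_labels[:8]
_, gradW, _ = compute_loss_and_gradients(inputs, forward_pass(inputs, W, b), labels, W, b)

W_plus = W.copy();  W_plus[0, 0] += eps
W_minus = W.copy(); W_minus[0, 0] -= eps
loss_plus, _, _ = compute_loss_and_gradients(inputs, forward_pass(inputs, W_plus, b), labels, W_plus, b)
loss_minus, _, _ = compute_loss_and_gradients(inputs, forward_pass(inputs, W_minus, b), labels, W_minus, b)
numeric = (loss_plus - loss_minus) / (2 * eps)
print(gradW[0, 0], numeric)  # the two values should agree closely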
Parameter Update

Define the parameter update function:

def update_parameters(W, b, gradW, gradB, learning_rate):
    # One gradient descent step: theta <- theta - eta * grad
    W -= learning_rate * gradW
    b -= learning_rate * gradB
    return W, b
Training the Model

Define the training function:

def train_model(train_inputs, train_labels, W, b, learning_rate, batch_size, num_epochs):
    num_samples = train_inputs.shape[0]
    for epoch in range(num_epochs):
        # Iterate over the training set in mini-batches
        for start in range(0, num_samples, batch_size):
            end = start + batch_size
            inputs = train_inputs[start:end]
            labels = train_labels[start:end]
            outputs = forward_pass(inputs, W, b)
            loss, gradW, gradB = compute_loss_and_gradients(inputs, outputs, labels, W, b)
            W, b = update_parameters(W, b, gradW, gradB, learning_rate)
        # Note: this reports the loss of the last mini-batch of the epoch
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss:.4f}')
    return W, b
num_epochs = 10
batch_size = 100
W, b = train_model(train_inputs, train_labels, W, b, learning_rate, batch_size, num_epochs)
Testing the Model

Define the test function:

def test_model(test_inputs, test_labels, W, b):
    outputs = forward_pass(test_inputs, W, b)
    predictions = np.argmax(outputs, axis=1)  # predicted class for each sample
    accuracy = np.mean(predictions == test_labels)
    return accuracy
accuracy = test_model(test_inputs, test_labels, W, b)
print(f'Test accuracy: {accuracy}')
Visualizing the Results

Define a function to visualize the predictions:

def visualize_predictions(test_inputs, test_labels, W, b, num_samples=10):
    # Pick a few random test images without replacement
    indices = np.random.choice(test_inputs.shape[0], num_samples, replace=False)
    sample_inputs = test_inputs[indices]
    sample_labels = test_labels[indices]
    outputs = forward_pass(sample_inputs, W, b)
    predictions = np.argmax(outputs, axis=1)
    # Show each image with its true label and the model's prediction
    fig, axes = plt.subplots(1, num_samples, figsize=(15, 3))
    for i in range(num_samples):
        axes[i].imshow(sample_inputs[i].reshape(28, 28), cmap='gray')
        axes[i].set_title(f'True: {sample_labels[i]}\nPred: {predictions[i]}')
        axes[i].axis('off')
    plt.show()
visualize_predictions(test_inputs, test_labels, W, b)