Neural networks are the foundation of modern deep learning and artificial intelligence. They are loosely inspired by the way the human brain processes information, and they learn to recognize patterns and make decisions from data. Neural networks power applications like image recognition, natural language processing, and autonomous systems. While libraries like TensorFlow and PyTorch simplify the process, building a neural network from scratch is an invaluable learning experience: it gives you a deeper understanding of how these models work internally, which can be essential when debugging or developing custom solutions.
This guide walks you through building a simple neural network using Python, NumPy, and basic mathematical operations. Each step is covered in detail, from initializing weights to backpropagation and training the model.
A neural network is a computational model inspired by the human brain. It consists of layers of neurons, each taking inputs, processing them, and passing the output to the next layer.
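| Term | Description |
|---|---|
| Neuron | A unit that receives inputs, computes a weighted sum plus a bias, applies an activation function, and outputs a value. |
| Layer | A collection of neurons. A network has an input layer, one or more hidden layers, and an output layer. |
| Weights | Learnable parameters that determine how strongly each input influences a neuron. |
| Biases | Learnable offsets added to each neuron's weighted sum. |
| Activation Function | Introduces non-linearity (e.g., Sigmoid, ReLU). |
| Loss Function | Measures the difference between predictions and true values. |
| Backpropagation | Algorithm that computes the gradient of the loss with respect to every weight, used to update the weights. |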
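To make the table concrete, here is a minimal sketch of a single neuron: it takes a weighted sum of its inputs, adds a bias, and passes the result through a sigmoid activation. The helper name `neuron_output` and the input values, weights, and bias are hypothetical numbers chosen for illustration only.

```python
import numpy as np

def neuron_output(x, w, b):
    """Output of one sigmoid neuron for input vector x, weights w and bias b."""
    z = np.dot(x, w) + b          # weighted sum of inputs plus bias
    return 1 / (1 + np.exp(-z))   # sigmoid squashes z into the range (0, 1)

x = np.array([1.0, 2.0])    # two input features (illustrative)
w = np.array([0.5, -0.25])  # one weight per input (illustrative)
b = 0.0                     # bias (illustrative)

# z = 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0, and sigmoid(0) = 0.5
print(neuron_output(x, w, b))  # 0.5
```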
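Neural networks learn by adjusting their weights to reduce the errors they make during training. Backpropagation computes the gradient of the error (loss) with respect to every weight, and gradient descent then moves each weight a small step in the direction that lowers the loss. Repeating this over many iterations minimizes the loss.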
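The idea of learning by following the gradient is easiest to see on the smallest possible model: one weight `w`, prediction `w * x`, and a mean-squared-error loss. This is a toy sketch with made-up data and learning rate, not part of the network built later in this guide.

```python
import numpy as np

# Toy data generated by y = 2 * x, so the ideal weight is 2.0
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x

w = 0.0               # start from a poor guess
learning_rate = 0.05

for step in range(100):
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)      # mean squared error
    grad = np.mean(2 * (y_pred - y) * x)   # dLoss/dw
    w -= learning_rate * grad              # gradient descent step

print(round(w, 4))  # 2.0
```

Each step shrinks the gap between `w` and 2.0 by a constant factor here, which is why the weight converges after a few dozen iterations.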
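A neural network is built from the core components summarized in the table above: neurons organized into layers, weights, activation functions, a loss function, and the backpropagation algorithm that trains them.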
Make sure Python and NumPy are installed. NumPy is the fundamental package for scientific computing in Python; it provides fast multi-dimensional arrays and the matrix operations that every step of this neural network relies on.
```bash
pip install numpy
```

```python
import numpy as np
```
Setting a random seed ensures reproducibility when generating random numbers.
```python
np.random.seed(42)
```
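A quick way to convince yourself that seeding works: reseeding with the same value reproduces exactly the same "random" draws. (NumPy's documentation recommends the newer `np.random.default_rng(seed)` Generator API for new code; the legacy global seed used in this guide still works.)

```python
import numpy as np

np.random.seed(42)
first = np.random.randn(2, 3)

np.random.seed(42)  # reset the global generator to the same state
second = np.random.randn(2, 3)

print(np.array_equal(first, second))  # True: identical draws on every run
```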
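We will create a simple 2-layer neural network (the input layer is conventionally not counted) with:

- an input layer with 2 neurons, one per input feature;
- a hidden layer with 4 neurons and a sigmoid activation;
- an output layer with 1 neuron and a sigmoid activation, whose output is read as the probability of class 1.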
We initialize the network’s weights and biases, as these parameters will be adjusted during training to minimize loss. Initializing weights randomly helps break symmetry, ensuring neurons learn different features.
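```python
def initialize_network(input_size, hidden_size, output_size):
    # Weights are drawn from a standard normal distribution to break symmetry
    W1 = np.random.randn(input_size, hidden_size)   # shape: (input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))                 # one bias per hidden neuron
    W2 = np.random.randn(hidden_size, output_size)  # shape: (hidden_size, output_size)
    b2 = np.zeros((1, output_size))                 # one bias per output neuron
    return W1, b1, W2, b2
```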
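Why random initialization matters can be checked directly. In this sketch every weight starts at the same constant (0.5, an arbitrary choice) and one forward and backward pass is run with the same formulas this guide uses. All hidden neurons receive identical gradients, so they would stay identical after every update and could never learn different features.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Symmetric initialization: every weight has the same value
W1 = np.full((2, 4), 0.5)
b1 = np.zeros((1, 4))
W2 = np.full((4, 1), 0.5)
b2 = np.zeros((1, 1))

# One forward and backward pass
A1 = sigmoid(X @ W1 + b1)
A2 = sigmoid(A1 @ W2 + b2)
dZ2 = A2 - y
dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
dW1 = X.T @ dZ1 / X.shape[0]

# Every hidden neuron (column of dW1) received the same gradient
print(np.allclose(dW1, dW1[:, [0]]))  # True
```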
Forward propagation computes the network's output from its input. At each layer we take the weighted sum of the inputs, add the biases, and apply an activation function to introduce non-linearity; the result becomes the input to the next layer, until the output layer produces the final prediction.
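```python
def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1   # hidden-layer weighted sum
    A1 = sigmoid(Z1)          # hidden-layer activation
    Z2 = np.dot(A1, W2) + b2  # output-layer weighted sum
    A2 = sigmoid(Z2)          # output probabilities
    return Z1, A1, Z2, A2
```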
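Tracking array shapes is one of the easiest ways to check a forward pass. This self-contained sketch uses the XOR inputs and the 2-4-1 architecture from later in the guide with random weights (seed 0 is arbitrary), so only the shapes matter. Note how the `(1, 4)` bias is broadcast across all four rows.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # 4 samples, 2 features
W1, b1 = np.random.randn(2, 4), np.zeros((1, 4))
W2, b2 = np.random.randn(4, 1), np.zeros((1, 1))

Z1 = np.dot(X, W1) + b1   # (4, 2) @ (2, 4) -> (4, 4); b1 broadcasts over rows
A1 = sigmoid(Z1)          # (4, 4)
Z2 = np.dot(A1, W2) + b2  # (4, 4) @ (4, 1) -> (4, 1)
A2 = sigmoid(Z2)          # (4, 1): one probability per sample

for name, arr in [("Z1", Z1), ("A1", A1), ("Z2", Z2), ("A2", A2)]:
    print(name, arr.shape)
```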
Loss quantifies how well the network’s predictions match the true labels.
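We use binary cross-entropy (BCE) loss, the standard choice for binary classification with a sigmoid output. It is small when the predicted probability is close to the true label and grows rapidly as a prediction becomes confidently wrong, so minimizing it improves the network's predictions.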
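```python
def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions so np.log never receives exactly 0 or 1
    eps = 1e-12
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss
```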
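Written out for a single example with pre-activation $z$, prediction $\hat{y} = \sigma(z)$ and label $y$ (the code averages over the $m$ examples), the loss and its gradient show why backpropagation can start from the simple expression `A2 - y`:

$$
L = -\left[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\right], \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})}, \qquad \frac{\partial \hat{y}}{\partial z} = \sigma(z)\bigl(1 - \sigma(z)\bigr) = \hat{y}(1 - \hat{y})
$$

$$
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} = \hat{y} - y
$$

The sigmoid derivative $\hat{y}(1 - \hat{y})$ cancels, which is why pairing a sigmoid output with BCE gives such a clean output-layer gradient; the factor $1/m$ from averaging is applied when the weight and bias gradients are formed.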
Backpropagation calculates gradients to adjust weights and minimize loss.
We compute the gradients of the loss function with respect to the network’s weights. These gradients indicate how much each weight contributes to the error, allowing us to adjust the weights in a way that reduces the loss.
```python
def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    # Output layer: with sigmoid + binary cross-entropy, dLoss/dZ2 simplifies to A2 - y
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    # Hidden layer: propagate the error back through W2
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return dW1, db1, dW2, db2
```
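A finite-difference gradient check is a standard way to verify hand-written backpropagation. The sketch below re-implements compact versions of the forward pass, the clipped BCE loss, and the `dW1` formula from above, then nudges each entry of `W1` by a small ε and compares the numerical slope with the analytic gradient. The helper names `analytic_dW1` and `numerical_grad_W1`, ε = 1e-5, and the tolerance are typical illustrative choices, not values from this guide.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, W1, b1, W2, b2):
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    return A1, A2

def loss(y, A2):
    A2 = np.clip(A2, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2))

def analytic_dW1(X, y, W1, b1, W2, b2):
    # Same formulas as backward_propagation above
    m = X.shape[0]
    A1, A2 = forward(X, W1, b1, W2, b2)
    dZ2 = A2 - y
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
    return X.T @ dZ1 / m

def numerical_grad_W1(X, y, W1, b1, W2, b2, eps=1e-5):
    grad = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            W_plus, W_minus = W1.copy(), W1.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            # Central difference: (L(w + eps) - L(w - eps)) / (2 * eps)
            grad[i, j] = (loss(y, forward(X, W_plus, b1, W2, b2)[1])
                          - loss(y, forward(X, W_minus, b1, W2, b2)[1])) / (2 * eps)
    return grad

np.random.seed(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = np.random.randn(2, 4), np.zeros((1, 4))
W2, b2 = np.random.randn(4, 1), np.zeros((1, 1))

diff = np.max(np.abs(analytic_dW1(X, y, W1, b1, W2, b2)
                     - numerical_grad_W1(X, y, W1, b1, W2, b2)))
print(f"max abs difference: {diff:.2e}")  # should be very small (well below 1e-6)
```

A large difference would point to a mistake in the backward formulas; the same loop can be repeated for `b1`, `W2`, and `b2`.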
We update the weights and biases using the gradients computed during backpropagation. We subtract the gradient scaled by a learning rate from the current weights to move towards minimizing the loss.
```python
def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2
```
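Note that `-=` updates NumPy arrays in place, so the arrays passed into `update_weights` are modified directly and returning them is mainly a convenience. A small check with arbitrary numbers, using a hypothetical single-parameter `update` function:

```python
import numpy as np

def update(W, dW, learning_rate):
    W -= learning_rate * dW  # in place: modifies the caller's array
    return W

W = np.array([[1.0, 2.0]])
dW = np.array([[0.5, -1.0]])
W_returned = update(W, dW, learning_rate=0.1)

print(W)                # the original array now holds 0.95 and 2.1
print(W_returned is W)  # True: the same object, not a copy
```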
We bring everything together. We run forward propagation, calculate loss, perform backpropagation, and update weights iteratively for a defined number of epochs. This trains the network to make better predictions over time.
```python
def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)
    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
        if i % 100 == 0:
            print(f"Epoch {i}, Loss: {loss}")
    return W1, b1, W2, b2
```
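Instead of only printing the loss periodically, it can help to record it and check that it actually decreases. Below is a compact, self-contained variant of the training loop under that assumption; `train_with_history` is a hypothetical name, and it reuses this guide's formulas, a clipped loss, and in-place updates.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_with_history(X, y, hidden_size=4, epochs=10000, learning_rate=0.1, seed=42):
    np.random.seed(seed)
    m = X.shape[0]
    W1, b1 = np.random.randn(X.shape[1], hidden_size), np.zeros((1, hidden_size))
    W2, b2 = np.random.randn(hidden_size, 1), np.zeros((1, 1))
    history = []
    for _ in range(epochs):
        # Forward pass
        A1 = sigmoid(X @ W1 + b1)
        A2 = sigmoid(A1 @ W2 + b2)
        p = np.clip(A2, 1e-12, 1 - 1e-12)
        history.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
        # Backward pass (dZ1 uses W2 before it is updated)
        dZ2 = A2 - y
        dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
        # Gradient descent update
        W2 -= learning_rate * (A1.T @ dZ2) / m
        b2 -= learning_rate * dZ2.sum(axis=0, keepdims=True) / m
        W1 -= learning_rate * (X.T @ dZ1) / m
        b1 -= learning_rate * dZ1.sum(axis=0, keepdims=True) / m
    return (W1, b1, W2, b2), history

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, history = train_with_history(X, y)
print(f"first loss {history[0]:.4f}, last loss {history[-1]:.4f}")
```

Plotting `history` shows whether learning has stalled; how low the loss gets on XOR depends on the seed, learning rate, and number of epochs.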
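This section puts all the functions together into one script that trains the network on the XOR problem. XOR outputs 1 when exactly one of its two inputs is 1; its two classes cannot be separated by a single straight line, which is why the network needs a hidden layer.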
```python
import numpy as np

np.random.seed(42)

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)  # avoid log(0)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return dW1, db1, dW2, db2

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)
    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
        if i % 1000 == 0:
            print(f"Epoch {i}, Loss: {loss}")
    return W1, b1, W2, b2

# XOR dataset: the target is 1 when exactly one input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

W1, b1, W2, b2 = train(X, y, input_size=2, hidden_size=4, output_size=1, epochs=10000, learning_rate=0.1)
```
After training, we test the network on the input data to see if it has learned to approximate the XOR function. We expect the outputs to be close to [0, 1, 1, 0].
```python
_, _, _, A2 = forward_propagation(X, W1, b1, W2, b2)
print("Predictions:", A2)
```

```
Predictions: [[0.01]
 [0.98]
 [0.98]
 [0.02]]
```
These values are close to the expected outputs [0, 1, 1, 0].
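The network outputs probabilities. To turn them into class labels, a common convention (an assumption here, not something the guide's code does) is to threshold at 0.5 and compare the result with the targets. Using the prediction values shown above:

```python
import numpy as np

A2 = np.array([[0.01], [0.98], [0.98], [0.02]])  # predictions reported above
y = np.array([[0], [1], [1], [0]])                # XOR targets

labels = (A2 >= 0.5).astype(int)  # threshold probabilities at 0.5
accuracy = np.mean(labels == y)   # fraction of correct labels

print(labels.ravel())  # [0 1 1 0]
print(accuracy)        # 1.0
```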
Building a neural network from scratch is an essential exercise for understanding the fundamentals of deep learning. This guide covered the initialization, forward propagation, loss calculation, backpropagation, and weight updates necessary to train a simple neural network. Mastering these basics will provide a strong foundation for using more advanced libraries like TensorFlow or PyTorch.