The Art of Building a Neural Network from Scratch – A Practical Guide

Introduction

Neural networks are the foundation of modern deep learning and artificial intelligence. They are designed to mimic the human brain’s ability to recognize patterns and make decisions. Neural networks power applications like image recognition, natural language processing, and autonomous systems. While libraries like TensorFlow and PyTorch simplify the process, building a neural network from scratch is an invaluable learning experience. It gives you a deeper understanding of how these models work internally, which can be essential when debugging or developing custom solutions.

This guide walks you through building a simple neural network using Python, NumPy, and basic mathematical operations. Each step is covered in detail, from initializing weights to backpropagation and training the model.

Understanding Neural Networks

A neural network is a computational model inspired by the human brain. It consists of layers of neurons, each taking inputs, processing them, and passing the output to the next layer.

Key Concepts:

  • Neuron: A unit that receives input, applies a function, and outputs a value.
  • Layer: A collection of neurons; a network has an input layer, one or more hidden layers, and an output layer.
  • Weights: Parameters that define the importance of each input.
  • Activation Function: Introduces non-linearity (e.g., Sigmoid, ReLU).
  • Loss Function: Measures the difference between predictions and true values.
  • Backpropagation: The algorithm used to update weights based on error gradients.

Neural networks learn by adjusting their weights based on the errors they make during training: error gradients are computed with backpropagation, and the weights are then tuned so the error (loss) decreases. The goal of training is to minimize this loss.
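
As a minimal sketch of the idea, each weight is nudged against its loss gradient. The numbers below are illustrative placeholders; the real update step is implemented in Step 6.

w = 0.5               # current value of one weight (illustrative)
dL_dw = 0.2           # gradient of the loss with respect to that weight (illustrative)
learning_rate = 0.1
w = w - learning_rate * dL_dw   # step against the gradient
print(w)              # 0.48 -- the weight moves slightly in the direction that lowers the loss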

Building Blocks of a Neural Network

A neural network is built upon several core components:

  1. Inputs: The data provided to the network (e.g., images, numerical data, text).
  2. Weights: Parameters that transform the inputs into outputs. These are adjusted during training.
  3. Biases: Additional parameters that allow the network to shift the output independently of inputs.
  4. Activation Function: Introduces non-linearity to allow the network to learn complex patterns (e.g., Sigmoid, ReLU).
  5. Loss Function: Measures how far the predictions are from the actual values. Common loss functions include Mean Squared Error (MSE) and Cross-Entropy Loss.
  6. Optimizer: Updates the weights to reduce the loss. The most common optimizer is Gradient Descent.
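
To make these components concrete, here is a minimal single-neuron sketch in NumPy. The input, weight, and bias values are illustrative placeholders, not part of the network built later in this guide.

import numpy as np

x = np.array([0.5, -1.2])        # inputs
w = np.array([0.8, 0.3])         # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum: 0.5*0.8 + (-1.2)*0.3 + 0.1 = 0.14
output = 1 / (1 + np.exp(-z))    # sigmoid activation squashes z into (0, 1), here about 0.53
print(output)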

Step 1: Preparing the Environment

Ensure you have Python and NumPy installed. NumPy is a fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices.

What We Are Doing and Why

We are setting up the development environment by installing and importing the necessary libraries. NumPy is essential for performing numerical computations required for matrix operations in the neural network.

Installation:

pip install numpy

Import Libraries:

import numpy as np

Setting a random seed ensures reproducibility when generating random numbers.

np.random.seed(42)

Step 2: Initializing the Neural Network

We will create a simple two-layer neural network (one hidden layer and one output layer) with:

  • Input Layer: 2 neurons
  • Hidden Layer: 4 neurons
  • Output Layer: 1 neuron
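
With this architecture, the network has 2*4 = 8 weights and 4 biases in the hidden layer, plus 4*1 = 4 weights and 1 bias in the output layer, for a total of 17 trainable parameters.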

What We Are Doing and Why

We initialize the network’s weights and biases, as these parameters will be adjusted during training to minimize loss. Initializing weights randomly helps break symmetry, ensuring neurons learn different features.

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2
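
As a quick sanity check (assuming the function above has been defined), you can inspect the shapes of the returned parameters:

W1, b1, W2, b2 = initialize_network(input_size=2, hidden_size=4, output_size=1)
print(W1.shape, b1.shape)  # (2, 4) (1, 4)
print(W2.shape, b2.shape)  # (4, 1) (1, 1)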

Step 3: Forward Propagation

Forward propagation is the process of computing the network's output from its inputs, passing the data through each layer in turn.

What We Are Doing and Why

We are computing the weighted sum of inputs, adding biases, and applying an activation function to introduce non-linearity. This is done layer by layer until the final output is produced.

Sigmoid Activation Function:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Forward Propagation:

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2
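
As an optional check, running a small batch through the network confirms that the output contains one prediction per input row. This assumes initialize_network and sigmoid from the previous steps; X_demo is just a placeholder batch.

X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # 4 samples, 2 features each
W1, b1, W2, b2 = initialize_network(2, 4, 1)
_, _, _, A2 = forward_propagation(X_demo, W1, b1, W2, b2)
print(A2.shape)   # (4, 1) -- one prediction per sample, each between 0 and 1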

Step 4: Calculating Loss

Loss quantifies how well the network’s predictions match the true labels.

What We Are Doing and Why

We use Binary Cross-Entropy Loss to measure the error in prediction for binary classification. Minimizing this loss improves the network’s performance.

Binary Cross-Entropy Loss:

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions away from exactly 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss
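
For intuition, here is a tiny hand-checkable example (the arrays are illustrative, not the XOR data used later):

y_true = np.array([[1], [0]])
y_pred = np.array([[0.9], [0.2]])
# loss = -(ln(0.9) + ln(0.8)) / 2, roughly 0.16 -- confident, correct predictions give a small loss
print(compute_loss(y_true, y_pred))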

Step 5: Backpropagation

Backpropagation calculates gradients to adjust weights and minimize loss.

What We Are Doing and Why

We compute the gradients of the loss function with respect to the network’s weights. These gradients indicate how much each weight contributes to the error, allowing us to adjust the weights in a way that reduces the loss.

Backpropagation Function:

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y  # For a sigmoid output with binary cross-entropy, dL/dZ2 simplifies to A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid: A1 * (1 - A1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return dW1, db1, dW2, db2
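
If you want to verify these formulas, a numerical gradient check is a useful optional sanity test. The sketch below compares the analytic gradient of one entry of W2 against a finite-difference estimate; the helper name grad_check_single_weight and the epsilon value are my own choices, not part of the original guide.

def grad_check_single_weight(X, y, W1, b1, W2, b2, eps=1e-5):
    # Analytic gradient from backpropagation
    Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
    dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)

    # Numerical estimate of dL/dW2[0, 0] via central differences
    W2_plus, W2_minus = W2.copy(), W2.copy()
    W2_plus[0, 0] += eps
    W2_minus[0, 0] -= eps
    loss_plus = compute_loss(y, forward_propagation(X, W1, b1, W2_plus, b2)[3])
    loss_minus = compute_loss(y, forward_propagation(X, W1, b1, W2_minus, b2)[3])
    numeric = (loss_plus - loss_minus) / (2 * eps)

    print("analytic:", dW2[0, 0], "numeric:", numeric)

Called with the XOR inputs and a freshly initialized network, the two printed values should agree to several decimal places.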

Step 6: Updating Weights

What We Are Doing and Why

We update the weights and biases using the gradients computed during backpropagation. Each parameter has its gradient, scaled by a learning rate, subtracted from it, which steps the network toward a lower loss.

Weight Update Function:

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

Step 7: Training the Network

What We Are Doing and Why

We bring everything together. We run forward propagation, calculate loss, perform backpropagation, and update weights iteratively for a defined number of epochs. This trains the network to make better predictions over time.

Training Function:

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)

    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 100 == 0:
            print(f"Epoch {i}, Loss: {loss}")

    return W1, b1, W2, b2

Complete Example: Putting It All Together

This section demonstrates how all the functions work together to create and train a simple neural network to solve the XOR problem.

import numpy as np

np.random.seed(42)

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions away from exactly 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return dW1, db1, dW2, db2

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)

    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 1000 == 0:
            print(f"Epoch {i}, Loss: {loss}")

    return W1, b1, W2, b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

W1, b1, W2, b2 = train(X, y, input_size=2, hidden_size=4, output_size=1, epochs=10000, learning_rate=0.1)

Testing the Neural Network

What We Are Doing and Why

After training, we run the network on the four XOR inputs (the complete set of possible inputs for this problem) to see whether it has learned to approximate the XOR function. We expect the outputs to be close to [0, 1, 1, 0].

_, _, _, A2 = forward_propagation(X, W1, b1, W2, b2)
print("Predictions:", A2)

Sample Output

Predictions: [[0.01]
 [0.98]
 [0.98]
 [0.02]]

These values are close to the expected outputs [0, 1, 1, 0].
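
To turn these probabilities into hard class labels, you can round them at 0.5. This small addition is a suggestion of mine, not part of the original output.

predicted_labels = np.round(A2).astype(int)
print("Predicted labels:", predicted_labels.ravel())   # expected: [0 1 1 0]
print("True labels:", y.ravel())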

Conclusion

Building a neural network from scratch is an essential exercise for understanding the fundamentals of deep learning. This guide covered the initialization, forward propagation, loss calculation, backpropagation, and weight updates necessary to train a simple neural network. Mastering these basics will provide a strong foundation for using more advanced libraries like TensorFlow or PyTorch.
