The Art of Building a Neural Network from Scratch – A Practical Guide

Introduction

Neural networks are the foundation of modern deep learning and artificial intelligence. They are designed to mimic the human brain’s ability to recognize patterns and make decisions. Neural networks power applications like image recognition, natural language processing, and autonomous systems. While libraries like TensorFlow and PyTorch simplify the process, building a neural network from scratch is an invaluable learning experience. It gives you a deeper understanding of how these models work internally, which can be essential when debugging or developing custom solutions.

This guide walks you through building a simple neural network using Python, NumPy, and basic mathematical operations. Each step is covered in detail, from initializing weights to backpropagation and training the model.

Understanding Neural Networks

A neural network is a computational model inspired by the human brain. It consists of layers of neurons, each taking inputs, processing them, and passing the output to the next layer.

Key Concepts:

  • Neuron: A unit that receives input, applies a function, and outputs a value.
  • Layer: A collection of neurons; networks have an input layer, one or more hidden layers, and an output layer.
  • Weights: Parameters that define the importance of inputs.
  • Activation Function: Introduces non-linearity (e.g., Sigmoid, ReLU).
  • Loss Function: Measures the difference between predictions and true values.
  • Backpropagation: Algorithm to update weights based on error gradients.

Neural networks learn by adjusting their weights based on the errors they make during training. Backpropagation is the algorithm that computes the error gradients driving those adjustments. The goal is to minimize the error (loss) by tuning the weights.

Building Blocks of a Neural Network

A neural network is built upon several core components; the short sketch after this list shows how they fit together:

  1. Inputs: The data provided to the network (e.g., images, numerical data, text).
  2. Weights: Parameters that transform the inputs into outputs. These are adjusted during training.
  3. Biases: Additional parameters that allow the network to shift the output independently of inputs.
  4. Activation Function: Introduces non-linearity to allow the network to learn complex patterns (e.g., Sigmoid, ReLU).
  5. Loss Function: Measures how far the predictions are from the actual values. Common loss functions include Mean Squared Error (MSE) and Cross-Entropy Loss.
  6. Optimizer: Updates the weights to reduce the loss. The most common optimizer is Gradient Descent.
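
The following sketch (an illustration with made-up values, not part of the guide's network code) shows how these pieces combine for a single neuron with a sigmoid activation and a squared-error loss:

Single-Neuron Sketch:

import numpy as np

inputs = np.array([0.5, -1.2])      # the data fed to the neuron
weights = np.array([0.8, 0.3])      # learned parameters scaling each input
bias = 0.1                          # learned shift applied to the weighted sum

z = np.dot(inputs, weights) + bias  # weighted sum of inputs plus bias
output = 1 / (1 + np.exp(-z))       # sigmoid activation introduces non-linearity

target = 1.0
loss = (output - target) ** 2       # squared error: how far the prediction is from the target
print(output, loss)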

Step 1: Preparing the Environment

Ensure you have Python and NumPy installed. NumPy is a fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices.

What We Are Doing and Why

We are setting up the development environment by installing and importing the necessary libraries. NumPy is essential for performing numerical computations required for matrix operations in the neural network.

Installation:

pip install numpy

Import Libraries:

import numpy as np

Setting a random seed ensures reproducibility when generating random numbers.

np.random.seed(42)

Step 2: Initializing the Neural Network

We will create a simple 2-layer neural network with:

  • Input Layer: 2 neurons
  • Hidden Layer: 4 neurons
  • Output Layer: 1 neuron

What We Are Doing and Why

We initialize the network’s weights and biases, as these parameters will be adjusted during training to minimize loss. Initializing weights randomly helps break symmetry, ensuring neurons learn different features.

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2
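
As a quick illustration (not part of the original code), creating the 2-4-1 network described above yields parameter arrays with the following shapes:

W1, b1, W2, b2 = initialize_network(input_size=2, hidden_size=4, output_size=1)
print(W1.shape, b1.shape)  # (2, 4) (1, 4)
print(W2.shape, b2.shape)  # (4, 1) (1, 1)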

Step 3: Forward Propagation

Forward propagation is the process of computing the network's output from the input, passing values layer by layer through the network.

What We Are Doing and Why

We are computing the weighted sum of inputs, adding biases, and applying an activation function to introduce non-linearity. This is done layer by layer until the final output is produced.

Sigmoid Activation Function:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
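
A few sample values illustrate how the sigmoid squashes any input into the range (0, 1) (a quick check, not part of the original guide):

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approximately [0.0067 0.5 0.9933]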

Forward Propagation:

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

Step 4: Calculating Loss

Loss quantifies how well the network’s predictions match the true labels.

What We Are Doing and Why

We use Binary Cross-Entropy Loss to measure the error in prediction for binary classification. Minimizing this loss improves the network’s performance.

Binary Cross-Entropy Loss:

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions away from exactly 0 and 1 so np.log never receives 0.
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss
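
As a quick sanity check (not part of the original guide, using made-up predictions), confident correct predictions should give a small loss and confident wrong ones a large loss:

y_true = np.array([[1.0], [0.0]])
good = np.array([[0.9], [0.1]])    # confident and correct
bad = np.array([[0.1], [0.9]])     # confident and wrong
print(compute_loss(y_true, good))  # roughly 0.105
print(compute_loss(y_true, bad))   # roughly 2.303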

Step 5: Backpropagation

Backpropagation calculates gradients to adjust weights and minimize loss.

What We Are Doing and Why

We compute the gradients of the loss function with respect to the network's weights and biases. These gradients indicate how much each parameter contributes to the error, allowing us to adjust the parameters in a direction that reduces the loss. Because the output layer pairs a sigmoid activation with the binary cross-entropy loss, the gradient at the output simplifies to the prediction error A2 - y, which is where the code below starts.

Backpropagation Function:

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return dW1, db1, dW2, db2
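
As an optional sanity check (not part of the original guide; the variable names below are illustrative), you can compare one analytic gradient from backward_propagation against a numerical finite-difference estimate of the loss. The two values should agree to several decimal places:

Gradient Check (illustrative):

Xc = np.array([[1.0, 0.5]])
yc = np.array([[1.0]])
W1c, b1c, W2c, b2c = initialize_network(2, 4, 1)

Z1c, A1c, Z2c, A2c = forward_propagation(Xc, W1c, b1c, W2c, b2c)
dW1c, _, _, _ = backward_propagation(Xc, yc, W1c, b1c, W2c, b2c, Z1c, A1c, Z2c, A2c)

eps = 1e-6
W1_plus, W1_minus = W1c.copy(), W1c.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
loss_plus = compute_loss(yc, forward_propagation(Xc, W1_plus, b1c, W2c, b2c)[3])
loss_minus = compute_loss(yc, forward_propagation(Xc, W1_minus, b1c, W2c, b2c)[3])
numerical = (loss_plus - loss_minus) / (2 * eps)

print(dW1c[0, 0], numerical)  # analytic vs. numerical gradient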

Step 6: Updating Weights

What We Are Doing and Why

We update the weights and biases using the gradients computed during backpropagation. We subtract the gradient scaled by a learning rate from the current weights to move towards minimizing the loss.

Weight Update Function:

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

Step 7: Training the Network

What We Are Doing and Why

We bring everything together. We run forward propagation, calculate loss, perform backpropagation, and update weights iteratively for a defined number of epochs. This trains the network to make better predictions over time.

Training Function:

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)

    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 100 == 0:
            print(f"Epoch {i}, Loss: {loss}")

    return W1, b1, W2, b2

Complete Example: Putting It All Together

This section demonstrates how all the functions work together to create and train a simple neural network to solve the XOR problem.

import numpy as np

np.random.seed(42)

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions away from exactly 0 and 1 so np.log never receives 0.
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return dW1, db1, dW2, db2

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)

    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 1000 == 0:
            print(f"Epoch {i}, Loss: {loss}")

    return W1, b1, W2, b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

W1, b1, W2, b2 = train(X, y, input_size=2, hidden_size=4, output_size=1, epochs=10000, learning_rate=0.1)

Testing the Neural Network

What We Are Doing and Why

After training, we test the network on the input data to see if it has learned to approximate the XOR function. We expect the outputs to be close to [0, 1, 1, 0].

_, _, _, A2 = forward_propagation(X, W1, b1, W2, b2)
print("Predictions:", A2)

Sample Output

Predictions: [[0.01]
 [0.98]
 [0.98]
 [0.02]]

These values are close to the expected outputs [0, 1, 1, 0].
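
To turn these probabilities into hard class labels, you can threshold the predictions at 0.5 (a small follow-up step, not shown in the original guide):

labels = (A2 > 0.5).astype(int)
print("Class labels:", labels.ravel())  # expected: [0 1 1 0]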

Conclusion

Building a neural network from scratch is an essential exercise for understanding the fundamentals of deep learning. This guide covered the initialization, forward propagation, loss calculation, backpropagation, and weight updates necessary to train a simple neural network. Mastering these basics will provide a strong foundation for using more advanced libraries like TensorFlow or PyTorch.

