The Art of Building a Neural Network from Scratch – A Practical Guide

Introduction

Neural networks are the foundation of modern deep learning and artificial intelligence. They are designed to mimic the human brain’s ability to recognize patterns and make decisions. Neural networks power applications like image recognition, natural language processing, and autonomous systems. While libraries like TensorFlow and PyTorch simplify the process, building a neural network from scratch is an invaluable learning experience. It gives you a deeper understanding of how these models work internally, which can be essential when debugging or developing custom solutions.

This guide walks you through building a simple neural network using Python, NumPy, and basic mathematical operations. Each step is covered in detail, from initializing weights to backpropagation and training the model.

Understanding Neural Networks

A neural network is a computational model inspired by the human brain. It consists of layers of neurons, each taking inputs, processing them, and passing the output to the next layer.

Key Concepts:

  • Neuron: A unit that receives input, applies a function, and outputs a value.
  • Layer: A collection of neurons; a network has an input layer, one or more hidden layers, and an output layer.
  • Weights: Parameters that define the importance of each input.
  • Activation Function: Introduces non-linearity (e.g., Sigmoid, ReLU).
  • Loss Function: Measures the difference between predictions and true values.
  • Backpropagation: The algorithm used to update weights based on error gradients.

Neural networks learn by adjusting their weights based on the errors they make during training: error gradients are computed with backpropagation, and the weights are then tuned so the error (loss) decreases. The goal of training is to minimize this loss.
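
As a minimal sketch of the idea, each weight is nudged against its loss gradient. The numbers below are illustrative placeholders; the real update step is implemented in Step 6.

w = 0.5               # current value of one weight (illustrative)
dL_dw = 0.2           # gradient of the loss with respect to that weight (illustrative)
learning_rate = 0.1
w = w - learning_rate * dL_dw   # step against the gradient
print(w)              # 0.48 -- the weight moves slightly in the direction that lowers the loss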

Building Blocks of a Neural Network

A neural network is built upon several core components:

  1. Inputs: The data provided to the network (e.g., images, numerical data, text).
  2. Weights: Parameters that transform the inputs into outputs. These are adjusted during training.
  3. Biases: Additional parameters that allow the network to shift the output independently of inputs.
  4. Activation Function: Introduces non-linearity to allow the network to learn complex patterns (e.g., Sigmoid, ReLU).
  5. Loss Function: Measures how far the predictions are from the actual values. Common loss functions include Mean Squared Error (MSE) and Cross-Entropy Loss.
  6. Optimizer: Updates the weights to reduce the loss. The most common optimizer is Gradient Descent.
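
To make these components concrete, here is a minimal single-neuron sketch in NumPy. The input, weight, and bias values are illustrative placeholders, not part of the network built later in this guide.

import numpy as np

x = np.array([0.5, -1.2])        # inputs
w = np.array([0.8, 0.3])         # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum: 0.5*0.8 + (-1.2)*0.3 + 0.1 = 0.14
output = 1 / (1 + np.exp(-z))    # sigmoid activation squashes z into (0, 1), here about 0.53
print(output)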

Step 1: Preparing the Environment

Ensure you have Python and NumPy installed. NumPy is a fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices.

What We Are Doing and Why

We are setting up the development environment by installing and importing the necessary libraries. NumPy is essential for performing numerical computations required for matrix operations in the neural network.

Installation:

pip install numpy

Import Libraries:

import numpy as np

Setting a random seed ensures reproducibility when generating random numbers.

np.random.seed(42)

Step 2: Initializing the Neural Network

We will create a simple two-layer neural network (one hidden layer and one output layer) with:

  • Input Layer: 2 neurons
  • Hidden Layer: 4 neurons
  • Output Layer: 1 neuron
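
With this architecture, the network has 2*4 = 8 weights and 4 biases in the hidden layer, plus 4*1 = 4 weights and 1 bias in the output layer, for a total of 17 trainable parameters.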

What We Are Doing and Why

We initialize the network’s weights and biases, as these parameters will be adjusted during training to minimize loss. Initializing weights randomly helps break symmetry, ensuring neurons learn different features.

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2
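
As a quick sanity check (assuming the function above has been defined), you can inspect the shapes of the returned parameters:

W1, b1, W2, b2 = initialize_network(input_size=2, hidden_size=4, output_size=1)
print(W1.shape, b1.shape)  # (2, 4) (1, 4)
print(W2.shape, b2.shape)  # (4, 1) (1, 1)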

Step 3: Forward Propagation

Forward propagation is the process of computing the network's output from its inputs, passing the data through each layer in turn.

What We Are Doing and Why

We are computing the weighted sum of inputs, adding biases, and applying an activation function to introduce non-linearity. This is done layer by layer until the final output is produced.

Sigmoid Activation Function:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Forward Propagation:

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2
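
As an optional check, running a small batch through the network confirms that the output contains one prediction per input row. This assumes initialize_network and sigmoid from the previous steps; X_demo is just a placeholder batch.

X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # 4 samples, 2 features each
W1, b1, W2, b2 = initialize_network(2, 4, 1)
_, _, _, A2 = forward_propagation(X_demo, W1, b1, W2, b2)
print(A2.shape)   # (4, 1) -- one prediction per sample, each between 0 and 1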

Step 4: Calculating Loss

Loss quantifies how well the network’s predictions match the true labels.

What We Are Doing and Why

We use Binary Cross-Entropy Loss to measure the error in prediction for binary classification. Minimizing this loss improves the network’s performance.

Binary Cross-Entropy Loss:

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions away from exactly 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss
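
For intuition, here is a tiny hand-checkable example (the arrays are illustrative, not the XOR data used later):

y_true = np.array([[1], [0]])
y_pred = np.array([[0.9], [0.2]])
# loss = -(ln(0.9) + ln(0.8)) / 2, roughly 0.16 -- confident, correct predictions give a small loss
print(compute_loss(y_true, y_pred))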

Step 5: Backpropagation

Backpropagation calculates gradients to adjust weights and minimize loss.

What We Are Doing and Why

We compute the gradients of the loss function with respect to the network’s weights. These gradients indicate how much each weight contributes to the error, allowing us to adjust the weights in a way that reduces the loss.

Backpropagation Function:

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y  # For a sigmoid output with binary cross-entropy, dL/dZ2 simplifies to A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid: A1 * (1 - A1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return dW1, db1, dW2, db2
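
If you want to verify these formulas, a numerical gradient check is a useful optional sanity test. The sketch below compares the analytic gradient of one entry of W2 against a finite-difference estimate; the helper name grad_check_single_weight and the epsilon value are my own choices, not part of the original guide.

def grad_check_single_weight(X, y, W1, b1, W2, b2, eps=1e-5):
    # Analytic gradient from backpropagation
    Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
    dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)

    # Numerical estimate of dL/dW2[0, 0] via central differences
    W2_plus, W2_minus = W2.copy(), W2.copy()
    W2_plus[0, 0] += eps
    W2_minus[0, 0] -= eps
    loss_plus = compute_loss(y, forward_propagation(X, W1, b1, W2_plus, b2)[3])
    loss_minus = compute_loss(y, forward_propagation(X, W1, b1, W2_minus, b2)[3])
    numeric = (loss_plus - loss_minus) / (2 * eps)

    print("analytic:", dW2[0, 0], "numeric:", numeric)

Called with the XOR inputs and a freshly initialized network, the two printed values should agree to several decimal places.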

Step 6: Updating Weights

What We Are Doing and Why

We update the weights and biases using the gradients computed during backpropagation. Each parameter has its gradient, scaled by a learning rate, subtracted from it, which steps the network toward a lower loss.

Weight Update Function:

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

Step 7: Training the Network

What We Are Doing and Why

We bring everything together. We run forward propagation, calculate loss, perform backpropagation, and update weights iteratively for a defined number of epochs. This trains the network to make better predictions over time.

Training Function:

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)

    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 100 == 0:
            print(f"Epoch {i}, Loss: {loss}")

    return W1, b1, W2, b2

Complete Example: Putting It All Together

This section demonstrates how all the functions work together to create and train a simple neural network to solve the XOR problem.

import numpy as np

np.random.seed(42)

def initialize_network(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    # Clip predictions away from exactly 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
    return loss

def backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * (A1 * (1 - A1))  # Derivative of sigmoid
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return dW1, db1, dW2, db2

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

def train(X, y, input_size, hidden_size, output_size, epochs, learning_rate):
    W1, b1, W2, b2 = initialize_network(input_size, hidden_size, output_size)

    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        dW1, db1, dW2, db2 = backward_propagation(X, y, W1, b1, W2, b2, Z1, A1, Z2, A2)
        W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 1000 == 0:
            print(f"Epoch {i}, Loss: {loss}")

    return W1, b1, W2, b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

W1, b1, W2, b2 = train(X, y, input_size=2, hidden_size=4, output_size=1, epochs=10000, learning_rate=0.1)

Testing the Neural Network

What We Are Doing and Why

After training, we run the network on the four XOR inputs (the complete set of possible inputs for this problem) to see whether it has learned to approximate the XOR function. We expect the outputs to be close to [0, 1, 1, 0].

_, _, _, A2 = forward_propagation(X, W1, b1, W2, b2)
print("Predictions:", A2)

Sample Output

Predictions: [[0.01]
 [0.98]
 [0.98]
 [0.02]]

These values are close to the expected outputs [0, 1, 1, 0].
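
To turn these probabilities into hard class labels, you can round them at 0.5. This small addition is a suggestion of mine, not part of the original output.

predicted_labels = np.round(A2).astype(int)
print("Predicted labels:", predicted_labels.ravel())   # expected: [0 1 1 0]
print("True labels:", y.ravel())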

Conclusion

Building a neural network from scratch is an essential exercise for understanding the fundamentals of deep learning. This guide covered the initialization, forward propagation, loss calculation, backpropagation, and weight updates necessary to train a simple neural network. Mastering these basics will provide a strong foundation for using more advanced libraries like TensorFlow or PyTorch.
