Forward and Backward Propagation in Deep Learning

Introduction to Forward Propagation

  • Forward propagation is the process of passing the input through the network, layer by layer, to produce an output.
  • It involves computing a weighted sum of the inputs at each neuron and applying an activation function.
  • It is used to predict the output for given inputs in a neural network; a minimal single-neuron sketch follows this list.
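
As a minimal sketch of these steps, a single neuron's forward pass is just a weighted sum followed by an activation. The input values, weights, bias, and choice of sigmoid below are made up purely for illustration.

import numpy as np

# Toy input, weights, and bias (arbitrary values for illustration)
x = np.array([0.5, 0.2, 0.1])
w = np.array([0.4, 0.3, 0.9])
b = 0.1

z = np.dot(w, x) + b          # weighted sum of the inputs plus bias
a = 1 / (1 + np.exp(-z))      # sigmoid activation introduces non-linearity

print("Weighted sum:", z)
print("Activation:", a)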

Introduction to Backward Propagation

  • Backward propagation updates the weights of the network based on the error in its predictions.
  • It calculates the gradient of the loss function with respect to each weight using the chain rule.
  • It is essential for learning, as it reduces the error by iteratively updating the weights; a chain-rule sketch follows this list.
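
As a minimal sketch of the chain rule for a single sigmoid neuron trained with a squared-error loss (the input, weight, and target values below are made up for illustration):

import numpy as np

# Toy single-neuron example: a = sigmoid(w * x), loss = 0.5 * (a - y)^2
x, w, y = 0.5, 0.8, 1.0

z = w * x
a = 1 / (1 + np.exp(-z))

# Chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = a - y            # derivative of the squared-error loss w.r.t. the activation
da_dz = a * (1 - a)      # derivative of the sigmoid w.r.t. its input
dz_dw = x                # derivative of the weighted sum w.r.t. the weight

grad_w = dL_da * da_dz * dz_dw
print("Gradient of the loss w.r.t. w:", grad_w)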

Forward Propagation Examples

Simple Neural Network Forward Pass

A simple neural network with one hidden layer can be used to demonstrate forward propagation.


import numpy as np

# Input data
X = np.array([0.5, 0.2, 0.1])

# Weights
W1 = np.array([[0.4, 0.3], [0.2, 0.7], [0.6, 0.5]])
W2 = np.array([0.8, 0.6])

# Forward pass
Z1 = np.dot(X, W1)              # hidden layer: weighted sum of inputs
A1 = np.tanh(Z1)                # hidden layer: tanh activation
Z2 = np.dot(A1, W2)             # output layer: weighted sum of hidden activations
output = 1 / (1 + np.exp(-Z2))  # output layer: sigmoid activation

print("Output:", output)
        

Explanation

  • The input data is passed through the network by multiplying it with the weight matrices.
  • An activation function (tanh) is applied to introduce non-linearity.
  • The final output is computed using a sigmoid function, representing the forward pass.

Backward Propagation Examples

Gradient Descent in Backward Propagation

Backward propagation computes the gradients of the loss, and gradient descent uses them to update the weights and minimize the error. The snippet below continues from the forward-pass example above, reusing X, W1, W2, A1, and output.


import numpy as np

# Derivative of the sigmoid function (x is the sigmoid output)
def sigmoid_derivative(x):
    return x * (1 - x)

# Derivative of tanh (x is the tanh output)
def tanh_derivative(x):
    return 1 - x ** 2

# Continues from the forward pass above (X, W1, W2, A1, output)
target = 0.7
learning_rate = 0.1

# Error at the output and its gradient
error = output - target
d_output = error * sigmoid_derivative(output)

# Backpropagate to the hidden layer (which used tanh)
d_hidden_layer = d_output * W2 * tanh_derivative(A1)

# Gradient descent weight updates
W2 -= learning_rate * d_output * A1
W1 -= learning_rate * np.outer(X, d_hidden_layer)
        

Explanation

  • The error is calculated by subtracting the target from the output.
  • The derivatives of the activation functions (sigmoid at the output, tanh in the hidden layer) are used to compute the gradients via the chain rule.
  • Weights are updated using the gradient and learning rate to minimize the error.

Activation Functions in Forward Propagation

Role of Activation Functions

  • Activation functions introduce non-linearity into the network.
  • Common activation functions include sigmoid, tanh, and ReLU.
  • They help the network learn complex patterns and relationships in the data.

import numpy as np

# Activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def tanh(x):
    return np.tanh(x)

# Example usage
x = np.array([-1.0, 0.0, 1.0])
print("Sigmoid:", sigmoid(x))
print("ReLU:", relu(x))
print("Tanh:", tanh(x))
        

Explanation

  • Sigmoid squashes input values between 0 and 1, good for binary classification.
  • ReLU sets negative values to zero, helping with sparse activations.
  • Tanh maps input values into the range -1 to 1 and is often used in hidden layers; the derivatives of these functions, which backpropagation relies on, are sketched below.
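
Backward propagation relies on the derivatives of these activations. A minimal sketch, with the sigmoid and tanh derivatives expressed in terms of the activation's output (the form typically cached during the forward pass):

import numpy as np

# Derivatives of common activation functions
def sigmoid_derivative(a):
    return a * (1 - a)            # a is the sigmoid output

def tanh_derivative(a):
    return 1 - a ** 2             # a is the tanh output

def relu_derivative(x):
    return (x > 0).astype(float)  # defined on the pre-activation input

# Example usage
x = np.array([-1.0, 0.0, 1.0])
print("Sigmoid':", sigmoid_derivative(1 / (1 + np.exp(-x))))
print("Tanh':", tanh_derivative(np.tanh(x)))
print("ReLU':", relu_derivative(x))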

Weight Initialization in Forward Propagation

Importance of Weight Initialization

  • Proper weight initialization can speed up convergence and improve model performance.
  • Common strategies include random initialization, Xavier, and He initialization.
  • Prevents issues like vanishing or exploding gradients.

import numpy as np

# Random weight initialization
def initialize_weights(shape):
    return np.random.randn(*shape) * 0.01

# Xavier initialization
def xavier_initialization(shape):
    return np.random.randn(*shape) * np.sqrt(1. / shape[0])

# He initialization
def he_initialization(shape):
    return np.random.randn(*shape) * np.sqrt(2. / shape[0])

# Example usage
W1 = initialize_weights((3, 2))
W2 = xavier_initialization((2, 1))
W3 = he_initialization((2, 1))
print("Random:", W1)
print("Xavier:", W2)
print("He:", W3)
        

Explanation

  • Random initialization assigns small random values to weights.
  • Xavier initialization is suitable for layers with sigmoid or tanh activations.
  • He initialization works well with ReLU activation functions.

Loss Functions in Backward Propagation

Role of Loss Functions

  • Loss functions quantify the difference between predicted and actual values.
  • Common loss functions include mean squared error, cross-entropy, and hinge loss.
  • Used to guide the optimization process during training.

import numpy as np

# Mean Squared Error
def mse(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

# Cross-Entropy Loss
def cross_entropy(y_true, y_pred):
    y_pred = np.clip(y_pred, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

# Example usage
y_true = np.array([1, 0, 0])
y_pred = np.array([0.7, 0.2, 0.1])
print("MSE:", mse(y_true, y_pred))
print("Cross-Entropy:", cross_entropy(y_true, y_pred))
        

Explanation

  • Mean squared error measures the average squared difference between predictions and actual values.
  • Cross-entropy loss is used for classification tasks, penalizing incorrect predictions.
  • Their gradients, sketched after this list, are what backward propagation uses to adjust the weights and reduce the error during training.
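
As a minimal sketch, the gradients of these losses with respect to the predictions, which is where backward propagation starts, could look like this (the small epsilon clip is an assumption added to avoid division by zero):

import numpy as np

# Gradients of the losses w.r.t. the predictions
def mse_gradient(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

def cross_entropy_gradient(y_true, y_pred, epsilon=1e-12):
    return -y_true / np.clip(y_pred, epsilon, 1.0)

# Example usage
y_true = np.array([1, 0, 0])
y_pred = np.array([0.7, 0.2, 0.1])
print("MSE gradient:", mse_gradient(y_true, y_pred))
print("Cross-Entropy gradient:", cross_entropy_gradient(y_true, y_pred))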

Optimization Algorithms in Backward Propagation

Overview of Optimization Algorithms

  • Optimization algorithms are used to minimize the loss function by adjusting weights.
  • Common algorithms include gradient descent, Adam, and RMSprop.
  • Each algorithm has its advantages and is chosen based on the problem and data.

import numpy as np

# Gradient Descent
def gradient_descent(w, grad, lr):
    return w - lr * grad

# Example usage
w = np.array([0.5, 0.3])
grad = np.array([0.1, 0.2])
lr = 0.01
w_updated = gradient_descent(w, grad, lr)
print("Updated Weights:", w_updated)
        

Explanation

  • Gradient descent updates weights by moving in the direction of the negative gradient.
  • Learning rate determines the step size during each iteration of the update.
  • Adaptive optimizers such as Adam and RMSprop improve convergence speed and stability; an Adam-style update is sketched below.
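
As a minimal sketch of the Adam update mentioned above (a single step, with the commonly cited default hyperparameters and moment estimates m and v assumed to start at zero):

import numpy as np

# One Adam-style update step (simplified sketch)
def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Example usage
w = np.array([0.5, 0.3])
grad = np.array([0.1, 0.2])
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v = adam_update(w, grad, m, v, t=1)
print("Updated Weights:", w)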

Batch Normalization in Forward Propagation

Benefits of Batch Normalization

  • Batch normalization normalizes the input of each layer to improve stability.
  • Reduces internal covariate shift, leading to faster training and better performance.
  • Can act as a regularizer, potentially reducing the need for dropout.

import numpy as np

# Batch normalization
def batch_norm(X, gamma, beta, epsilon=1e-5):
    mu = np.mean(X, axis=0)
    var = np.var(X, axis=0)
    X_norm = (X - mu) / np.sqrt(var + epsilon)
    return gamma * X_norm + beta

# Example usage
X = np.array([[1, 2], [3, 4], [5, 6]])
gamma = np.array([1.0, 1.0])
beta = np.array([0.0, 0.0])
X_bn = batch_norm(X, gamma, beta)
print("Batch Normalized:", X_bn)
        

Explanation

  • Batch normalization normalizes inputs by subtracting the batch mean and dividing by the batch variance.
  • Gamma and beta are learned parameters that scale and shift the normalized value.
  • Helps in stabilizing the learning process and allows for higher learning rates.