Activation Functions in Deep Learning

Introduction

Activation functions are crucial components of neural networks: they introduce non-linearity into the model, enabling it to learn complex patterns. An activation function determines the output of a node given its inputs; without one, a stack of layers would collapse into a single linear transformation, no matter how deep the network is.

  • Sigmoid Function: Maps input values to a range between 0 and 1, often used in binary classification.
  • Tanh Function: Maps input values to a range between -1 and 1, providing zero-centered outputs.
  • ReLU (Rectified Linear Unit): Outputs zero for negative inputs and the input value itself for positive inputs, commonly used due to its simplicity and efficiency.
  • Leaky ReLU: A variant of ReLU that allows a small, non-zero gradient when the unit is not active.
  • Softmax: Used in multi-class classification problems to convert logits into probabilities.
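
As a concrete illustration of where an activation function sits inside a single neuron, here is a minimal sketch; the weights, bias, and input values are arbitrary placeholders, not taken from any particular model, and sigmoid is used only as one possible choice.

public class NeuronExample {
    // A single neuron: weighted sum of inputs plus bias, passed through an activation
    public static double neuron(double[] inputs, double[] weights, double bias) {
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += weights[i] * inputs[i];
        }
        // Sigmoid used here purely as an example; any activation could be substituted
        return 1 / (1 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] inputs = {0.2, 0.4};   // placeholder input features
        double[] weights = {0.5, -0.3}; // placeholder weights
        double bias = 0.1;              // placeholder bias
        System.out.println("Neuron output: " + neuron(inputs, weights, bias));
    }
}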

Sigmoid Function

Characteristics

The sigmoid function is defined as \( \sigma(x) = \frac{1}{1 + e^{-x}} \). It is commonly used in the output layer of a binary classification neural network.

  • Squashes the input to a range between 0 and 1.
  • Can suffer from the vanishing gradient problem: its gradient approaches zero for inputs far from zero.
  • Useful in the final layer of binary classifiers.

public class SigmoidExample {
    // Sigmoid activation: maps any real input into the range (0, 1)
    public static double sigmoid(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double input = 0.5;
        System.out.println("Sigmoid of " + input + ": " + sigmoid(input));
    }
}

Why Use Sigmoid?

The sigmoid function is particularly useful for models where the output is expected to be a probability, such as in logistic regression and binary classification tasks.

Console Output:

Sigmoid of 0.5: 0.6224593312018546
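
To illustrate the vanishing gradient issue noted above, the minimal sketch below evaluates the sigmoid derivative, \( \sigma'(x) = \sigma(x)(1 - \sigma(x)) \); it peaks at 0.25 at \( x = 0 \) and shrinks toward zero as \( |x| \) grows, which is what starves earlier layers of gradient.

public class SigmoidGradientExample {
    public static double sigmoid(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    // Derivative of the sigmoid: sigma(x) * (1 - sigma(x)), never larger than 0.25
    public static double sigmoidGradient(double x) {
        double s = sigmoid(x);
        return s * (1 - s);
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 2.0, 5.0, 10.0};
        for (double x : xs) {
            System.out.println("Sigmoid gradient at " + x + ": " + sigmoidGradient(x));
        }
    }
}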

Tanh Function

Characteristics

The tanh function is defined as \( \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \). It is a rescaled and shifted version of the sigmoid function, \( \tanh(x) = 2\sigma(2x) - 1 \), and is often used in hidden layers of neural networks.

  • Squashes the input to a range between -1 and 1.
  • Zero-centered, so the activations it passes to the next layer are balanced around zero.
  • Can also suffer from vanishing gradients.

public class TanhExample {
    // Tanh activation: maps any real input into the range (-1, 1)
    public static double tanh(double x) {
        return Math.tanh(x);
    }

    public static void main(String[] args) {
        double input = 0.5;
        System.out.println("Tanh of " + input + ": " + tanh(input));
    }
}

Why Use Tanh?

Tanh is preferred over sigmoid in hidden layers because it is zero-centered, which can lead to faster convergence in training.

Console Output:

Tanh of 0.5: 0.46211715726000974
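
As noted above, tanh is a rescaled sigmoid; the minimal sketch below checks the identity \( \tanh(x) = 2\sigma(2x) - 1 \) numerically for a sample input (the two values should agree up to floating-point error).

public class TanhSigmoidRelation {
    public static double sigmoid(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double x = 0.5;
        // tanh(x) and 2*sigmoid(2x) - 1 compute the same function
        double viaTanh = Math.tanh(x);
        double viaSigmoid = 2 * sigmoid(2 * x) - 1;
        System.out.println("tanh(x)          : " + viaTanh);
        System.out.println("2*sigmoid(2x) - 1: " + viaSigmoid);
    }
}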

ReLU (Rectified Linear Unit)

Characteristics

ReLU is defined as \( f(x) = \max(0, x) \). It is the most commonly used activation function in deep learning models due to its simplicity and computational efficiency.

  • Does not saturate for large positive inputs.
  • Efficient computation compared to sigmoid and tanh.
  • Can suffer from the dying ReLU problem, where neurons become permanently inactive.

public class ReLUExample {
    // ReLU activation: passes positive inputs through unchanged, clamps negatives to 0
    public static double relu(double x) {
        return Math.max(0, x);
    }

    public static void main(String[] args) {
        double input = 0.5;
        System.out.println("ReLU of " + input + ": " + relu(input));
    }
}

Why Use ReLU?

ReLU is preferred in many applications because it accelerates the convergence of stochastic gradient descent compared to sigmoid/tanh.

Console Output:

ReLU of 0.5: 0.5
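
To make the dying ReLU problem mentioned above concrete, the minimal sketch below evaluates the ReLU gradient: it is 1 for positive inputs and 0 for negative inputs, so a neuron whose pre-activation stays negative receives no gradient and stops learning.

public class ReLUGradientExample {
    // Gradient of ReLU: 1 for positive inputs, 0 for negative inputs
    // (the value at exactly 0 is a convention; 0 is used here)
    public static double reluGradient(double x) {
        return x > 0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] xs = {-2.0, -0.5, 0.5, 2.0};
        for (double x : xs) {
            System.out.println("ReLU gradient at " + x + ": " + reluGradient(x));
        }
    }
}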

Leaky ReLU

Characteristics

Leaky ReLU is a modification of ReLU that allows a small, non-zero gradient when the unit is not active, defined as \( f(x) = \max(0.01x, x) \).

  • Addresses the dying ReLU problem.
  • Allows a small slope for negative inputs.
  • Maintains advantages of ReLU with added flexibility.

public class LeakyReLUExample {
    // Leaky ReLU: identity for positive inputs, small slope (0.01) for negatives
    public static double leakyRelu(double x) {
        return x > 0 ? x : 0.01 * x;
    }

    public static void main(String[] args) {
        double input = -0.5;
        System.out.println("Leaky ReLU of " + input + ": " + leakyRelu(input));
    }
}

Why Use Leaky ReLU?

Leaky ReLU is beneficial when a model trained with standard ReLU tends to end up with inactive neurons, since its small negative-side slope allows a gradient to keep flowing through them.

Console Output:

Leaky ReLU of -0.5: -0.005
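
To show how Leaky ReLU keeps a gradient flowing where plain ReLU does not, the minimal sketch below compares their gradients for a negative input, using the 0.01 slope from the definition above.

public class LeakyReLUGradientExample {
    // ReLU gradient: 1 for positive inputs, 0 for negative inputs
    public static double reluGradient(double x) {
        return x > 0 ? 1.0 : 0.0;
    }

    // Leaky ReLU gradient: 1 for positive inputs, 0.01 for negative inputs
    public static double leakyReluGradient(double x) {
        return x > 0 ? 1.0 : 0.01;
    }

    public static void main(String[] args) {
        double x = -0.5;
        System.out.println("ReLU gradient at " + x + "      : " + reluGradient(x));
        System.out.println("Leaky ReLU gradient at " + x + ": " + leakyReluGradient(x));
    }
}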

Softmax Function

Characteristics

The softmax function is used in the output layer of neural networks to convert logits into probabilities. It is defined as \( \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \).

  • Used in multi-class classification problems.
  • Outputs a probability distribution.
  • Ensures the sum of outputs equals 1.

import java.util.Arrays;

public class SoftmaxExample {
    // Softmax: exponentiates each logit and normalizes by the sum,
    // producing a probability distribution that sums to 1
    public static double[] softmax(double[] inputs) {
        double sum = Arrays.stream(inputs).map(Math::exp).sum();
        return Arrays.stream(inputs).map(i -> Math.exp(i) / sum).toArray();
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 2.0, 3.0};
        double[] outputs = softmax(inputs);
        System.out.println("Softmax: " + Arrays.toString(outputs));
    }
}

Why Use Softmax?

Softmax is ideal for multi-class classification problems because it converts the raw logits into a probability distribution over the classes, so each output can be read directly as a class probability.

Console Output:

Softmax: [0.09003057, 0.24472847, 0.66524096]
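
One refinement worth knowing, not part of the example above, is the numerically stable form of softmax: subtracting the maximum logit before exponentiating leaves the result unchanged mathematically but avoids overflow for large inputs. A minimal sketch:

import java.util.Arrays;

public class StableSoftmaxExample {
    public static double[] stableSoftmax(double[] inputs) {
        // Subtracting the max logit does not change the result,
        // but keeps Math.exp from overflowing for large inputs
        double max = Arrays.stream(inputs).max().orElse(0.0);
        double[] exps = Arrays.stream(inputs).map(x -> Math.exp(x - max)).toArray();
        double sum = Arrays.stream(exps).sum();
        return Arrays.stream(exps).map(e -> e / sum).toArray();
    }

    public static void main(String[] args) {
        // Large logits that would overflow a naive softmax
        double[] logits = {1000.0, 1001.0, 1002.0};
        System.out.println("Stable softmax: " + Arrays.toString(stableSoftmax(logits)));
    }
}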
