Activation Functions in Deep Learning

Introduction

Activation functions are crucial components of neural networks: they introduce non-linearity into the model, enabling it to learn complex patterns. An activation function determines the output of a node given its inputs; without one, a stack of layers would collapse into a single linear transformation, no matter how deep the network is.

  • Sigmoid Function: Maps input values to a range between 0 and 1, often used in binary classification.
  • Tanh Function: Maps input values to a range between -1 and 1, providing zero-centered outputs.
  • ReLU (Rectified Linear Unit): Outputs zero for negative inputs and the input value itself for positive inputs, commonly used due to its simplicity and efficiency.
  • Leaky ReLU: A variant of ReLU that allows a small, non-zero gradient when the unit is not active.
  • Softmax: Used in multi-class classification problems to convert logits into probabilities.
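
As a concrete illustration of where an activation function sits inside a single neuron, here is a minimal sketch; the weights, bias, and input values are arbitrary placeholders, not taken from any particular model, and sigmoid is used only as one possible choice.

public class NeuronExample {
    // A single neuron: weighted sum of inputs plus bias, passed through an activation
    public static double neuron(double[] inputs, double[] weights, double bias) {
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += weights[i] * inputs[i];
        }
        // Sigmoid used here purely as an example; any activation could be substituted
        return 1 / (1 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] inputs = {0.2, 0.4};   // placeholder input features
        double[] weights = {0.5, -0.3}; // placeholder weights
        double bias = 0.1;              // placeholder bias
        System.out.println("Neuron output: " + neuron(inputs, weights, bias));
    }
}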

Sigmoid Function

Characteristics

The sigmoid function is defined as \( \sigma(x) = \frac{1}{1 + e^{-x}} \). It is commonly used in the output layer of a binary classification neural network.

  • Squashes the input to a range between 0 and 1.
  • Can suffer from the vanishing gradient problem: its gradient approaches zero for inputs far from zero.
  • Useful in the final layer of binary classifiers.

public class SigmoidExample {
    // Sigmoid activation: maps any real input into the range (0, 1)
    public static double sigmoid(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double input = 0.5;
        System.out.println("Sigmoid of " + input + ": " + sigmoid(input));
    }
}

Why Use Sigmoid?

The sigmoid function is particularly useful for models where the output is expected to be a probability, such as in logistic regression and binary classification tasks.

Console Output:

Sigmoid of 0.5: 0.6224593312018546
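
To illustrate the vanishing gradient issue noted above, the minimal sketch below evaluates the sigmoid derivative, \( \sigma'(x) = \sigma(x)(1 - \sigma(x)) \); it peaks at 0.25 at \( x = 0 \) and shrinks toward zero as \( |x| \) grows, which is what starves earlier layers of gradient.

public class SigmoidGradientExample {
    public static double sigmoid(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    // Derivative of the sigmoid: sigma(x) * (1 - sigma(x)), never larger than 0.25
    public static double sigmoidGradient(double x) {
        double s = sigmoid(x);
        return s * (1 - s);
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 2.0, 5.0, 10.0};
        for (double x : xs) {
            System.out.println("Sigmoid gradient at " + x + ": " + sigmoidGradient(x));
        }
    }
}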

Tanh Function

Characteristics

The tanh function is defined as \( \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \). It is a rescaled and shifted version of the sigmoid function, \( \tanh(x) = 2\sigma(2x) - 1 \), and is often used in hidden layers of neural networks.

  • Squashes the input to a range between -1 and 1.
  • Zero-centered, so the activations it passes to the next layer are balanced around zero.
  • Can also suffer from vanishing gradients.

public class TanhExample {
    // Tanh activation: maps any real input into the range (-1, 1)
    public static double tanh(double x) {
        return Math.tanh(x);
    }

    public static void main(String[] args) {
        double input = 0.5;
        System.out.println("Tanh of " + input + ": " + tanh(input));
    }
}

Why Use Tanh?

Tanh is preferred over sigmoid in hidden layers because it is zero-centered, which can lead to faster convergence in training.

Console Output:

Tanh of 0.5: 0.46211715726000974
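
As noted above, tanh is a rescaled sigmoid; the minimal sketch below checks the identity \( \tanh(x) = 2\sigma(2x) - 1 \) numerically for a sample input (the two values should agree up to floating-point error).

public class TanhSigmoidRelation {
    public static double sigmoid(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double x = 0.5;
        // tanh(x) and 2*sigmoid(2x) - 1 compute the same function
        double viaTanh = Math.tanh(x);
        double viaSigmoid = 2 * sigmoid(2 * x) - 1;
        System.out.println("tanh(x)          : " + viaTanh);
        System.out.println("2*sigmoid(2x) - 1: " + viaSigmoid);
    }
}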

ReLU (Rectified Linear Unit)

Characteristics

ReLU is defined as \( f(x) = \max(0, x) \). It is the most commonly used activation function in deep learning models due to its simplicity and computational efficiency.

  • Does not saturate for large positive inputs.
  • Efficient computation compared to sigmoid and tanh.
  • Can suffer from the dying ReLU problem, where neurons become permanently inactive.

public class ReLUExample {
    // ReLU activation: passes positive inputs through unchanged, clamps negatives to 0
    public static double relu(double x) {
        return Math.max(0, x);
    }

    public static void main(String[] args) {
        double input = 0.5;
        System.out.println("ReLU of " + input + ": " + relu(input));
    }
}

Why Use ReLU?

ReLU is preferred in many applications because it accelerates the convergence of stochastic gradient descent compared to sigmoid/tanh.

Console Output:

ReLU of 0.5: 0.5
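
To make the dying ReLU problem mentioned above concrete, the minimal sketch below evaluates the ReLU gradient: it is 1 for positive inputs and 0 for negative inputs, so a neuron whose pre-activation stays negative receives no gradient and stops learning.

public class ReLUGradientExample {
    // Gradient of ReLU: 1 for positive inputs, 0 for negative inputs
    // (the value at exactly 0 is a convention; 0 is used here)
    public static double reluGradient(double x) {
        return x > 0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] xs = {-2.0, -0.5, 0.5, 2.0};
        for (double x : xs) {
            System.out.println("ReLU gradient at " + x + ": " + reluGradient(x));
        }
    }
}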

Leaky ReLU

Characteristics

Leaky ReLU is a modification of ReLU that allows a small, non-zero gradient when the unit is not active, defined as \( f(x) = \max(0.01x, x) \).

  • Addresses the dying ReLU problem.
  • Allows a small slope for negative inputs.
  • Maintains advantages of ReLU with added flexibility.

public class LeakyReLUExample {
    // Leaky ReLU: identity for positive inputs, small slope (0.01) for negatives
    public static double leakyRelu(double x) {
        return x > 0 ? x : 0.01 * x;
    }

    public static void main(String[] args) {
        double input = -0.5;
        System.out.println("Leaky ReLU of " + input + ": " + leakyRelu(input));
    }
}

Why Use Leaky ReLU?

Leaky ReLU is beneficial when a model trained with standard ReLU tends to end up with inactive neurons, since its small negative-side slope allows a gradient to keep flowing through them.

Console Output:

Leaky ReLU of -0.5: -0.005
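
To show how Leaky ReLU keeps a gradient flowing where plain ReLU does not, the minimal sketch below compares their gradients for a negative input, using the 0.01 slope from the definition above.

public class LeakyReLUGradientExample {
    // ReLU gradient: 1 for positive inputs, 0 for negative inputs
    public static double reluGradient(double x) {
        return x > 0 ? 1.0 : 0.0;
    }

    // Leaky ReLU gradient: 1 for positive inputs, 0.01 for negative inputs
    public static double leakyReluGradient(double x) {
        return x > 0 ? 1.0 : 0.01;
    }

    public static void main(String[] args) {
        double x = -0.5;
        System.out.println("ReLU gradient at " + x + "      : " + reluGradient(x));
        System.out.println("Leaky ReLU gradient at " + x + ": " + leakyReluGradient(x));
    }
}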

Softmax Function

Characteristics

The softmax function is used in the output layer of neural networks to convert logits into probabilities. It is defined as \( \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \).

  • Used in multi-class classification problems.
  • Outputs a probability distribution.
  • Ensures the sum of outputs equals 1.

import java.util.Arrays;

public class SoftmaxExample {
    // Softmax: exponentiates each logit and normalizes by the sum,
    // producing a probability distribution that sums to 1
    public static double[] softmax(double[] inputs) {
        double sum = Arrays.stream(inputs).map(Math::exp).sum();
        return Arrays.stream(inputs).map(i -> Math.exp(i) / sum).toArray();
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 2.0, 3.0};
        double[] outputs = softmax(inputs);
        System.out.println("Softmax: " + Arrays.toString(outputs));
    }
}

Why Use Softmax?

Softmax is ideal for multi-class classification problems because it converts the raw logits into a probability distribution over the classes, so each output can be read directly as a class probability.

Console Output:

Softmax: [0.09003057, 0.24472847, 0.66524096]
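
One refinement worth knowing, not part of the example above, is the numerically stable form of softmax: subtracting the maximum logit before exponentiating leaves the result unchanged mathematically but avoids overflow for large inputs. A minimal sketch:

import java.util.Arrays;

public class StableSoftmaxExample {
    public static double[] stableSoftmax(double[] inputs) {
        // Subtracting the max logit does not change the result,
        // but keeps Math.exp from overflowing for large inputs
        double max = Arrays.stream(inputs).max().orElse(0.0);
        double[] exps = Arrays.stream(inputs).map(x -> Math.exp(x - max)).toArray();
        double sum = Arrays.stream(exps).sum();
        return Arrays.stream(exps).map(e -> e / sum).toArray();
    }

    public static void main(String[] args) {
        // Large logits that would overflow a naive softmax
        double[] logits = {1000.0, 1001.0, 1002.0};
        System.out.println("Stable softmax: " + Arrays.toString(stableSoftmax(logits)));
    }
}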
