Activation Functions and Loss Functions

Activation functions and loss functions are central to training neural networks. Activation functions introduce non-linearity, which lets a network learn patterns that a purely linear model cannot, while loss functions quantify how far the model's predictions are from the targets during training. This article covers the activation functions and loss functions most commonly used in PyTorch, a popular deep learning framework.

Activation Functions

An activation function is applied to the weighted sum computed by each neuron, which lets the network model non-linear relationships between inputs and outputs. PyTorch provides a range of activation functions, each suited to a different role in the network. Let's explore some common ones:

1. ReLU (Rectified Linear Unit)

ReLU, one of the most widely used activation functions, is defined as f(x) = max(0, x). It replaces negative values with zero while leaving positive values unchanged. It is cheap to compute, and because its gradient is exactly 1 for every positive input, it mitigates (but does not eliminate) the vanishing gradient problem. Its gradient is 0 for negative inputs, so a unit whose inputs stay negative stops learning.
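As a minimal sketch of the definition above, the following applies PyTorch's nn.ReLU module to a small tensor and inspects the gradient. The input values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Illustrative inputs: two negative values and two positive values.
x = torch.tensor([-2.0, -0.5, 1.5, 3.0], requires_grad=True)
y = relu(x)  # negatives become 0, positives pass through unchanged

# Backpropagate through the sum to read off df/dx for each element.
y.sum().backward()

print(y.detach())  # tensor([0.0000, 0.0000, 1.5000, 3.0000])
print(x.grad)      # tensor([0., 0., 1., 1.])
```

The gradient is 1 wherever the input was positive and 0 wherever it was negative, which matches the behavior of max(0, x).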

2. Sigmoid

The sigmoid activation function f(x) = 1 / (1 + exp(-x)) maps the input to a value between 0 and 1. It is commonly used in the output layer for binary classification, where the output is interpreted as the probability of the positive class. However, its gradient is at most 0.25 and approaches zero for inputs of large magnitude, so sigmoid suffers from the vanishing gradient problem and is less suitable for the hidden layers of deep networks.
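The short sketch below checks the formula against torch.sigmoid and evaluates the derivative, which for sigmoid equals f(x) * (1 - f(x)). The sample inputs are arbitrary.

```python
import torch

x = torch.tensor([-4.0, 0.0, 4.0])

# Sigmoid written out from its definition, and PyTorch's built-in version.
manual = 1 / (1 + torch.exp(-x))
builtin = torch.sigmoid(x)

# Derivative of sigmoid: s * (1 - s). It peaks at 0.25 when x = 0
# and shrinks toward 0 as |x| grows.
grad = builtin * (1 - builtin)

print(builtin)
print(grad)
```

The derivative is largest at x = 0 and much smaller at x = -4 and x = 4, which illustrates how saturated sigmoid units pass back very small gradients.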

3. Tanh (Hyperbolic Tangent)

Like the sigmoid function, tanh squashes its input into a bounded range, in this case between -1 and 1. It is defined as f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). Tanh can be used in the hidden layers of a neural network and has two advantages over sigmoid: its output is zero-centered, and its gradient is steeper (1 at x = 0, compared with 0.25 for sigmoid). It still saturates for inputs of large magnitude, so it does not avoid vanishing gradients.
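To make the relationship to sigmoid concrete, this sketch compares the formula above with torch.tanh and with the identity tanh(x) = 2 * sigmoid(2x) - 1. The inputs are arbitrary illustration values.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

# Tanh written out from its definition, and PyTorch's built-in version.
manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
builtin = torch.tanh(x)

# Tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
rescaled = 2 * torch.sigmoid(2 * x) - 1

print(builtin)  # outputs are symmetric around 0 and lie in (-1, 1)
```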

4. Softmax

The softmax activation function is typically used in the output layer for multi-class classification tasks. It converts the network's raw outputs (logits) into a probability distribution over the classes, so every value lies between 0 and 1 and the values sum to one. Softmax is defined as f(x_i) = exp(x_i) / sum(exp(x_j)), where x_i is the raw output for class i and the sum runs over all classes j.
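A minimal sketch of the softmax formula, compared against torch.softmax. The three logits are made-up values for a hypothetical three-class problem.

```python
import torch

# Hypothetical raw outputs (logits) for three classes.
logits = torch.tensor([2.0, 1.0, 0.1])

# Softmax written out from its definition.
manual = torch.exp(logits) / torch.exp(logits).sum()

# Built-in version; dim selects the axis that holds the classes.
probs = torch.softmax(logits, dim=0)

print(probs)
print(probs.sum())  # approximately 1
```

The largest logit receives the largest probability, and the ordering of the classes is preserved.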

Loss Functions

Loss functions, also known as cost functions or objective functions, measure how far the model's predictions are from the actual values; training adjusts the model's parameters to reduce this quantity. PyTorch offers a variety of loss functions suitable for different problem domains. Let's explore some commonly used ones:

1. Mean Squared Error (MSE)

MSE is widely used in regression tasks and measures the average squared difference between predicted and actual values. It is defined as MSE = (1 / N) * sum((y_pred - y_true) ** 2), where y_pred represents the predicted values, y_true stands for the ground truth values, and N indicates the number of samples.
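The sketch below computes MSE from the formula and with PyTorch's nn.MSELoss, which averages over all elements by default. The predictions and targets are invented for illustration.

```python
import torch
import torch.nn as nn

# Illustrative predictions and ground-truth values for four samples.
y_pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
y_true = torch.tensor([3.0, -0.5, 2.0, 7.0])

# MSE written out from its definition.
manual = ((y_pred - y_true) ** 2).mean()

# Built-in loss; the default reduction is the mean.
loss = nn.MSELoss()(y_pred, y_true)

print(loss)  # tensor(0.3750)
```

The squared errors here are 0.25, 0.25, 0 and 1, so their average is 0.375.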

2. Binary Cross Entropy

Binary Cross Entropy (BCE) is a popular loss function for binary classification problems. It compares the predicted probability of the positive class with the actual class label. For a single sample, BCE is defined as BCE = -((y_true * log(y_pred)) + ((1 - y_true) * log(1 - y_pred))), where y_pred is the predicted probability and y_true is the actual label (0 or 1). The loss over a dataset is usually the average of the per-sample values.
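As a sketch of the BCE formula, the example below compares a hand-written version with nn.BCELoss, which expects probabilities, and with nn.BCEWithLogitsLoss, which expects raw logits and applies the sigmoid internally. The probabilities and labels are made up for illustration.

```python
import torch
import torch.nn as nn

# Illustrative predicted probabilities and binary labels for three samples.
y_pred = torch.tensor([0.9, 0.2, 0.7])
y_true = torch.tensor([1.0, 0.0, 1.0])

# BCE written out from its definition, averaged over the samples.
manual = -(y_true * torch.log(y_pred)
           + (1 - y_true) * torch.log(1 - y_pred)).mean()

# Built-in loss on probabilities.
loss = nn.BCELoss()(y_pred, y_true)

# Same loss computed from logits (the inverse sigmoid of y_pred).
logits = torch.log(y_pred / (1 - y_pred))
loss_from_logits = nn.BCEWithLogitsLoss()(logits, y_true)

print(manual, loss, loss_from_logits)
```

Working from logits is generally the more numerically stable choice when the model's last layer has no sigmoid.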

3. Categorical Cross Entropy

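Categorical Cross Entropy (CCE) is suitable for multi-class classification tasks. For each sample, it compares the predicted class probabilities to the actual class label by summing over all classes: CCE = -sum(y_true * log(y_pred)), where y_pred represents the predicted probabilities and y_true contains one-hot encoded class labels. Because y_true is one-hot, only the term for the true class is non-zero, so the loss reduces to the negative log of the probability assigned to the correct class. The per-sample losses are usually averaged over the batch.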
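The sketch below evaluates the CCE formula by hand and compares it with PyTorch's nn.CrossEntropyLoss. Note that nn.CrossEntropyLoss takes raw logits and integer class indices rather than probabilities and one-hot vectors; it applies log-softmax internally and averages over the batch by default. The logits and labels are invented for a hypothetical three-class problem.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for two samples and three classes.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])  # true class index for each sample

# Manual version: softmax probabilities and one-hot labels.
y_pred = torch.softmax(logits, dim=1)
y_true = F.one_hot(targets, num_classes=3).float()
manual = -(y_true * torch.log(y_pred)).sum(dim=1).mean()

# Built-in version works directly on logits and class indices.
loss = nn.CrossEntropyLoss()(logits, targets)

print(manual, loss)
```

Because the built-in loss already includes the softmax, a model trained with it should output logits, not softmax probabilities.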

4. Kullback-Leibler Divergence

Kullback-Leibler Divergence (KLD) measures how much one probability distribution diverges from another. It is often used in tasks like generative modeling. KLD is defined as KLD = sum(y_true * log(y_true / y_pred)), where y_pred represents the predicted probabilities and y_true is the true distribution. It is zero when the two distributions are identical and is not symmetric: swapping y_true and y_pred generally gives a different value.
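A minimal sketch of the KLD formula alongside PyTorch's nn.KLDivLoss. One detail worth knowing is that nn.KLDivLoss expects its first argument (the prediction) as log-probabilities and its second (the target) as probabilities. The two distributions below are arbitrary examples, and reduction="sum" is chosen here so the result matches the formula directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two illustrative distributions over three outcomes.
y_true = torch.tensor([0.7, 0.2, 0.1])
y_pred = torch.tensor([0.5, 0.3, 0.2])

# KLD written out from its definition.
manual = (y_true * torch.log(y_true / y_pred)).sum()

# Built-in loss: input is log(y_pred), target is y_true.
loss = nn.KLDivLoss(reduction="sum")(torch.log(y_pred), y_true)

# Divergence of a distribution from itself is zero.
self_div = F.kl_div(torch.log(y_true), y_true, reduction="sum")

# Swapping the arguments gives a different value (KLD is not symmetric).
reverse = (y_pred * torch.log(y_pred / y_true)).sum()

print(manual, loss, self_div, reverse)
```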


Activation functions and loss functions are essential components in training neural networks. PyTorch provides a wide range of both to suit various deep learning tasks. Understanding the purpose and characteristics of each one helps in choosing the right combination when designing and training deep learning models.
