Popular Activation Functions for Deep Learning - IN PROGRESS
Jun 09, 2020

ReLU, Sigmoid, and the Hyperbolic Tangent activation functions:
Sigmoid function: mostly used in shallow networks and in binary classification problems.
Tanh function: became preferred over the sigmoid because it gives better training performance for multi-layer neural networks and remedies some of the sigmoid's drawbacks. However, the tanh function still does not solve the vanishing gradient problem that the sigmoid suffers from.
Finally, the Rectified Linear Unit (ReLU) function:
It offers better performance and generalization in deep learning compared to the sigmoid and tanh activation functions.
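For concreteness, here is a minimal NumPy sketch of the three activations discussed above (the function names and test values are purely illustrative):

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1); saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered alternative; squashes inputs into (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive inputs through unchanged, zeroes out negatives
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # values in (0, 1)
print(tanh(x))     # values in (-1, 1)
print(relu(x))     # [0.  0.  0.  0.5 2. ]
```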
Advantages
Limitations
ReLU has the limitation that it easily overfits compared to the sigmoid function, although the dropout technique has been adopted to reduce this overfitting effect.
ReLU can also be fragile during training, causing some gradients to die. The affected neurons become dead as well: weight updates no longer activate them on future data points, which hinders learning since dead neurons give zero activation.
Think about the chain rule in the backward pass. If the derivative (slope) of the ReLU is 0, absolutely no learning is performed on the layers below the dead ReLU, because 0 is multiplied into the accumulated gradient for the weight update.
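To make the chain-rule argument concrete, here is a toy single-neuron example (all of the numbers are made up): once the pre-activation is negative, the ReLU slope is 0 and the whole gradient product collapses to zero, so the weight below the dead ReLU receives no update.

```python
def relu_grad(z):
    # Derivative of ReLU: 1 for z > 0, 0 otherwise
    return 1.0 if z > 0 else 0.0

# Toy single neuron: z = w * x + b, a = relu(z); the values are illustrative
w, b, x = -1.5, 0.0, 2.0
z = w * x + b                 # z = -3.0, so the ReLU output (and its slope) is 0
upstream_grad = 0.8           # gradient flowing back from the loss

# Chain rule for the weight: dL/dw = upstream_grad * relu'(z) * x
dL_dw = upstream_grad * relu_grad(z) * x
print(dL_dw)                  # 0.0 -> no update reaches this weight
```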
Solution?
To resolve the dead neuron issue, the leaky ReLU was proposed: it adds a small negative slope to the ReLU so that weight updates stay alive during the entire propagation process. In practice, however, the LReLU often gives results nearly identical to the standard ReLU, perhaps because the ReLU's zeroed activations introduce a sort of "optimal brain damage" regularization into the model, which helps avoid overfitting and improves generalization.
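Below is a minimal sketch of the leaky ReLU and its gradient, assuming the commonly used negative slope alpha = 0.01 (the exact value is a tunable hyperparameter):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small negative slope alpha instead of a hard zero for x < 0
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is alpha (not 0) on the negative side, so updates stay "alive"
    return np.where(x > 0, 1.0, alpha)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(z))       # [-0.03  -0.005  0.5    3.   ]
print(leaky_relu_grad(z))  # [ 0.01   0.01   1.     1.   ]
```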
References:
Activation Functions: Comparison of Trends in Practice and Research for Deep Learning [https://arxiv.org/pdf/1811.03378.pdf]
Quora: What are the advantages of using Leaky Rectified Linear Units (Leaky ReLU) over normal ReLU in deep learning? [https://www.quora.com/What-are-the-advantages-of-using-Leaky-Rectified-Linear-Units-Leaky-ReLU-over-normal-ReLU-in-deep-learning]
QUIZ
QUIZ QUESTION 1: What are the advantages of the ReLU activation function?
The ReLU represents a nearly linear function and therefore preserves the properties that make linear models easy to optimize with gradient descent.
It avoids the vanishing gradient problem by rectifying inputs less than zero and forcing them to zero.
It guarantees faster computation since it does not compute exponentials or divisions, enhancing the overall speed of computation.
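As a rough illustration of the first two points (the 10-layer depth and the pre-activation values are hypothetical): the sigmoid derivative peaks at 0.25, so multiplying it across many layers shrinks the gradient toward zero, while the ReLU derivative is exactly 1 for positive pre-activations.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0    # exactly 1 for positive pre-activations

# Product of per-layer derivatives over a 10-layer chain (illustrative values)
layers = 10
print(sigmoid_grad(0.0) ** layers)  # ~9.5e-07: the gradient vanishes quickly
print(relu_grad(1.0) ** layers)     # 1.0: the gradient passes through unchanged
```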