Neural Networks: Activation Functions


Activation functions, also known as transfer functions, map a node's inputs to its output in a certain fashion.

They are used to impart non-linearity to the output of a neural network layer.

Some commonly used activation functions and their curves are given below.

Sigmoid Function

The sigmoid is a squashing function whose output lies in the range (0, 1).


The code for implementing the sigmoid along with its derivative with numpy is shown below:

import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative: sigmoid(z) * (1 - sigmoid(z))
    return sigmoid(z) * (1 - sigmoid(z))

Hyperbolic Tangent Function (tanh)

The basic difference between the tanh and sigmoid functions is that tanh is zero-centred, squashing inputs into the range (-1, 1), which often makes gradient-based optimisation behave better.


You can easily use the np.tanh or math.tanh functions to compute the activation of a hidden layer.
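
As a minimal sketch of this, a tanh activation and its derivative (1 - tanh²(z)) can be written with numpy as:

```python
import numpy as np

def tanh(z):
    # np.tanh squashes inputs into (-1, 1)
    return np.tanh(z)

def tanh_prime(z):
    # derivative: d/dz tanh(z) = 1 - tanh(z)**2
    return 1.0 - np.tanh(z) ** 2
```

Note that the derivative is maximal (equal to 1) at z = 0 and decays toward 0 for large |z|, just like the sigmoid's derivative.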

ReLU Function

A rectified linear unit simply computes max(0, z). It is one of the most common choices for the activation function of neural network units.


ReLUs address the vanishing gradient problem of sigmoid / hyperbolic tangent units, thus allowing for efficient gradient propagation in deep networks.
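
To see why, compare the gradients at a large input: the sigmoid derivative saturates toward zero, while the ReLU derivative stays at one for any positive input. A self-contained numpy sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 10.0
# sigmoid gradient at z = 10 is roughly 4.5e-5: it has vanished
sig_grad = sigmoid(z) * (1 - sigmoid(z))
# ReLU gradient is 1 for any positive input
relu_grad = 1.0 if z > 0 else 0.0
```

In a deep network these per-layer gradients are multiplied together, so a chain of saturated sigmoids shrinks the gradient exponentially, whereas a chain of active ReLUs passes it through unchanged.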

The name ReLU comes from Nair and Hinton's paper, Rectified Linear Units Improve Restricted Boltzmann Machines.

It has some variations, for example, leaky ReLUs (LReLUs) and Exponential Linear Units (ELUs).
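
As a sketch of these variants (the slope alpha = 0.01 for the leaky ReLU and alpha = 1.0 for the ELU are common defaults, assumed here rather than taken from the original papers):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # passes positive inputs through unchanged,
    # scales negative inputs by the small slope alpha
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth for z < 0: alpha * (exp(z) - 1), saturating at -alpha
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```

Both variants keep a non-zero gradient for negative inputs, which avoids the "dying ReLU" problem where a unit stuck in the negative region stops learning.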

The code for implementing the vanilla ReLU along with its derivative with numpy is shown below:

import numpy as np

def relu(z):
    # elementwise max(0, z), assuming z is a numpy array
    return z * (z > 0)

def relu_prime(z):
    # derivative: 1 for z > 0, 0 otherwise (0 at z == 0 by convention)
    return (z > 0).astype(float)