Neural Networks: Activation Functions


Activation functions, also known as transfer functions, map a node's inputs to its output in a certain fashion.

They are used to impart non-linearity to the output of a neural network layer.

Some commonly used activation functions and their curves are given below.

Sigmoid Function

The sigmoid is a squashing function whose output lies in the range (0, 1).


The code for implementing the sigmoid along with its derivative with numpy is shown below:

import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative: sigmoid(z) * (1 - sigmoid(z))
    return sigmoid(z) * (1 - sigmoid(z))

Hyperbolic Tangent Function (tanh)

The basic difference between the tanh and sigmoid functions is that tanh is zero-centred, squashing inputs into the range (-1, 1), which often makes gradient-based optimisation behave better.


You can easily use the np.tanh or math.tanh functions to compute the activation of a hidden layer.
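
As a minimal sketch of this, a tanh activation and its derivative (1 - tanh²(z)) can be written with numpy as:

```python
import numpy as np

def tanh(z):
    # np.tanh squashes inputs into (-1, 1)
    return np.tanh(z)

def tanh_prime(z):
    # derivative: d/dz tanh(z) = 1 - tanh(z)**2
    return 1.0 - np.tanh(z) ** 2
```

Note that the derivative is maximal (equal to 1) at z = 0 and decays toward 0 for large |z|, just like the sigmoid's derivative.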

ReLU Function

A rectified linear unit simply computes max(0, z). It is one of the most common choices for the activation function of neural network units.


ReLUs address the vanishing gradient problem of sigmoid / hyperbolic tangent units, thus allowing for efficient gradient propagation in deep networks.
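
To see why, compare the gradients at a large input: the sigmoid derivative saturates toward zero, while the ReLU derivative stays at one for any positive input. A self-contained numpy sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 10.0
# sigmoid gradient at z = 10 is roughly 4.5e-5: it has vanished
sig_grad = sigmoid(z) * (1 - sigmoid(z))
# ReLU gradient is 1 for any positive input
relu_grad = 1.0 if z > 0 else 0.0
```

In a deep network these per-layer gradients are multiplied together, so a chain of saturated sigmoids shrinks the gradient exponentially, whereas a chain of active ReLUs passes it through unchanged.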

The name ReLU comes from Nair and Hinton's paper, Rectified Linear Units Improve Restricted Boltzmann Machines.

It has some variations, for example, leaky ReLUs (LReLUs) and Exponential Linear Units (ELUs).
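
As a sketch of these variants (the slope alpha = 0.01 for the leaky ReLU and alpha = 1.0 for the ELU are common defaults, assumed here rather than taken from the original papers):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # passes positive inputs through unchanged,
    # scales negative inputs by the small slope alpha
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth for z < 0: alpha * (exp(z) - 1), saturating at -alpha
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```

Both variants keep a non-zero gradient for negative inputs, which avoids the "dying ReLU" problem where a unit stuck in the negative region stops learning.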

The code for implementing the vanilla ReLU along with its derivative with numpy is shown below:

import numpy as np

def relu(z):
    # elementwise max(0, z), assuming z is a numpy array
    return z * (z > 0)

def relu_prime(z):
    # derivative: 1 for z > 0, 0 otherwise (0 at z == 0 by convention)
    return (z > 0).astype(float)