machine-learning Neural Networks Activation Functions

Example

Activation functions also known as transfer function is used to map input nodes to output nodes in certain fashion.

They are used to impart non linearity to the output of a neural network layer.

Some commonly used functions and their curves are given below:

Sigmoid Function

The sigmoid is a squashing function whose output is in the range `[0, 1]`.

The code for implementing sigmoid along with its derivative with `numpy` is shown below:

``````def sigmoid(z):
return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
return sigmoid(z) * (1 - sigmoid(z))
``````

Hyperbolic Tangent Function (tanh)

The basic difference between the tanh and sigmoid functions is that tanh is 0 centred, squashing inputs into the range [-1, 1] and is more efficient to compute.

You can easily use the `np.tanh` or `math.tanh` functions to compute the activation of a hidden layer.

ReLU Function

A rectified linear unit does simply `max(0,x)`. It is the one of the most common choices for activation functions of neural network units.

ReLUs address the vanishing gradient problem of sigmoid / hyperbolic tangent units, thus allowing for efficient gradient propagation in deep networks.

The name ReLU comes from Nair and Hinton's paper, Rectified Linear Units Improve Restricted Boltzmann Machines.

It has some variations, for example, leaky ReLUs (LReLUs) and Exponential Linear Units (ELUs).

The code for implementing vanilla ReLU along with its derivative with `numpy` is shown below:

``````def relU(z):
return z * (z > 0)

def relU_prime(z):
return z > 0
``````