Chapter 16 – Other Activation Functions
The other solution to the vanishing gradient problem is to use a different activation function. Instead of the old sigmoid activation
\begin{equation} \sigma(h)=\frac{1}{1+e^{-h}} \end{equation}
we can use the hyperbolic tangent (tanh) function
\begin{equation} tanh(h)=\frac{e^{h}-e^{-h}}{e^{h}+e^{-h}} \end{equation}
where $h$ is the weighted input to the neuron. Unlike the sigmoid, whose output lies in $(0,1)$, the tanh output lies in $(-1,1)$ and is centred at zero.
import numpy as np
import matplotlib.pyplot as plt

N = 100  # number of sample points

def main():
    # plot the tanh activation over the range [-5, 5]
    h = np.linspace(-5, 5, N)
    tanh = (np.exp(h) - np.exp(-h)) / (np.exp(h) + np.exp(-h))
    plt.figure()
    plt.plot(h, tanh)
    plt.xlabel('$h$')
    plt.ylabel('$tanh(h)$')
    plt.title('Figure 1.4 Tanh function')
    plt.show()

if __name__ == '__main__':
    main()
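To see why the choice of activation matters for vanishing gradients, the short sketch below (an addition for illustration, not part of the original notebook) compares the derivatives of the two functions: the sigmoid's slope never exceeds 0.25, while tanh's reaches 1, so gradients shrink less from layer to layer when tanh is used.

import numpy as np

h = np.linspace(-5, 5, 100)

# derivatives of the two activations:
# sigmoid'(h) = sigmoid(h) * (1 - sigmoid(h)),  tanh'(h) = 1 - tanh(h)**2
sigmoid = 1.0 / (1.0 + np.exp(-h))
d_sigmoid = sigmoid * (1.0 - sigmoid)
d_tanh = 1.0 - np.tanh(h) ** 2

print('max sigmoid gradient:', d_sigmoid.max())  # ~0.25, at h = 0
print('max tanh gradient:', d_tanh.max())        # ~1.0, at h = 0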

Another popular choice is the Rectified Linear Unit (ReLU), defined as
\begin{equation} ReLU(h)=\max(0,h) \end{equation}
# ReLU function
def relu(X):
    return np.maximum(0, X)

N = 100  # number of sample points

def main():
    # plot the ReLU activation over the range [-5, 5]
    h = np.linspace(-5, 5, N)
    Relu = relu(h)
    plt.figure()
    plt.plot(h, Relu)
    plt.xlabel('$h$')
    plt.ylabel('$Relu(h)$')
    plt.title('Figure 1.4 Relu function')
    plt.show()

if __name__ == '__main__':
    main()

This activation function is widely used in CNNs (Convolutional Neural Networks) because of two characteristics: it is cheap to compute, and its gradient does not saturate for positive inputs (the derivative is exactly 1 for $h>0$), so it avoids the vanishing gradient problem. Nevertheless, its biggest drawback is that it has no derivative at the point $h=0$.
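In practice this is handled by convention rather than by avoiding ReLU. The minimal sketch below (an illustration added here, with the convention chosen arbitrarily) shows the gradient that implementations typically use: 1 for $h>0$, 0 for $h<0$, and a fixed value (here 0) at the undefined point $h=0$.

import numpy as np

def relu_grad(h):
    # derivative of ReLU: 1 where h > 0, 0 where h < 0;
    # at h == 0 the derivative is undefined, so a convention (here 0) is used
    return np.where(h > 0, 1.0, 0.0)

print(relu_grad(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]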
The following figure 16.1 shows an MLP/DNN model with modified activation functions. In this model, the activation function is changed from sigmoid to ReLU in all hidden layers except the output layer, which keeps the sigmoid so that the output can be interpreted as a probability.
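As a rough illustration of that architecture (a sketch with made-up layer sizes and random weights, not the exact model in the figure), a forward pass with ReLU hidden layers and a sigmoid output can be written as:

import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def relu(h):
    return np.maximum(0, h)

rng = np.random.default_rng(0)

# two hidden layers (ReLU) and one output unit (sigmoid); the sizes are arbitrary
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input size 3 -> hidden size 4
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # hidden size 4 -> hidden size 4
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden size 4 -> single output

def forward(x):
    a1 = relu(W1 @ x + b1)        # hidden layer 1: ReLU activation
    a2 = relu(W2 @ a1 + b2)       # hidden layer 2: ReLU activation
    return sigmoid(W3 @ a2 + b3)  # output layer: sigmoid gives a probability

print(forward(np.array([0.5, -1.0, 2.0])))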