Chapter 18 – Softmax
The cross-entropy cost can be used to address the problem of learning slowdown. However, I want to briefly describe another approach to the problem, based on what are called softmax layers of neurons. Softmax is still worth understanding, in part because it’s intrinsically interesting, and in part because we’ll use softmax layers in our discussion of deep neural networks.
The idea of softmax is to define a new type of output layer for our neural networks. It begins in the same way as with a sigmoid layer, by forming the weighted inputs

$$
z_j^L = \sum_k w_{jk}^L a_k^{L-1} + b_j^L
$$
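As a concrete illustration (not part of the original text), here is a minimal NumPy sketch of forming these weighted inputs for a hypothetical layer with 4 inputs and 3 output neurons; the weights, biases and previous-layer activations are made up for the example.

```python
import numpy as np

# Hypothetical sizes: 4 neurons feeding into a 3-neuron output layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # weights w^L_{jk}
b = rng.normal(size=3)         # biases b^L_j
a_prev = rng.normal(size=4)    # previous-layer activations a^{L-1}_k

# Weighted inputs z^L_j = sum_k w^L_{jk} a^{L-1}_k + b^L_j
z = W @ a_prev + b
print(z)                       # one weighted input per output neuron
```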
However, we don’t apply the sigmoid function to get the output. Instead, in a softmax layer we apply the so-called softmax function to the $z_j^L$:

$$
a_j^L = \frac{e^{z_j^L}}{\sum_k e^{z_k^L}},
$$
where in the denominator we sum over all the neurons in the output layer.
The sum of the output activations is guaranteed to be 1:

$$
\sum_j a_j^L = \frac{\sum_j e^{z_j^L}}{\sum_k e^{z_k^L}} = 1.
$$

Since the exponentials are always positive, each activation is also positive. In other words, the output from a softmax layer can be thought of as a probability distribution, with $a_j^L$ interpreted as the network’s estimate of the probability that the correct output is $j$.
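The softmax function is straightforward to implement directly. The sketch below is a minimal NumPy version; subtracting `max(z)` before exponentiating is a common numerical-stability trick (it cancels between numerator and denominator, so the result is unchanged), and the sample weighted inputs are made up for the example.

```python
import numpy as np

def softmax(z):
    """Softmax activations a^L_j = exp(z^L_j) / sum_k exp(z^L_k).

    Subtracting max(z) leaves the result unchanged (it cancels between
    numerator and denominator) but avoids overflow for large inputs.
    """
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

z = np.array([2.0, 1.0, 0.1])   # made-up weighted inputs
a = softmax(z)
print(a)                        # [0.659 0.242 0.099] -- all positive
print(a.sum())                  # 1.0 -- a valid probability distribution
```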
By contrast, if the output layer were a sigmoid layer, then we certainly couldn’t assume that the activations formed a probability distribution. I won’t prove it explicitly, but it should be plausible that the activations from a sigmoid layer won’t in general form a probability distribution: each activation lies between 0 and 1, but nothing constrains them to sum to 1. And so with a sigmoid output layer we don’t have such a simple interpretation of the output activations.
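For comparison, a quick numerical check (again a sketch, reusing the same made-up weighted inputs as above) shows that sigmoid activations are each between 0 and 1 but do not, in general, sum to 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([2.0, 1.0, 0.1])   # same made-up weighted inputs as above
a = sigmoid(z)
print(a)                        # [0.881 0.731 0.525] -- each in (0, 1)
print(a.sum())                  # about 2.14 -- does not sum to 1
```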