Chapter 4 – Cost Function#


The previous two examples have the same number of students, so the comparison is fair. However, if the datasets have different sizes, a very accurate model evaluated on 1000 data points could still end up with a larger total cross-entropy than a poor model evaluated on only 5, simply because more terms are being summed. So we divide by the number of data points to make a 'per capita' comparison.

(19)#\[ Cost = -\frac{1}{m}\sum_{i=1}^m \left[ y_i \ln(p_i) + (1-y_i)\ln(1-p_i) \right] \]

where \(p_i\) is the AI's predicted probability of the \(i\)-th event happening, which is the same as the \(\hat{y}\) notation we used before.
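As a quick numerical sketch (with made-up probabilities, not data from the examples above), the snippet below evaluates this per-sample cross-entropy with NumPy; dividing by \(m\) is what makes a model judged on 1000 points comparable to one judged on 5.

```python
import numpy as np

def cross_entropy_cost(y, p):
    """Mean binary cross-entropy over m samples, as in equation (19)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # Dividing by m (via np.mean) gives a per-sample cost, so datasets of
    # different sizes can be compared fairly.
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up example: an accurate model scored on 1000 points versus a poor
# model scored on only 5 points.
y_big, p_big = np.ones(1000), np.full(1000, 0.9)
y_small, p_small = np.ones(5), np.full(5, 0.5)

print(cross_entropy_cost(y_big, p_big))      # ~0.105  (lower cost, better model)
print(cross_entropy_cost(y_small, p_small))  # ~0.693
```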

(20)#\[ \hat{y} = \sigma(\vec{w} \cdot \vec{x} + b) \]

So finally, we can obtain the following by replacing \(p_i\) with \(\hat{y}_i = \sigma(\vec{w} \cdot \vec{x}_i + b)\), i.e. the prediction evaluated at the \(i\)-th data point \(\vec{x}_i\):

(21)#\[ Cost(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^m \left[ y_i \ln\big( \sigma(\vec{w} \cdot \vec{x}_i + b)\big) + (1-y_i) \ln\big(1 - \sigma(\vec{w} \cdot \vec{x}_i + b)\big) \right] \]
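A minimal sketch of equation (21) in code (the toy inputs below are hypothetical, purely to show the function being evaluated):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """Cost(w, b) from equation (21): mean cross-entropy of sigmoid predictions."""
    y_hat = sigmoid(X @ w + b)  # equation (20) applied to every row x_i of X
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical toy data: 4 samples, 2 features each.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

print(cost(np.array([1.0, -1.0]), 0.0, X, y))  # cost for one particular choice of (w, b)
```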

What we want is to find the weights \(\vec{w}\) and bias \(b\) that minimise the cost. This is a multi-variable calculus (optimisation) problem.
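One standard way to carry out this minimisation numerically (not necessarily the route taken later in this book) is gradient descent; differentiating equation (21) gives the compact gradients used in the sketch below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, lr=0.1, n_steps=5000):
    """Find (w, b) that approximately minimise Cost(w, b) by gradient descent.

    For the sigmoid + cross-entropy pair, the derivatives of equation (21)
    simplify to dCost/dw = X^T (y_hat - y) / m and dCost/db = mean(y_hat - y).
    """
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_steps):
        y_hat = sigmoid(X @ w + b)   # current predictions
        error = y_hat - y            # prediction error for each sample
        w -= lr * (X.T @ error) / m  # step downhill in w
        b -= lr * error.mean()       # step downhill in b
    return w, b

# Reusing the hypothetical toy data from the previous sketch.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w, b = fit(X, y)
print(w, b)  # weights and bias that drive the cost towards its minimum
```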