Chapter 4 – Cost Function
Data Science and Machine Learning for Geoscientists
The previous two examples have the same number of students, so the comparison is fair. If the datasets differ in size, however, the summed cross-entropy grows with the number of data points, so a very accurate model evaluated on 1000 data points could still end up with a larger (worse) total than a poorer model evaluated on 5. We therefore divide by the number of data points \(m\) to make a 'per capita' comparison.
\[
Cost = -\frac{1}{m}\sum_{i=1}^{m} \left[ y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \right] \tag{19}
\]
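where \(m\) is the number of data points, \(y_i\) is the true label (0 or 1) of the \(i\)-th data point, and \(p_i\) is the AI's predicted probability that the event happens for that data point. This is the same quantity as the \(\hat{y}\) notation we used before: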
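To see why the division by \(m\) matters, here is a minimal Python sketch that compares the summed and the averaged cross-entropy. The two datasets and the helper names `cross_entropy_sum` and `cross_entropy_cost` are made up for illustration; they are not taken from the earlier examples.

```python
import math

def cross_entropy_sum(y, p):
    """Total (unnormalised) cross-entropy over all data points."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p))

def cross_entropy_cost(y, p):
    """Averaged cross-entropy: the sum divided by the number of data points m."""
    return cross_entropy_sum(y, p) / len(y)

# Hypothetical data: a poor model on 5 points, an accurate model on 1000 points.
y_small = [1, 0, 1, 0, 1]
p_small = [0.6, 0.4, 0.6, 0.4, 0.6]   # only 60% confident in the correct label

y_large = [1, 0] * 500
p_large = [0.95, 0.05] * 500          # 95% confident in the correct label

sum_small = cross_entropy_sum(y_small, p_small)
sum_large = cross_entropy_sum(y_large, p_large)
cost_small = cross_entropy_cost(y_small, p_small)
cost_large = cross_entropy_cost(y_large, p_large)

print("summed:  ", sum_small, sum_large)
print("averaged:", cost_small, cost_large)
```

With these made-up numbers the summed value is larger for the accurate 1000-point model simply because it has more terms, while the averaged cost correctly ranks it as the better model.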
\[
\hat{y} = \sigma(\vec{w}\cdot\vec{x}+b) \tag{20}
\]
Finally, replacing \(p_i\) with \(\hat{y}\) evaluated at the \(i\)-th data point \(\vec{x}_i\) gives the cost as a function of the weights and bias:
\[
Cost(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m} \left[ y_i \ln\left(\sigma(\vec{w}\cdot\vec{x}_i+b)\right) + (1-y_i) \ln\left(1-\sigma(\vec{w}\cdot\vec{x}_i+b)\right) \right] \tag{21}
\]
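The cost as a function of the weights and bias can be sketched in a few lines of plain Python. This is an illustrative sketch, not a definitive implementation: the function names `sigmoid` and `cost`, the four data points, and the two parameter choices are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, X, y):
    """Averaged cross-entropy cost for weights w and bias b.

    X is a list of feature vectors (one per data point), y the 0/1 labels.
    """
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Weighted sum w . x_i + b, then squash it into a probability.
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        y_hat = sigmoid(z)
        total += y_i * math.log(y_hat) + (1 - y_i) * math.log(1 - y_hat)
    return -total / m

# Hypothetical data: four points with two features each.
X = [[0.5, 1.0], [2.0, 1.5], [3.0, 0.5], [1.0, 0.2]]
y = [0, 1, 1, 0]

# With all parameters at zero every prediction is 0.5, so the cost is ln(2).
cost_zero = cost([0.0, 0.0], 0.0, X, y)

# A hand-picked (not optimised) parameter choice that separates the labels.
cost_better = cost([2.0, 1.0], -4.0, X, y)

print(cost_zero, cost_better)
```

Different choices of \(\vec{w}\) and \(b\) give different cost values on the same data, and the second choice here gives a lower cost than the first. Note that this simple version can fail numerically when a prediction is extremely close to 0 or 1, because the logarithm is then evaluated at or near zero.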
What we want is to find the weights \(\vec{w}\) and bias \(b\) that minimise the cost. This is a multi-variable calculus problem.