Chapter 4 – Cost Function

Data Science and Machine Learning for Geoscientists

The previous two examples have the same number of students, so comparing their cross-entropies is fair. However, if the datasets have different sizes, the comparison breaks down. Cross-entropy is a sum with one term per data point, so a very accurate model evaluated on 1000 data points may still get a larger (worse) total cross-entropy than a model evaluated on only 5. We therefore divide by the number of data points to make a "per capita" comparison.
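To see why dividing by the number of data points matters, here is a small Python sketch. The two "models" are made up for illustration: model A assigns probability 0.95 to the correct class on 1000 points, and model B assigns only 0.6 on 5 points. The helper names `total_cross_entropy` and `mean_cross_entropy` are our own and do not come from any library.

```python
import math

def total_cross_entropy(y, p):
    """Sum of the binary cross-entropy terms over all data points (no averaging)."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def mean_cross_entropy(y, p):
    """Total cross-entropy divided by the number of data points m ('per capita')."""
    return total_cross_entropy(y, p) / len(y)

# Hypothetical model A: very accurate (p = 0.95 for the true class) on 1000 points.
y_a = [1] * 1000
p_a = [0.95] * 1000

# Hypothetical model B: less accurate (p = 0.6 for the true class) on 5 points.
y_b = [1] * 5
p_b = [0.6] * 5

print("total:", total_cross_entropy(y_a, p_a), total_cross_entropy(y_b, p_b))
print("mean: ", mean_cross_entropy(y_a, p_a), mean_cross_entropy(y_b, p_b))
```

The summed cross-entropy of A is roughly 51.3 against about 2.55 for B, so A looks worse simply because it has more data. After dividing by the number of points, A scores about 0.051 and B about 0.511, and the more accurate model correctly comes out ahead.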

$$\text{Cost} = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \ln(p_i) + (1-y_i)\ln(1-p_i)\right] \tag{19}$$

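where $m$ is the number of data points, $y_i \in \{0, 1\}$ is the true label of the $i$-th data point, and $p_i$ is the AI's predicted probability that the event happens for that data point (i.e. that $y_i = 1$). This is the same quantity as the $\hat{y}$ notation used before: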

$$\hat{y} = \sigma(wx + b) \tag{20}$$
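A minimal Python sketch of Eq. (20), assuming a single scalar feature $x$ so that $w$ is a single number. The names `sigmoid` and `predict`, and the values of `w` and `b`, are illustrative assumptions rather than anything defined in the text.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so e lies in (0, 1) and math.exp cannot overflow
    return e / (1.0 + e)

def predict(x, w, b):
    """y_hat = sigma(w * x + b) for one scalar feature x (Eq. 20)."""
    return sigmoid(w * x + b)

# Made-up weight and bias for illustration.
w, b = 2.0, -1.0
for x in [-2.0, 0.5, 3.0]:
    print(x, predict(x, w, b))
```

The sigmoid is split into two branches only for numerical safety: computing `math.exp(-z)` directly for a large negative `z` would overflow, whereas the second form gives the same value without that risk.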

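Finally, replacing $p_i$ by $\hat{y}_i = \sigma(wx_i + b)$, the prediction for the $i$-th data point, gives the cost as a function of the weight $w$ and bias $b$: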

$$\text{Cost}(w,b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \ln\big(\sigma(wx_i+b)\big) + (1-y_i)\ln\big(1-\sigma(wx_i+b)\big)\right] \tag{21}$$
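Eq. (21) can be sketched directly in Python as a function of `w` and `b`. This assumes a single scalar feature per data point, and the small dataset `xs`, `ys` is invented purely for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost(w, b, xs, ys):
    """Cost(w, b) from Eq. (21): mean binary cross-entropy of sigma(w*x_i + b) against y_i."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(w * x + b)  # prediction for this data point
        total += y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return -total / m

# Hypothetical dataset: one feature x and a binary label y per data point.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 1.5, 3.0]
ys = [0, 0, 1, 0, 1, 1, 1]

print(cost(0.0, 0.0, xs, ys))  # w = b = 0 gives y_hat = 0.5 everywhere, so the cost is ln 2
print(cost(1.0, 0.0, xs, ys))  # a weight that favours larger x for y = 1 gives a lower cost
```

For very large $|wx_i + b|$, `sigmoid` can return exactly 0.0 or 1.0 and the logarithm would fail, so a practical version would clip `y_hat` slightly away from 0 and 1 before taking logs.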

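Our goal is to find the weight $w$ and bias $b$ that minimise this cost. This is a multivariable calculus problem: we need the partial derivatives of the cost with respect to both $w$ and $b$, which tell us how to change each parameter to reduce the cost.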
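For this cost, the conditions $\partial\,\text{Cost}/\partial w = 0$ and $\partial\,\text{Cost}/\partial b = 0$ generally have no closed-form solution, so the minimum is usually found iteratively. Differentiating Eq. (21) gives $\partial\,\text{Cost}/\partial w = \frac{1}{m}\sum_{i=1}^{m}\big(\sigma(wx_i+b)-y_i\big)x_i$ and $\partial\,\text{Cost}/\partial b = \frac{1}{m}\sum_{i=1}^{m}\big(\sigma(wx_i+b)-y_i\big)$. The sketch below uses plain batch gradient descent, one common choice and not necessarily the method used later in this book. The learning rate `lr`, the number of `steps` and the dataset are arbitrary assumptions.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost_and_gradients(w, b, xs, ys):
    """Return Cost(w, b) from Eq. (21) and its partial derivatives dCost/dw, dCost/db."""
    m = len(xs)
    cost = dw = db = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(w * x + b)
        cost += y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
        err = y_hat - y  # derivative of the per-point loss with respect to z = w*x + b
        dw += err * x    # chain rule: dz/dw = x
        db += err        # chain rule: dz/db = 1
    return -cost / m, dw / m, db / m

def gradient_descent(xs, ys, lr=0.5, steps=2000):
    """Batch gradient descent starting from w = b = 0 (lr and steps are arbitrary)."""
    w = b = 0.0
    for _ in range(steps):
        _, dw, db = cost_and_gradients(w, b, xs, ys)
        w -= lr * dw  # step against the gradient to lower the cost
        b -= lr * db
    return w, b

# Hypothetical, deliberately non-separable dataset.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 1.5, 3.0]
ys = [0, 0, 1, 0, 1, 1, 1]

w, b = gradient_descent(xs, ys)
final_cost, dw, db = cost_and_gradients(w, b, xs, ys)
print(w, b, final_cost, dw, db)
```

The dataset is chosen so that no threshold on $x$ separates the labels perfectly (the point at $x = -0.5$ is labelled 1 while $x = 0.5$ is labelled 0). On perfectly separable data the cost keeps decreasing as $|w|$ grows, so it has no finite minimiser and gradient descent would not settle.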