# Chapter 4 – Cost Function

Data Science and Machine Learning for Geoscientists

The previous two examples have the same number of students, so the comparison is fair. However, if the datasets differ in size, the summed cross-entropy grows with the number of data points, so a very accurate model evaluated on 1000 data points could end up with a worse (larger) cross-entropy than a poorer model evaluated on only 5. We therefore divide by the number of data points, $m$, to make a 'per capita' comparison.

$$Cost = -\frac{1}{m}\sum_{i=1}^m \left[y_i \ln(p_i)+(1-y_i)\ln(1-p_i)\right] \tag{19}$$

where $p_i$ is the AI's predicted probability of the event happening for the $i$-th data point, which is the same as the $\hat{y}$ notation we used before.
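To see why the $\frac{1}{m}$ factor matters, here is a minimal sketch in plain Python. The function names and the two datasets are hypothetical, invented only for illustration: a small set with mediocre predictions and a large set with accurate ones.

```python
import math

def summed_cross_entropy(y, p):
    """Cross-entropy summed over all data points (no 1/m factor)."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p))

def mean_cross_entropy(y, p):
    """Equation (19): the summed cross-entropy divided by m."""
    return summed_cross_entropy(y, p) / len(y)

# Hypothetical data (assumption): 5 mediocre predictions vs 1000 accurate ones.
y_small, p_small = [1, 0, 1, 1, 0], [0.6, 0.4, 0.6, 0.6, 0.4]
y_large, p_large = [1, 0] * 500, [0.9, 0.1] * 500

# Without the 1/m factor the larger, more accurate dataset looks worse.
print(summed_cross_entropy(y_small, p_small))  # about 2.55
print(summed_cross_entropy(y_large, p_large))  # about 105.4

# With the 1/m factor the accurate model correctly gets the lower cost.
print(mean_cross_entropy(y_small, p_small))    # about 0.51
print(mean_cross_entropy(y_large, p_large))    # about 0.105
```

Dividing by the number of data points turns the total into an average loss per data point, which is what makes models evaluated on different amounts of data comparable.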

$$\hat{y} = \sigma(\vec{w}\cdot\vec{x}+b) \tag{20}$$

So finally, replacing $p_i$ with the prediction for the $i$-th data point, $\hat{y}_i = \sigma(\vec{w}\cdot\vec{x}_i+b)$, we obtain

$$Cost(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^m \left[y_i \ln\left(\sigma(\vec{w}\cdot\vec{x}_i+b)\right)+(1-y_i)\ln\left(1-\sigma(\vec{w}\cdot\vec{x}_i+b)\right)\right] \tag{21}$$
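The cost written as a function of the weights and bias can be sketched in plain Python as follows. The names `sigmoid`, `predict`, `cost`, `X`, and `y`, as well as the four data points, are assumptions made for illustration and do not come from the text.

```python
import math

def sigmoid(z):
    """The logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, b):
    """Equation (20): sigma(w . x + b) for a single data point x."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

def cost(w, b, X, y):
    """Equation (21): mean cross-entropy as a function of w and b."""
    total = 0.0
    for xi, yi in zip(X, y):
        y_hat = predict(w, xi, b)
        total += yi * math.log(y_hat) + (1 - yi) * math.log(1 - y_hat)
    return -total / len(y)

# Hypothetical data (assumption): one feature per data point, binary labels.
X = [[0.5], [1.5], [2.5], [3.5]]
y = [0, 0, 1, 1]

# With w = 0 and b = 0 every prediction is 0.5, so the cost is ln(2).
print(cost([0.0], 0.0, X, y))   # about 0.693

# A boundary near x = 2 separates the labels and gives a lower cost.
print(cost([2.0], -4.0, X, y))  # about 0.181
```

Different choices of the weights and bias give different costs for the same data, which is why the cost is treated as a function of $\vec{w}$ and $b$ rather than of the data.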

What we want is to find the weights $\vec{w}$ and bias $b$ that minimise the cost. Since the cost depends on several variables at once, this is a multi-variable calculus problem.
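As a sketch of the calculus involved, the partial derivatives of equation (21) can be written compactly using the identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$. The index notation $x_{ij}$ (the $j$-th feature of the $i$-th data point) and $w_j$ (the matching weight) is assumed here for illustration and is not defined in the text above:

$$\frac{\partial Cost}{\partial w_j} = \frac{1}{m}\sum_{i=1}^m (\hat{y}_i - y_i)\,x_{ij}, \qquad \frac{\partial Cost}{\partial b} = \frac{1}{m}\sum_{i=1}^m (\hat{y}_i - y_i)$$

where $\hat{y}_i = \sigma(\vec{w}\cdot\vec{x}_i+b)$. Setting these derivatives to zero generally has no closed-form solution, so the minimum is usually sought numerically.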