Cross Entropy

Negative log likelihood

Likelihood refers to the probability that a model with some given parameter values would have produced the known data. In ML, the parameters are updated to fit a static dataset, i.e. to make that data as likely as possible.

Negative log likelihood is a cost function used as the loss for machine learning classification models (the lower, the better). We take the NEGATIVE because most ML frameworks only provide minimization, so maximizing the likelihood becomes minimizing its negative.

We take \(\ln\) because it is numerically better behaved for very small or very large numbers, and it turns products of probabilities into sums.
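As a minimal sketch (using NumPy; the function name and the one-hot label encoding are assumptions for illustration), the loss for a single example is \(-\ln(\hat{p}\, y^T)\):

```python
import numpy as np

def negative_log_likelihood(p_hat: np.ndarray, y: np.ndarray) -> float:
    """Negative log likelihood for a single example.

    p_hat: predicted probability vector.
    y: one-hot label vector.
    The dot product picks out the probability assigned to the correct class.
    """
    likelihood = float(p_hat @ y)  # same as p_hat y^T below
    return float(-np.log(likelihood))
```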

The output of a classification problem is usually a probability vector. For example,

\[\hat{p} = [0.1,0.3,0.5,0.1]\]

If the correct answer is the fourth class, \(y = [0,0,0,1]\), the likelihood that the current state of the model assigns to the correct label is:

\[L = \hat{p}\, y^T = 0.1 \cdot 0 + 0.3 \cdot 0 + 0.5 \cdot 0 + 0.1 \cdot 1 = 0.1\]

Therefore, the loss is \(-\ln(0.1) \approx 2.3\).

If the correct answer had been the third class, \(y = [0,0,1,0]\), the likelihood would have been

\[L = 0.5\]

Therefore, \(-\ln(0.5) \approx 0.69\).
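Plugging the example vectors into the sketch above reproduces both numbers:

```python
p_hat = np.array([0.1, 0.3, 0.5, 0.1])

# Correct answer is the fourth class: likelihood 0.1
print(negative_log_likelihood(p_hat, np.array([0, 0, 0, 1])))  # ~2.30

# Correct answer is the third class: likelihood 0.5
print(negative_log_likelihood(p_hat, np.array([0, 0, 1, 0])))  # ~0.69
```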

The better the prediction, the lower the number!