Overfitting is a common statistical problem in machine learning. It occurs when a model fits a specific data set too closely to generalize. Think of a person who “learns” something by heart without actually understanding it: they will be unable to answer a previously unseen question, even if it is similar to questions they have already seen.

The principle of statistics is simple enough: take real data from a sample and construct a model that works in a variety of scenarios. Overfitting arises when the chosen model has too many degrees of freedom and can memorize the particularities of each observation without capturing the underlying phenomenon. The model then produces rules that do not apply beyond the sample, and becomes extremely sensitive to the slightest variation in its inputs.

For example, an overfitted model that takes age into account might give two very different results for two individuals born just a few days apart.
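
The age example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the data below is made up, and the "memorizer" (a nearest-neighbour lookup) and the least-squares line are hypothetical stand-ins for an overfitted model and a simpler one, not a method taken from any particular library.

```python
# Made-up training data for illustration only: an outcome that grows
# roughly linearly with age (in years), plus some noise.
ages = [25.0, 31.0, 38.0, 44.0, 52.0, 59.0]
outcomes = [52.0, 70.0, 71.0, 95.0, 99.0, 125.0]


def memorizer_predict(age):
    """Overfitted model: return the outcome of the closest training age."""
    nearest = min(range(len(ages)), key=lambda i: abs(ages[i] - age))
    return outcomes[nearest]


# Simple model: a straight line fitted by ordinary least squares.
n = len(ages)
mean_age = sum(ages) / n
mean_outcome = sum(outcomes) / n
slope = (
    sum((a - mean_age) * (y - mean_outcome) for a, y in zip(ages, outcomes))
    / sum((a - mean_age) ** 2 for a in ages)
)
intercept = mean_outcome - slope * mean_age


def linear_predict(age):
    """Simple model: evaluate the fitted line at the given age."""
    return slope * age + intercept


def training_mse(predict):
    """Mean squared error of a model on the training data itself."""
    return sum((predict(a) - y) ** 2 for a, y in zip(ages, outcomes)) / n


memorizer_train_mse = training_mse(memorizer_predict)
linear_train_mse = training_mse(linear_predict)

# Two individuals born four days apart, on either side of age 41.
two_days = 2 / 365
age_a, age_b = 41 - two_days, 41 + two_days

memorizer_gap = abs(memorizer_predict(age_a) - memorizer_predict(age_b))
linear_gap = abs(linear_predict(age_a) - linear_predict(age_b))

print(f"training MSE   memorizer: {memorizer_train_mse:.2f}  linear: {linear_train_mse:.2f}")
print(f"prediction gap memorizer: {memorizer_gap:.2f}  linear: {linear_gap:.2f}")
```

On this toy data the memorizer reproduces every training observation exactly, so its training error is zero, yet its prediction jumps by 24 units between two people born four days apart. The fitted line has a nonzero training error, but its prediction barely moves over the same four days.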
