by Arround The Web

A Beginner’s Guide to Model Fit: Underfitting vs. Overfitting

Applications such as ChatGPT, Midjourney, Google Assistant, and Alexa have made everyday tasks remarkably simple. These applications are built on models trained with large datasets, and how well a model fits its data is one of the most important factors in machine learning.

This article will explain model fitness, the difference between underfitting and overfitting, and the reasons behind these irregularities in model fitness.

What is Model Fitness?

Model fitness is a statistical measure of how well a machine learning model captures its data. It reflects the model's ability to make predictions based on the data it was trained on. A well-fitted model performs well both on the training data and on new input data that is similar to it. Model fitness in machine learning depends on the training datasets and the algorithm used.

Before starting with the fitness of models, a general understanding of data sets is necessary. Let us understand what a data set is.

Data Set

It is a collection of samples that share specific characteristics. Datasets in ML are used to train a model. Typically, about 70 percent of the data is used for training and the remaining 30 percent is used to test the model's accuracy after training.
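The 70/30 split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; in practice, a helper such as scikit-learn's train_test_split is usually used instead:

```python
import random

def split_dataset(samples, train_fraction=0.7, seed=42):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = samples[:]                      # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)      # deterministic shuffle for repeatability
    cut = int(len(shuffled) * train_fraction)  # index separating the two sets
    return shuffled[:cut], shuffled[cut:]

# Example: 10 samples -> 7 for training, 3 for testing
data = list(range(10))
train, test = split_dataset(data)
print(len(train), len(test))  # 7 3
```

Shuffling before splitting matters: if the dataset is ordered (e.g., by class label), a plain slice would give the model an unrepresentative training set.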

The irregularities in model fitness can be of two types, i.e., “Underfitting Models” and “Overfitting Models”. A few examples of a general scatter plot based on model fitness can be seen in the figure below:

Let us understand underfitting in machine learning models and the reasons behind it:

What is Underfitting?

Underfitting models perform poorly on both the training data and the test data. This is because these models fail to capture the underlying pattern in the training data. It usually happens when the model is too simple or there is too little training data.

Let us head to the reasons for underfitting in machine learning models:

What are the Reasons for Underfitting?

Underfitting in machine learning models can be the result of a few key factors: high bias, an overly simple model, noise in the training data, and a small training dataset.

Let us discuss these reasons in detail:

High Bias

Underfitting models have high bias, i.e., there is a significant difference between the model's predicted values and the ground truth. This happens because the model makes overly simplistic assumptions about the data.

Simple Model

Underfitting can be the result of an overly simple model, i.e., one that has too few input features or is too heavily regularized to capture the pattern in the data.

Noise

Underfitting can also be the result of noise or garbage in the training datasets. Noise in the data causes the model to train on wrong values, which in turn produces wrong results.

Size of Training Data

Small training data sets can cause underfitting. If the training dataset is small, the model fails to predict the pattern or the trend in dataset values.
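The underfitting signature described above can be demonstrated with a small NumPy sketch (illustrative only): a straight line is fitted to clearly quadratic data, and the error stays high on both the training and the test set.

```python
import numpy as np

# Quadratic ground truth: a straight line cannot capture this pattern.
x_train = np.linspace(-3, 3, 30)
y_train = x_train ** 2
x_test = np.linspace(-2.5, 2.5, 20)
y_test = x_test ** 2

# Degree-1 polynomial = a simple linear model.
coeffs = np.polyfit(x_train, y_train, deg=1)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# Underfitting signature: the error is high on BOTH sets.
print(train_mse, test_mse)
```

No amount of extra training fixes this: the model family itself is too simple for the data, which is exactly what high bias means.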

These were a few of the reasons behind underfitting. Let us head to overfitting now:

What is Overfitting?

Overfitting models are trained on the sample dataset for too long or with too much capacity. They perform extremely well on the training dataset because, unlike underfitting models, they do capture its patterns; the problem is that they also memorize its noise. As a result, they make poor predictions on new, unseen data.
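This behavior can be sketched with NumPy (an illustrative example, not a recipe): a degree-9 polynomial fitted to 10 noisy points can reproduce the training set almost perfectly, yet it misses held-out points that lie between the training samples.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)

# Degree 9 through 10 points: the polynomial can pass through every
# training sample, noise included.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# Held-out points between the training samples reveal the memorization.
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(train_mse, test_mse)  # training error is tiny; test error is much larger
```

The near-zero training error looks like success, but it is the model fitting the noise; the gap to the test error is the overfitting signature from the table below.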

Let us head to the reasons behind overfitting in machine learning models.

What are the Reasons for Overfitting?

Overfitting in a machine learning model can be the result of a number of factors: high variance, an overly complex model, noise in the data, and a small training dataset.

Let us explain these reasons in detail:

High Variance

Variance is the inconsistency in a model's predictions when it is trained on different datasets. A high-variance model is overly sensitive to the particular dataset it was trained on: it performs extremely well on that dataset but poorly on others.
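Variance can be made concrete with a small NumPy sketch (illustrative; the degrees and noise level are arbitrary choices): train the same model twice on two independently noisy samples of the same underlying function and measure how much the two fitted models disagree. A complex model's predictions swing with the noise far more than a simple model's.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
x_grid = np.linspace(0.1, 0.9, 50)  # points where we compare predictions

def prediction_gap(degree):
    """Fit the same polynomial model on two independently noisy samples
    and return how much the two fits disagree on a common grid."""
    y1 = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
    y2 = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
    p1 = np.polyval(np.polyfit(x, y1, degree), x_grid)
    p2 = np.polyval(np.polyfit(x, y2, degree), x_grid)
    return np.mean(np.abs(p1 - p2))

simple_gap = prediction_gap(1)   # linear model: low variance
complex_gap = prediction_gap(9)  # degree-9 model: high variance
print(simple_gap, complex_gap)
```

The simple model averages the noise away and lands on nearly the same line both times; the complex model chases each sample's noise and produces two very different curves.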

Complex Model

A complex model can also be the cause of overfitting, as complex models have a large number of parameters or variables, and these parameters can take a wide range of values, which allows the model to memorize the training data.

Noise

Noise in data can cause wrong predictions in any machine learning model. An overfitting model fits the noise or garbage in the dataset along with the real signal, which results in wrong predictions on new data.

Size of Training Data

If the training dataset is small, the model has fewer examples from which to learn general patterns, so it tends to memorize the individual samples and overfit.

Tabular Difference of Underfitting and Overfitting

Underfitting | Overfitting
High bias | High variance
Simple model | Complex model
Fails to learn the pattern, so it generalizes poorly | Works well on training data but not on test data
High error rate on both training and test data | Low training error but high testing error

This was all about model fitness and the concepts of overfitting and underfitting in machine learning.

Conclusion

Model fitness in machine learning is the quality of predictions that a model makes. A number of factors can affect the performance of a model, such as bias, variance, training data size, and model complexity. These factors can cause underfitting or overfitting in the trained model. The article has comprehensively explained underfitting and overfitting along with their causes.


Source: linuxhint.com
