Supervised learning

What is supervised learning and how does it work? Find out more in this article.

Definition

Supervised learning is a form of machine learning in which a model is trained on a labeled data set. Labeled means that the desired output (the label, e.g. "dog") is known for each input (e.g. an image). During training, the model therefore receives feedback on what the correct output would have been and adjusts its parameters accordingly. The aim is for the trained model to produce correct outputs for new, previously unseen inputs, that is, to generalize from the examples.

How it works

In supervised learning, the algorithm is presented with pairs of inputs and target outputs. The model parameters are optimized iteratively to minimize an error measure, the loss (e.g. the difference between the model output and the target output). For neural networks and other differentiable models this is typically done with gradient descent. This process is called training. There are two main types of task in supervised learning:

Classification: The label is a category. Example: The inputs are emails and the labels are "spam" or "not spam". The model learns to assign new emails to these categories. Further examples: image recognition (multiple classes), speech recognition (words as classes), diagnostic systems (disease X: yes/no).
Regression: The label is a continuous value. Example: The inputs are apartment characteristics and the label is the rent. The model learns a function that predicts numerical values. Further examples: stock price prediction, temperature forecasting, value estimation.
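The training procedure described above can be sketched for the regression case. This minimal Python example assumes a single, made-up feature (living area, in units of 10 m²), hypothetical rents, a linear model rent = w · size + b, and the mean squared error as the error measure; the parameters are adjusted by plain batch gradient descent. The made-up rents lie exactly on the line rent = 100 · size + 200, so the learned parameters should approach w = 100 and b = 200.

```python
# Minimal sketch: fit rent = w * size + b with batch gradient descent on the
# mean squared error (MSE). All data below is made up for illustration.


def predict(w: float, b: float, x: float) -> float:
    """Linear model: predicted rent for an apartment of size x."""
    return w * x + b


def mse(w: float, b: float, xs: list[float], ys: list[float]) -> float:
    """Mean squared error between model outputs and target outputs (labels)."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs: list[float], ys: list[float],
          lr: float = 0.01, epochs: int = 20_000) -> tuple[float, float]:
    """Start from w = b = 0 and repeatedly step against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error of the current model on every training pair.
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Gradient descent update: move both parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training pairs: living area (in units of 10 m^2) -> monthly rent.
sizes = [3.0, 4.5, 6.0, 8.0, 10.0]
rents = [500.0, 650.0, 800.0, 1000.0, 1200.0]

w, b = train(sizes, rents)
print(f"w = {w:.2f}, b = {b:.2f}, training MSE = {mse(w, b, sizes, rents):.6f}")
# New, unseen input: a 70 m^2 apartment (size 7.0).
print(f"predicted rent: {predict(w, b, 7.0):.2f}")
```

The learning rate and number of epochs were picked by hand for this tiny data set. In practice, features are usually rescaled first, because gradient descent with a single learning rate converges poorly when features have very different scales.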

During training, part of the data is often held out as a validation set to check whether the model generalizes and to tune settings such as hyperparameters. After training, performance is evaluated on a test set that the model has never seen before.
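The roles of the training, validation and test sets can be sketched in Python for the spam example. The sketch assumes one hypothetical numeric feature per email (the number of suspicious words), synthetic and perfectly separable data, a 60/20/20 split, and a one-feature threshold rule (a decision stump) as a deliberately simple stand-in for a real classifier.

```python
import random

# Hypothetical labeled pairs: (number of suspicious words in an email, label).
# Synthetic, deliberately separable data; not a real spam corpus.
data = [(i % 3, "not spam") for i in range(50)] + [(5 + i % 5, "spam") for i in range(50)]


def split_dataset(pairs, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the labeled pairs and split them into training, validation and test sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fit_threshold(train_pairs):
    """Training: choose the threshold t with the fewest errors for the rule 'spam if x > t'."""
    values = sorted({x for x, _ in train_pairs})
    # Candidate thresholds: below all values, between neighbouring values, and at the maximum.
    candidates = ([values[0] - 1]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1]])
    return min(candidates,
               key=lambda t: sum((x > t) != (label == "spam") for x, label in train_pairs))


def accuracy(t, pairs):
    """Fraction of pairs whose predicted label matches the known label."""
    return sum((x > t) == (label == "spam") for x, label in pairs) / len(pairs)


train_set, val_set, test_set = split_dataset(data)
t = fit_threshold(train_set)           # only the training set is used for fitting
print(f"threshold = {t}")
print(f"validation accuracy = {accuracy(t, val_set):.2f}")  # checked during development
print(f"test accuracy = {accuracy(t, test_set):.2f}")       # reported once, at the end
```

Because the validation set influences decisions during development (for example, choosing between model variants), only the untouched test set gives an unbiased estimate of performance on new data.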

Examples

Supervised learning is one of the most widely used ML approaches because many tasks can naturally be formulated as labeling problems. Some everyday examples:

Image recognition: A neural network trained on millions of labeled images (cat, dog, car, etc.) can then label new photos.
Speech recognition: Systems such as Google Speech have been trained on large amounts of audio recordings with corresponding transcripts and learn to convert sound patterns into letters and words.
Medical diagnosis: An ML model is trained on patient records containing findings and diagnoses and learns which patterns of findings indicate which disease.
Quality control: In industry, sensor or image data from products is labeled "faulty" or "ok", and the model learns to detect defects automatically.

Limits

Supervised learning requires a large labeled data set. Obtaining labels is often expensive or time-consuming (think of manually labeling millions of images). In addition, the model can only be as good as its data: bias or inconsistencies in the labels lead to corresponding errors in the model. It also learns nothing about input regions that are not covered by the training data. This is why methods such as self-supervised and unsupervised learning, which do not require fully labeled data, are gaining importance. Nevertheless, wherever high-quality labeled data is available, supervised learning remains extremely powerful.
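The point about uncovered input regions can be illustrated with a small Python sketch. It assumes a made-up target relationship (y = x²) and training inputs that only cover the range 0 to 1, and it uses 1-nearest-neighbour regression as an arbitrary example of a flexible supervised model. The model reproduces the training labels exactly and stays close to the true values inside the covered range, but at x = 3 it can only return the label of the closest training example.

```python
# Hypothetical illustration: the true relationship is y = x**2, but the
# training data only covers inputs between 0 and 1.
def target(x: float) -> float:
    return x ** 2


train_xs = [i / 10 for i in range(11)]            # 0.0, 0.1, ..., 1.0
train_pairs = [(x, target(x)) for x in train_xs]  # labeled training pairs


def predict_1nn(x: float) -> float:
    """1-nearest-neighbour regression: return the label of the closest training input."""
    _, nearest_y = min(train_pairs, key=lambda pair: abs(pair[0] - x))
    return nearest_y


# Two inputs inside the covered range and one far outside it.
for x in [0.27, 0.73, 3.0]:
    print(f"x = {x}: predicted {predict_1nn(x):.3f}, true {target(x):.3f}")
```

Other model types extrapolate differently (a linear model would continue a straight line, for example), but in every case the behaviour outside the covered range is determined by the model's assumptions rather than by the training data.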
