Transformer

What is a transformer in the context of machine learning and how is it used?

Definition

The Transformer is a neural network architecture based on the concept of self-attention, originally introduced for machine translation (Vaswani et al., 2017, "Attention Is All You Need"). It has replaced RNNs and CNNs in many areas and forms the basis of modern large language models (e.g. GPT, BERT).

Special features

  • Self-attention: Each token in a sequence attends to every other token and weights its representation by context; this allows the whole sequence to be processed in parallel rather than step by step (see the sketch after this list).

  • Scalability: Transformer computations parallelize very well on modern hardware, which is what makes training very large models such as GPT-4 feasible in the first place.
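
To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. All names and sizes are illustrative choices, not something the article prescribes: every token's query is compared against every token's key, and the resulting weights mix the value vectors for all tokens at once.

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Scaled dot-product self-attention over a sequence of token vectors.

        x:             (seq_len, d_model) input embeddings
        w_q, w_k, w_v: (d_model, d_k) learned projection matrices
        """
        q = x @ w_q                                   # queries
        k = x @ w_k                                   # keys
        v = x @ w_v                                   # values
        scores = q @ k.T / np.sqrt(k.shape[-1])       # every token scored against every other
        scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ v                            # context-weighted mixture of values

    # Toy example with made-up sizes, purely for illustration.
    rng = np.random.default_rng(0)
    d_model, d_k, seq_len = 16, 8, 5
    x = rng.normal(size=(seq_len, d_model))
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)
    print(out.shape)                                  # (5, 8): one context vector per token

Note that the score matrix is computed for all token pairs in one matrix product, which is exactly what enables the parallel processing mentioned above.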

Applications

Transformers have revolutionized machine translation, text generation, and text classification, and are now also used in computer vision (vision transformers) and in multimodal models that combine text and images.
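
As a brief illustration of such applications in practice, here is a sketch using the Hugging Face transformers library; the pipeline tasks are part of that library's public API, and the model checkpoint named below is one illustrative public choice, not one the article specifies.

    from transformers import pipeline

    # Machine translation with a pretrained Transformer checkpoint
    # ("t5-small" is an illustrative public model, not prescribed here).
    translator = pipeline("translation_en_to_de", model="t5-small")
    print(translator("Transformers have revolutionized machine translation."))

    # Text classification, using the pipeline's default model.
    classifier = pipeline("sentiment-analysis")
    print(classifier("This architecture scales remarkably well."))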
