LLM

What exactly are LLMs, the so-called Large Language Models? What advantages do they offer, and what challenges need to be considered? Find out more in this article.

What are LLMs?

A Large Language Model (LLM) is an AI model for language processing that has an extremely large number of trainable parameters and is trained on very extensive text datasets. LLMs specialize in generating and understanding text. They belong to the category of neural networks (usually based on the Transformer architecture) and learn statistical relationships between words and sentences during training. This enables them to predict text: given the beginning of a text, they suggest the most likely next word, and by repeating this step they can formulate entire paragraphs.
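The following minimal sketch illustrates this iterative next-word (more precisely, next-token) prediction. It uses the Hugging Face transformers library and the small, publicly available GPT-2 model as a stand-in for a full-scale LLM; the prompt and the greedy decoding strategy (always picking the single most likely token) are illustrative choices, not necessarily how production systems sample.

```python
# Minimal sketch of iterative next-token prediction with a small causal
# language model (GPT-2) from the Hugging Face "transformers" library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Large language models are"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate 20 tokens one at a time: at each step the model outputs a
# probability distribution over its vocabulary, and we append the most
# likely next token (greedy decoding) to the running sequence.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits      # shape: (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()          # most probable next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```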

Properties

The attribute "large" refers primarily to the model size, i.e. the number of trainable parameters, which can run into the billions. In addition, the training corpus is very large (in some cases the entire publicly available Internet of texts). This size gives the model the following capabilities:

  • Broad world knowledge: Since countless books, articles and websites were included in the training data, LLMs have absorbed an enormous amount of general and specialized knowledge.

  • Linguistic diversity: LLMs are usually proficient in several languages and dialects, often even programming languages.

  • Generalization ability: They can perform various tasks in natural language (translating, summarizing, answering questions, writing creative texts), even though they are usually not explicitly specialized in any single task. The probability distribution over language they learn during training is often enough to solve new tasks purely by prompting, as illustrated in the sketch after this list.

  • Context processing: Modern LLMs can take long input sequences into account (e.g. several pages of text), which enables complex dialogs and multi-level queries.
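As a rough illustration of this generalization by prompting, the sketch below steers a general-purpose model toward a summarization task purely through the wording of the prompt. It again assumes the Hugging Face transformers library; with a small model such as GPT-2 the output will only crudely follow the instruction, whereas large LLMs handle such prompts far more reliably.

```python
# Sketch: steering a general-purpose language model toward a task
# (here: summarization) purely by how the prompt is worded.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Summarize in one sentence:\n"
    "Large language models are neural networks trained on huge text corpora "
    "to predict the next word, which lets them translate, summarize and "
    "answer questions without task-specific training.\n"
    "Summary:"
)

# max_new_tokens limits how much text is appended after the prompt.
result = generator(prompt, max_new_tokens=40)
print(result[0]["generated_text"])
```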

However, LLMs require enormous computational resources for training and operation, and their output may contain falsehoods (hallucinations), since they generate statistically plausible text rather than verified facts.

Examples

Well-known LLMs include OpenAI's GPT-3/GPT-4, which are used in ChatGPT and other applications, Google's PaLM and LaMDA, Meta's LLaMA and Anthropic's Claude. They are all based on similar principles but differ in size, training data and fine-tuning. LLMs have recently caused a stir because their capabilities in writing text, answering questions and even programming have opened up qualitatively new possibilities. They are now being integrated into many applications via APIs – from customer service chatbots to word processing and code assistants.
