LLM
What exactly are LLMs, the so-called Large Language Models? What advantages do they offer, and what challenges need to be considered? Find out in this article.
What are LLMs?
A Large Language Model (LLM) is an AI model for language processing that has an extremely large number of parameters and has been trained on very extensive text datasets. LLMs specialize in generating and understanding text. They belong to the category of neural networks (usually based on the Transformer architecture) and learn statistical relationships between words and sentences during training. This enables them to predict text: given the beginning of a passage, they suggest the most likely next word, and by doing so iteratively they can formulate entire paragraphs.
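The iterative next-word prediction described above can be illustrated with a deliberately tiny sketch. Real LLMs learn probabilities over tens of thousands of tokens with billions of parameters; here a hand-written bigram table (an assumption purely for illustration, not how an actual model stores its knowledge) stands in for the learned statistics:

```python
# Toy illustration of iterative next-word prediction (NOT a real LLM):
# a hand-written bigram table stands in for learned statistics.
bigram_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"on": 0.9, "down": 0.1},
    "on":  {"the": 0.8, "a": 0.2},
}

def predict_next(word):
    """Return the most likely next word, or None if the word is unknown."""
    candidates = bigram_probs.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def generate(start, max_new_words=4):
    """Iteratively append the most likely next word (greedy decoding)."""
    words = [start]
    for _ in range(max_new_words):
        nxt = predict_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(generate("the"))  # -> "the cat sat on the"
```

Always picking the single most likely word is called greedy decoding; real systems typically sample from the probability distribution instead, which makes their output more varied.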
Properties
The attribute "large" refers primarily to the model size, i.e. the number of trainable parameters, which can run into the billions. In addition, the training corpus is very large (in some cases a substantial share of the publicly available text on the Internet). This scale is what gives the models their broad capabilities: generating fluent text, answering questions, summarizing, and adapting to new tasks from just a few examples in the prompt.
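To get a feel for what "billions of parameters" means in practice, a quick back-of-the-envelope calculation helps: storing the weights alone, at 2 bytes per parameter in half precision, already requires many gigabytes of memory. The model sizes below are common published figures used for illustration:

```python
# Rough memory footprint of the model weights alone
# (2 bytes per parameter corresponds to fp16/bf16 half precision).
def weight_memory_gb(n_params, bytes_per_param=2):
    return n_params * bytes_per_param / 1e9

# Illustrative sizes: 7B and 70B (common open-model scales),
# 175B (the published size of GPT-3).
for n in (7e9, 70e9, 175e9):
    print(f"{n / 1e9:.0f}B params -> ~{weight_memory_gb(n):.0f} GB in fp16")
```

This is why large models are typically served from GPU clusters rather than a single consumer machine, and why the "large" in LLM is meant literally.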
Examples
Well-known LLMs include OpenAI's GPT-3 and GPT-4 (used in ChatGPT and other applications), Google's PaLM and LaMDA, Meta's LLaMA, and Anthropic's Claude. They are all based on similar principles but differ in size, training data, and fine-tuning. LLMs have recently caused a stir because their abilities in writing text, answering questions, and even programming have opened up qualitatively new possibilities. They are now being integrated into many applications via APIs, from customer-service chatbots to word processing and code assistants.