The power of LLMs: Language models and their use in companies

For about a year now, large language models (LLMs), especially OpenAI applications, have been on everyone's lips. These LLMs are advanced machine learning systems pre-trained on vast amounts of text data to develop a deep understanding of language.

Their goal is to interpret, process, and generate human language, as well as perform other tasks in the field of Natural Language Processing (NLP), such as translation, identifying customer concerns, or analyzing sentiment in texts. In November 2022, ChatGPT made a generative language model publicly accessible via a web chat interface, surpassing all previous models. Its responses appear natural, and it is less prone to producing toxic content than earlier language models. But what sets ChatGPT apart from previous models? And why does this development lead companies to consider deploying generative language models? What new applications arise?

To answer these questions, let's briefly examine a part of the history of large language models.

Evolution of Large Language Models

Early language models employed rather simple statistical methods. This worked in many cases, such as recognizing the intent of email queries. At the same time, these models hit their limits when content became more complex or when the extended context of a statement became relevant. This changed with the introduction of neural networks such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), which became increasingly proficient at understanding context. However, these systems still struggled with very long sequences and were relatively slow due to their sequential processing.

A significant milestone in the development of language models was the introduction of the Transformer architecture by Google researchers in 2017 with the paper "Attention is all you need". Unlike traditional sequential models such as RNNs and LSTMs, which process inputs step by step, Transformer models enable more efficient treatment of text data. They achieve this through an innovative mechanism known as 'Self-Attention'. While LSTMs already use attention mechanisms to highlight important information in a sequence, Transformer models place the attention mechanism at the core of their architecture.

Self-Attention allows the model to simultaneously capture the relationships between all words in a sentence by calculating attention weights indicating how strongly each word of the input text is related to other words. Any relationship between individual elements of the input sequence is captured independently, without being constrained by the sequential nature of previous models. As a result, the Transformer model can more efficiently utilize contextual information. It evaluates and weights the input sequence to determine which parts are most relevant for the current task. For example, when translating a sentence, the model can understand the meaning of a word in the context of the entire sentence rather than just focusing on the surrounding words. The ability of Transformers to directly model extensive dependencies without information having to flow through many intermediate steps is a significant advantage.
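The attention-weight calculation described above can be made concrete with a small sketch. This is a minimal, illustrative implementation of scaled dot-product self-attention in NumPy (the matrix names, dimensions, and random inputs are chosen for illustration, not taken from any particular model):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Pairwise relevance: how strongly each token relates to every other token.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into attention weights; each row sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 "tokens", embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(attn.shape)                            # (4, 4): one weight per token pair
```

Note how the attention matrix has one entry for every pair of tokens, computed in a single matrix operation rather than step by step; this is what lets Transformers relate distant words directly.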

This allows the models to understand both the complex relationships between words and their positions in the sentence, leading to improved language processing and generation.

One of the most well-known and still widely used Transformer models is the open-source model BERT, also introduced by Google. The model can be fine-tuned for tasks such as intent or emotion recognition. Since the introduction of BERT, many other models have followed, including XLNet, GPT-3, LaMDA, MT-NLG, OPT, and BLOOM. BERT is a sizable model, with between 110 million and 340 million parameters. However, other models, such as GPT-3 or MT-NLG, are much larger, comprising 175 billion and 530 billion parameters respectively. Parameters are essentially the "settings" or "screws" that the model adjusts as it learns and makes predictions. The number of parameters determines how detailed and adaptable the model is, helping it recognize complex patterns and understand context better.
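To make the notion of a parameter count concrete, here is a minimal sketch that counts the learned values in two small fully connected layers (the layer sizes are invented for illustration and do not describe any specific model):

```python
# Every weight and every bias in a layer is one learned parameter.
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

# A toy 2-layer feed-forward block: 768 -> 3072 -> 768.
total = dense_params(768, 3072) + dense_params(3072, 768)
print(total)  # 4,722,432 parameters in just these two layers
```

Even this tiny block contains millions of parameters, which gives a sense of how quickly full models reach billions.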

Some of these models, like GPT-3, can operate "generatively" – meaning they can autonomously generate texts and are directly applicable to many tasks. Others, like BERT, need to be specifically trained for particular tasks, such as customer query recognition.

By striking the right balance between the number of parameters and the amount of available data, these models can achieve impressive results in a variety of applications.

Why did the big hype only come with ChatGPT?

ChatGPT garnered such significant attention because, unlike any model before it, it combined strong performance in natural language generation with markedly better alignment. Alignment refers to teaching the model to behave as humans would expect – providing helpful responses, containing minimal biases, being as truthful as possible (thus minimizing hallucinations), and being perceived as safe.

Biases in LLMs are unintended and potentially problematic tendencies in the generated texts, which can stem from the data and language patterns encountered during training. Hallucinations are misleading or false statements generated by the model that are not recognizable at first glance because they seem plausible.

Alignment doesn't play a role in the fundamental training of a language model. Instead, the model learns to predict the most probable next word in a given context based on a large dataset from the internet. This results in reproducing unfiltered content that can be highly biased or incorrect. Specialized training is required to address these challenges. ChatGPT performs so well because it was trained using reinforcement learning from human feedback (RLHF) on a very large dataset of human-annotated data. Additionally, the model is better at understanding larger conversational contexts, enabling it to provide appropriate responses even in longer dialogues. It is publicly available and can continue to learn from feedback, helping it avoid missteps like those of Galactica by Meta. Galactica, instead of assisting in scientific writing, invented scientific articles and references, leading to its discontinuation shortly after release. The fact that OpenAI dared to release ChatGPT a few weeks after the failure of Galactica (early November 2022) demonstrates the confidence the company had in the performance of its model. However, hallucinations, like those produced by Galactica, remain a problem even with ChatGPT, and need to be addressed when deploying the model live.

With the release of GPT-4 earlier this year, OpenAI has continued to advance by providing an even better and larger model. Both models are continuously optimized – most recently in early November 2023 with an increase in the maximum input size to 128,000 tokens (approximately 85,000 words). In AI and Large Language Models (LLMs), a "token" refers to the smallest processing unit, often a word or part of a word. These tokens are used by models to understand language, generate text, and recognize complex linguistic patterns. Additionally, OpenAI has made available the direct integration of Retrieval Augmented Generation (RAG), allowing files to be uploaded and questions about the contents of these files to be posed directly to the model.
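How a tokenizer splits words into such units can be sketched with a greedy longest-match over a small vocabulary. The vocabulary below is invented for illustration; real tokenizers (e.g. BPE, as used by OpenAI's models) learn their vocabularies from data:

```python
def tokenize(word, vocab):
    """Greedily split a word into the longest subword units found in vocab."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # unknown character: its own token
            i += 1
    return tokens

vocab = {"lang", "uage", "model", "s", "token", "iz", "ation"}
print(tokenize("language", vocab))       # ['lang', 'uage']
print(tokenize("tokenization", vocab))   # ['token', 'iz', 'ation']
print(tokenize("models", vocab))         # ['model', 's']
```

Common words often map to a single token while rarer words split into several, which is why 128,000 tokens corresponds to noticeably fewer words.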

Is OpenAI the only option for powerful generative AI?

While OpenAI has been at the forefront of generative AI development over the past year and has a partner in Microsoft enabling GDPR-compliant usage in Europe, there are alternative models of similarly high quality. ChatGPT not only brought public attention to generative AI but also triggered a focus on this technology in academia. This has led to new and improved models emerging almost weekly. The open-source sector in particular is catching up and driving developments. A prime example is the pre-trained language model Llama 2, released by Meta in July of this year and usable for commercial purposes. Based on this model, continually optimized variants are being developed, including for specific use cases such as the medical field. Moreover, OpenAI faces competition in Germany from Aleph Alpha (at least in the future, as the company received a total investment of over $500 million in November 2023) and in the United States from companies like Google and Anthropic, in which Amazon has invested over $4 billion.

Therefore, there are numerous alternatives to the well-known OpenAI models that can be just as good for specific use cases as the models of the current market leader.


Dr. Anja Linnenbürger, Head of Research, VIER GmbH