
It all depends on the context – and a good RAG!
Last updated: 26.05.2025 10:40
Large language models (LLMs) currently compete not only on performance, but also on ever larger so-called context windows. But what does that actually mean?
When it comes to company-specific knowledge, LLMs have only one access point: the so-called context. The context must contain everything the model needs to generate a useful answer – the actual query, any behavioral rules and, of course, the relevant knowledge. However, the context is subject to one key restriction: its size is limited to a fixed number of tokens.
This limit poses a real challenge, especially with large volumes of data. Even a single large document, such as a user manual, can exceed it.
RAG systems
To get around this limitation, so-called RAG systems (Retrieval Augmented Generation) are used. Put simply, the existing knowledge is divided into small fragments – known as chunks – and stored in a special database. When a user asks the bot a question or gives the agent a task, the system first searches this database for relevant knowledge fragments. The matching fragments are then passed to the LLM as part of its context.
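The chunk-and-retrieve idea above can be sketched in a few lines. This is a deliberately minimal illustration, not a production design: real RAG systems use an embedding model and a vector database, whereas here relevance is scored by simple word overlap, and the function names (`chunk`, `score`, `retrieve`) are illustrative, not from any specific library.

```python
def chunk(text, size=8):
    """Split a document into small fragments ("chunks") of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, fragment):
    """Toy relevance score: share of query words that also occur in the fragment.
    A real system would compare embedding vectors instead."""
    q = set(query.lower().split())
    f = set(fragment.lower().split())
    return len(q & f) / len(q)

def retrieve(query, chunks, top_k=3):
    """Return the top_k fragments most relevant to the query."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]

# A user manual stands in for the company knowledge base.
manual = ("To reset the device hold the power button for ten seconds. "
          "The warranty covers two years of normal use. "
          "Cleaning should be done with a dry cloth only.")
chunks = chunk(manual, size=8)
context = retrieve("How do I reset the device?", chunks)
# The retrieved fragments are then placed into the LLM's context.
```

The point of the sketch is the pipeline shape: chunk once at indexing time, then retrieve per query and hand only the matches to the model.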
In the best case, a large amount of relevant information is found – but even that can be a problem: if more knowledge is found than fits into the context, a selection has to be made, and important information may be lost in the process. The selection made by RAG systems is a constant balancing act.
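The balancing act described above is, at its simplest, a packing problem: keep the best-ranked fragments until the token budget is spent and drop the rest. The sketch below assumes the fragments arrive already sorted by relevance and approximates token counts by word counts; a real system would use the model's own tokenizer.

```python
def select_for_context(ranked_fragments, budget_tokens):
    """Greedily keep fragments, best first, until the token budget is spent.
    Word count stands in for a real tokenizer here (an approximation)."""
    selected, used = [], 0
    for fragment in ranked_fragments:   # assumed sorted by relevance, best first
        cost = len(fragment.split())    # crude token estimate
        if used + cost > budget_tokens:
            continue                    # fragment dropped: possible information loss
        selected.append(fragment)
        used += cost
    return selected
```

Every `continue` in that loop is exactly the risk the text describes: a relevant fragment that simply did not fit.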
New models
In the last few months, new LLMs have come onto the market that can process significantly larger numbers of tokens, in some cases up to 10 million. Is this a solution to the context problem? Can we do without RAG systems?
Unfortunately, no. Apart from higher costs and longer response times, there is another problem in practice: the larger the context, the more difficult it is for the models to establish connections. Although individual pieces of information can be found reliably (the famous needle in the haystack), logical or temporal links continue to cause difficulties.
And until these problems are solved, the following still applies:
Context – and a good RAG system – is what counts.
Author:

Steffen Eichenberg
Head of Software Engineering
VIER