
It all depends on the context – and a good RAG!
Last updated: 26.05.2025 10:40
Large language models (LLMs) currently compete not only on performance, but also on ever larger so-called context windows. But what does that actually mean?
When it comes to company-specific knowledge, LLMs have only one access point: the so-called context. The context must contain everything the model needs to generate a useful answer – the actual query, any behavioral rules and, of course, the relevant knowledge. However, the context is subject to one key restriction: its size is limited to a fixed number of tokens.
This limit poses a real challenge, especially with large volumes of data. Even a single large document, such as a user manual, can exceed it.
RAG systems
To get around this limitation, so-called RAG systems (Retrieval Augmented Generation) are used. Put simply, the existing knowledge is divided into small fragments – known as chunks – and stored in a special database. When a user asks the bot a question or gives the agent a task, the system first searches this database for relevant knowledge fragments. The matching fragments are then passed to the LLM as part of its context.
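The chunk-and-retrieve idea above can be sketched in a few lines. This is a deliberately minimal illustration, not a production design: real RAG systems use an embedding model and a vector database, whereas here relevance is scored by simple word overlap, and the function names (`chunk`, `score`, `retrieve`) are illustrative, not from any specific library.

```python
def chunk(text, size=8):
    """Split a document into small fragments ("chunks") of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, fragment):
    """Toy relevance score: share of query words that also occur in the fragment.
    A real system would compare embedding vectors instead."""
    q = set(query.lower().split())
    f = set(fragment.lower().split())
    return len(q & f) / len(q)

def retrieve(query, chunks, top_k=3):
    """Return the top_k fragments most relevant to the query."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]

# A user manual stands in for the company knowledge base.
manual = ("To reset the device hold the power button for ten seconds. "
          "The warranty covers two years of normal use. "
          "Cleaning should be done with a dry cloth only.")
chunks = chunk(manual, size=8)
context = retrieve("How do I reset the device?", chunks)
# The retrieved fragments are then placed into the LLM's context.
```

The point of the sketch is the pipeline shape: chunk once at indexing time, then retrieve per query and hand only the matches to the model.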
In the best case, a large amount of relevant information is found – but even that can be a problem: if more knowledge is found than fits into the context, a selection has to be made, and important information may be lost in the process. The selection made by RAG systems is a constant balancing act.
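The balancing act described above is, at its simplest, a packing problem: keep the best-ranked fragments until the token budget is spent and drop the rest. The sketch below assumes the fragments arrive already sorted by relevance and approximates token counts by word counts; a real system would use the model's own tokenizer.

```python
def select_for_context(ranked_fragments, budget_tokens):
    """Greedily keep fragments, best first, until the token budget is spent.
    Word count stands in for a real tokenizer here (an approximation)."""
    selected, used = [], 0
    for fragment in ranked_fragments:   # assumed sorted by relevance, best first
        cost = len(fragment.split())    # crude token estimate
        if used + cost > budget_tokens:
            continue                    # fragment dropped: possible information loss
        selected.append(fragment)
        used += cost
    return selected
```

Every `continue` in that loop is exactly the risk the text describes: a relevant fragment that simply did not fit.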
New models
In the last few months, new LLMs have come onto the market that can process significantly larger numbers of tokens, in some cases up to 10 million. Is this a solution to the context problem? Can we do without RAG systems?
Unfortunately, no. Apart from higher costs and longer response times, there is another problem in practice: the larger the context, the more difficult it is for the models to establish connections. Although individual pieces of information can be found reliably (the famous needle in the haystack), logical or temporal links continue to cause difficulties.
And until these problems are solved, the following still applies:
Context – and a good RAG system – is what counts.
Author:

Steffen Eichenberg
Head of Software Engineering
VIER