What role does VIER play in the context of large language models?

Last updated: 05.12.2023 09:00

With the advancements of the past year, the potential for AI applications in customer service has grown tremendously. Language models are more versatile and achieve significantly better results than before, even without extensive training. Furthermore, increased alignment makes the deployment of generative language models in live settings possible.

New applications based on LLMs are needed in order to continue offering optimal, customer-oriented solutions, as natural communication in human-machine interaction is becoming increasingly commonplace. At the same time, new challenges arise in terms of the security and performance of applications.

As a provider of innovative software solutions, VIER selects and optimizes the most powerful models for specific use cases in order to deploy them securely and stably in live customer environments. To this end, VIER's own AI teams have been working since the beginning of 2023 on

  • a model garden that contains the best large language models for each use case

  • a gateway that enables the secure and data-protection-compliant use of various LLMs

  • a chat solution that connects knowledge processing with generative language models.

We have also integrated LLM services into our products VIER Cognitive Voice Gateway, VIER Copilot and VIER Interaction Analytics, and made them available to our customers in June 2023.

VIER Model Garden

The Model Garden is a place where VIER stores and provides information on LLMs that have been tested for specific use cases. It provides an overview of current developments that are important for live use and also offers insights into the quality, response time, hosting and costs of the various models.

Why our own VIER Model Garden?

There are many LLM benchmarks and most new models are tested against these benchmarks. The results of these benchmarks are summarized in LLM leaderboards, such as the Open LLM Leaderboard or the LMSYS Leaderboard, which also integrates commercial models and human reviews.


Standard benchmarks of the language models

  • MMLU: measures knowledge acquired during pre-training and covers 57 subjects from STEM (mathematics, computer science, natural sciences, technology), the humanities, the social sciences and more.

  • HellaSwag: a commonsense test of sentence completion that is easy for humans and used to be very challenging for LLMs.

  • Commonsense QA: contains 12,247 general knowledge questions, each with 5 choices.

  • OpenBookQA: consists of 5,957 multiple-choice questions at elementary science level (4,957 training data, 500 development, 500 test).

  • ARC Benchmark: a more challenging QA task that includes common sense reasoning.

  • TriviaQA Benchmark: a realistic, text-based question-answer dataset that includes 950,000 question-answer pairs from 662,000 documents from Wikipedia and the web.

  • TruthfulQA: measures the extent to which a model reproduces falsehoods that frequently occur on the internet.

  • Chatbot Arena Elo Rating: an LLM battle platform with human ratings. Over 70,000 user votes are combined to calculate Elo ratings.
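
To illustrate how pairwise human votes can be turned into a ranking, the following minimal sketch applies the standard Elo update rule to a list of votes. The starting rating of 1000, the K-factor of 32 and the model names are assumptions chosen for illustration; this is not the Chatbot Arena implementation.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(votes, k: float = 32.0, start: float = 1000.0) -> dict:
    """Compute ratings from votes.

    votes: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    k and start are illustrative assumptions, not values from any leaderboard.
    """
    ratings = defaultdict(lambda: start)
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in votes:
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (outcome[winner] - exp_a)
        # Whatever A gains, B loses, so the rating sum stays constant.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical votes between two placeholder models.
example_votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "tie"),
]
example_ratings = elo_ratings(example_votes)
```

A model that wins against an equally rated opponent gains half the K-factor, while a win against a much weaker opponent changes the ratings only slightly.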


Of course, VIER uses this information to keep up with the latest developments. However, there are several reasons why this information is far from sufficient for a reliable decision on which model to use for which use case:

  • None of the benchmarks mentioned above use German as a basis for evaluation. The VIER Model Garden provides information on the quality of the models in German.

  • The benchmarks are not application-specific. Even if a model is good at answering knowledge questions (MMLU) or giving common sense answers (e.g. HellaSwag, Commonsense QA) and does not tend to repeat falsehoods often spread on the internet (e.g. TruthfulQA), this does not automatically mean that it summarizes German texts correctly. Even dedicated summarization benchmarks are mostly built from English newspaper datasets, which are not comparable to the relevant application data. The VIER Model Garden provides information on the quality of models in specific use cases with customer-relevant data (e.g. transcripts from telephone calls).

  • Most benchmarks deal with quality, which is undoubtedly the most important criterion. However, for different use cases there are other important aspects such as response time and cost. The VIER Model Garden provides an overview of the most important criteria for users and thus supports the decision for a model in terms of quality, response time, hosting (data security) and costs.

  • The Hugging Face Open LLM Leaderboard only contains open source models. The VIER Model Garden compares the quality, response time and costs of open source models with those of commercial models.

  • The Hugging Face leaderboard and most other rankings are aimed at experts and developers and are usually difficult to understand. The VIER Model Garden is aimed at potential users and provides information in a structured and understandable way.

  • The Model Garden is divided into several sections, one for each use case that is important for customers. It shows that a model can be good in one use case but perform poorly in others.
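
The trade-off between the criteria named above (quality, response time, hosting and cost) can be sketched as a simple filter-and-rank step. Everything in this sketch is hypothetical: the `ModelProfile` fields, the model names and all numbers are invented placeholders, not Model Garden results.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    quality: float               # use-case-specific score on an assumed 0..1 scale
    response_seconds: float      # typical response time
    cost_per_1k_requests: float  # in an arbitrary currency unit
    eu_hosting: bool             # stands in for the hosting / data security criterion


def shortlist(models, max_seconds, max_cost, require_eu_hosting):
    """Keep models that meet the hard constraints, best quality first."""
    eligible = [
        m for m in models
        if m.response_seconds <= max_seconds
        and m.cost_per_1k_requests <= max_cost
        and (m.eu_hosting or not require_eu_hosting)
    ]
    return sorted(eligible, key=lambda m: m.quality, reverse=True)


# Invented placeholder values for illustration only.
candidates = [
    ModelProfile("model-a", 0.91, 4.0, 12.0, False),
    ModelProfile("model-b", 0.86, 1.5, 3.0, True),
    ModelProfile("model-c", 0.78, 0.8, 1.0, True),
]

# A live voice bot needs fast answers; an offline summary job can wait longer.
voice_bot = shortlist(candidates, max_seconds=2.0, max_cost=5.0, require_eu_hosting=True)
offline_summary = shortlist(candidates, max_seconds=10.0, max_cost=20.0, require_eu_hosting=False)
```

With these placeholder values, the model with the highest quality score is ruled out for the voice bot because of its response time and hosting, which is the kind of use-case dependence the list above describes.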

VIER therefore tests relevant models in detail in order to offer companies the best options for the respective use case. In addition to selecting the right model for the respective use case, there are a few other aspects to consider for the secure use of LLMs at enterprise level.

The VIER AI Gateway and the new way to use Conversational AI in companies

The secure use of LLMs requires expertise in prompt engineering and the systematic testing of different prompt formats against each other, which makes it possible to create powerful applications. VIER has extensive experience in setting up guardrails that keep the models on track in the application. In chat applications, for example, this involves checking that models adhere to the instructions in the prompt, do not hallucinate, and do not provide information on topics outside the intended use case. To this end, VIER pursues a multi-stage approach that includes fine-tuning the prompt as well as implementing guardrails via our flow management, blacklists and conversation guidelines for the models. These measures are combined in VIER's NEO-CAI ("New Enterprise Optimized Conversational Artificial Intelligence") project.

NEO-CAI offers Retrieval Augmented Generation (RAG) in a customer-specific version to make know-how available in a targeted manner. VIER thus combines the ability of LLMs to provide coherent and effective answers with query-based approaches that search existing documents for the right information. This makes it possible, for example, to answer FAQs or questions about product descriptions completely automatically. For these applications to function optimally, it is important, among other things, to cut the content documents into meaningful parts (chunking), to find a good mechanism for translating these parts into vectors (embedding), and to have a suitable application that retrieves the data relevant to the specific question from the vector database and feeds it into the LLM in the right form to generate answers.
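
The three steps of chunking, embedding and retrieval can be sketched in a few lines. This is a toy illustration under stated assumptions, not the NEO-CAI implementation: the bag-of-words `embed` function stands in for a real embedding model, a plain Python list stands in for the vector database, and the example documents are invented.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Chunking: split a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    """Retrieval: return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Feed the retrieved passages to the LLM as context for the answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Invented example documents; the list 'index' plays the role of the vector database.
documents = [
    "The premium tariff includes unlimited calls and costs 30 euros per month.",
    "Returns are accepted within 14 days of delivery.",
]
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

question = "How much does the premium tariff cost?"
prompt = build_prompt(question, retrieve(question, index, top_k=1))
```

The resulting `prompt` string is what would be sent to the language model. In a production setting, the quality of the answers depends heavily on the chunk size and the embedding model, which is why both are named as tuning points above.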

Model access takes place via our AI Gateway, which offers detailed data protection features in addition to authentication, billing, monitoring and the administration of the various model accesses. This includes optional anonymization or pseudonymization of requests, which ensures that a model never receives customer-specific data such as names, customer numbers or addresses, while the response remains as natural as in direct communication with the selected model. VIER performs anonymization via an internal VIER Cognesys technology that guarantees that customer data does not leave the VIER systems.
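
The general idea of pseudonymizing a request and restoring the original values in the response can be sketched as follows. This is only an illustration of the principle, not the VIER Cognesys technology: the sensitive values are passed in by hand here, whereas a real system would have to detect them automatically, and the placeholder format, names and example texts are assumptions.

```python
def pseudonymize(text: str, sensitive_values: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive value with a placeholder; return text and reverse mapping."""
    mapping: dict[str, str] = {}
    # Replace longer values first so a short value never splits a longer one.
    ordered = sorted(sensitive_values, key=len, reverse=True)
    for i, value in enumerate(ordered, start=1):
        if value in text:
            placeholder = f"<ENTITY_{i}>"  # hypothetical placeholder format
            text = text.replace(value, placeholder)
            mapping[placeholder] = value
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


# Invented example request with a fictitious customer.
request = "Please confirm the appointment for Erika Mustermann, customer number 4711-22."
safe_request, mapping = pseudonymize(request, ["Erika Mustermann", "4711-22"])

# Stand-in for the reply of an external model, which only ever saw placeholders.
model_reply = "Dear <ENTITY_1>, your appointment for account <ENTITY_2> is confirmed."
final_reply = restore(model_reply, mapping)
```

Because the mapping never leaves the calling system, the external model works only with placeholders, and the customer still receives a reply that reads naturally.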

VIER therefore ensures that the best available models can be used securely in each customer's use case. To this end, VIER offers individualized chat solutions as well as the integration of LLMs into our Cognitive Voice Gateway, Copilot and Interaction Analytics products.

Using LLMs securely and in compliance with data protection regulations

The development of LLMs (Large Language Models) is progressing rapidly. We are only at the beginning of a development that will change how we use information and how we communicate. VIER is ready to meet this challenge together with our customers and use the possibilities of LLMs to improve both the customer experience and the employee experience.

VIER relies on a mix of different technologies such as the Model Garden, the AI Gateway and NEO-CAI to help companies navigate the complex landscape of LLMs. These tools enable companies to find the best models for their needs while ensuring that their applications are secure and privacy-compliant.

The journey towards off-the-shelf use of LLMs in customer applications has only just begun. If you would like to learn more about specific use cases, integrations or testing, please contact us.

    Author:

    Dr. Anja Linnenbürger

    Head of Research - Psychology & AI

    VIER
