SemanticAssert is a testing library designed to address the unique challenges that arise when working with Large Language Models (LLMs) like OpenAI's GPT-3. When developing applications and systems that utilize LLMs, one of the primary hurdles is dealing with the non-deterministic nature of the model's responses. LLMs can generate different responses for the same input, making it difficult to predict their exact output. This leads to a fundamental question: How can we effectively test LLM-based applications?
Imagine you are developing a Question Answering (QA) system. The model receives user questions and is expected to answer them by searching for relevant information in documents, such as user manuals for washing machines in PDF format. The pipeline generates embeddings from these documents, finds the extract that best answers the user's query, and uses the LLM to craft a natural language response based on that extract. The challenge arises when you try to write meaningful tests for the final step, or for the entire QA cycle. Because LLM responses are not deterministic, the same input may yield differently worded responses even when the core content remains the same. Still, despite the variability in verbosity and phrasing, you have a clear idea of what the response should say. SemanticAssert helps you address this challenge by allowing you to define and validate the core content of LLM-generated responses, regardless of their specific wording. It offers a structured approach to testing LLM-powered applications, ensuring that your system consistently provides accurate and relevant information to users.
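The retrieval step described above can be illustrated with a small, self-contained sketch in plain C# (this is not SemanticAssert code; the embedding vectors are made-up toy values, where a real system would obtain them from an embedding model):

```csharp
using System;
using System.Linq;

static class ToyRetrieval
{
    // Cosine similarity between two equal-length vectors.
    static double Cosine(double[] a, double[] b)
    {
        double dot = a.Zip(b, (x, y) => x * y).Sum();
        double na = Math.Sqrt(a.Sum(x => x * x));
        double nb = Math.Sqrt(b.Sum(x => x * x));
        return dot / (na * nb);
    }

    // Returns the index of the document extract whose embedding
    // is most similar to the query embedding.
    public static int FindBestExtract(double[] query, double[][] extracts)
    {
        int best = 0;
        double bestScore = double.NegativeInfinity;
        for (int i = 0; i < extracts.Length; i++)
        {
            double score = Cosine(query, extracts[i]);
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;
    }
}
```

The selected extract would then be handed to the LLM as context for crafting the final answer, and it is that final answer that SemanticAssert helps you test.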
The concept behind SemanticAssert is straightforward. SemanticAssert leverages a Large Language Model (LLM) to determine the correctness of responses. Let's take a look at a simple example of a test case:
string expected = "Mount Teide is 3718 meters high";
string actual = "Mount Teide, located on the island of Tenerife in Spain, has an elevation of approximately 3718 meters above sea level. It is the highest peak in Spain and one of the tallest volcanoes in the world when measured from its base on the ocean floor.";
await Async.Assert.AreSimilar(expected, actual);
In this code snippet, Async.Assert.AreSimilar compares the 'expected' and 'actual' responses. It checks that the core content of the 'actual' response matches what is expected while allowing for variations in wording, which is exactly what you need when dealing with non-deterministic LLM responses.
In addition to the 'AreSimilar' Assert, SemanticAssert provides several other Asserts that can simplify your daily testing routines. Here are a couple of examples:
You can use the 'AreSimilar' Assert with a similarity threshold, which specifies the minimum similarity value you consider valid between the 'expected' and 'actual' responses. This is especially useful when you want to allow for some variation in the responses generated by the Large Language Model (LLM). Here's an example:
string expected = "Mount Teide is 3718 meters high";
string actual = "Mount Teide, located on the island of Tenerife in Spain, has an elevation of approximately 3718 meters above sea level. It is the highest peak in Spain and one of the tallest volcanoes in the world when measured from its base on the ocean floor.";
await Async.Assert.AreSimilar(expected, actual, similarityThreshold: 0.8);
In this code, the 'similarityThreshold' parameter allows you to define the minimum acceptable similarity between the 'expected' and 'actual' responses.
SemanticAssert also provides an Assert for verifying that two texts are in the same language. This is useful when working with multilingual content, or with LLMs that may produce responses in various languages.
string expected = "This is a text in English";
string actual = "This is another text that should not raise an exception because it's in the same language";
await Async.Assert.AreInSameLanguage(expected, actual);
In this code snippet, the 'AreInSameLanguage' Assert confirms that both 'expected' and 'actual' are written in the same language, helping you keep your application's responses linguistically consistent.
To use SemanticAssert, you need to configure it to work with a Large Language Model (LLM). Currently, SemanticAssert is compatible with Azure OpenAI. Configuration is straightforward and can be done as follows:
Configuration.Completion.AddAzureTextCompletion(
    "<Your_AzureOpenAI_ChatDeploymentName>",
    "<Your_AzureOpenAI_Endpoint>",
    "<Your_AzureOpenAI_ApiKey>"
);

Configuration.Embeddings.AddAzureTextEmbeddingGeneration(
    "<Your_AzureOpenAI_TextEmbeddingsDeploymentName>",
    "<Your_AzureOpenAI_Endpoint>",
    "<Your_AzureOpenAI_ApiKey>"
);
By using these configurations, you can set up SemanticAssert to work seamlessly with your Azure OpenAI deployment.
If you wish to customize SemanticAssert's default configuration, you can do so through static methods on the Configuration class:
Configuration.AssertProvider.AddAssertProvider(new SKCosineAssertProvider());
The above snippet registers the SKCosineAssertProvider, which generates embeddings for the 'expected' and 'actual' texts and compares them when you use similarity functions with a threshold. This offers an alternative to using a prompt-based LLM comparison.
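For context, the cosine similarity typically used for this kind of embedding comparison can be sketched in a few lines of plain C# (an illustrative, self-contained implementation, not SemanticAssert's internal code):

```csharp
using System;

static class EmbeddingMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns a value in [-1, 1]; 1 means the vectors point in the
    // same direction, 0 means they are orthogonal (unrelated).
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Under this interpretation, a similarity threshold such as 0.8 passes embeddings pointing in nearly the same direction and fails loosely related ones.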
Please note that this documentation is a work in progress, and more details will be added in the future to help you make the most of SemanticAssert.