Merge pull request #629 from aws-solutions/develop
Develop
mobri2a authored Aug 10, 2023
2 parents aab2762 + 9a58ad8 commit 7fb99e0
Showing 6 changed files with 57 additions and 31 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [5.4.1] - 2023-07-27
### Updated

- LLM README documentation

## [5.4.0] - 2023-07-27

__*Note: we recommend that you first deploy these changes in a non-production environment. This is true for all releases, but especially important for minor and major releases.*__
32 changes: 16 additions & 16 deletions docs/LLM_Retrieval_and_generative_question_answering/README.md
@@ -1,6 +1,6 @@
# Large Language Model - Query Disambiguation for Conversational Retrieval, and Generative Question Answering

QnABot can now use a large language model (LLM) to **(1) Disambiguate follow up questions to generate good search queries** and/or **(2) Generate answers to questions from retrieved FAQS or passages**.
QnABot can now use a large language model (LLM) to **(1) Disambiguate follow up questions to generate good search queries** and/or **(2) Generate answers to questions from retrieved search results or text passages**.


**(1) Disambiguate follow up questions** that rely on preceding conversation context. The new disambiguated, or standalone, question can then be used as search queries to retrieve the best FAQ, passage or Kendra match.
@@ -11,14 +11,14 @@ With the new LLM Disambiguation feature enabled, given the chat history context:
`[{"Human":"Who was Little Bo Peep?"},{"AI":"She is a character from a nursery rhyme who lost her sheep."}]`
and a follow up question:
`Did she find them again?`
QnAbot can rewrite that question to provide all the context required to search for the relevant FAQ or passage:
QnABot can rewrite that question to provide all the context required to search for the relevant FAQ or passage:
`Did Little Bo Peep find her sheep again?`.


**(2) Generate answers to questions** from context provided by Kendra search results, or from text passges created or imported directly into QnAbot. Some of the benefits include:
**(2) Generate answers to questions** from context provided by Kendra search results, or from text passages created or imported directly into QnABot. Some of the benefits include:
- Generated answers allow you to reduce the number of FAQs you need to maintain since you can now synthesize concise answers from your existing documents in a Kendra index, or from document passages stored in QnABot as 'text' items.
- Generated answers can be short, concise, and suitable for voice channel contact center bots as well as website / text bots.
- Generated answers are fully compatible with QnABot's multi-language support - users can interact in their chosen languages and recieve generated answers in the same language.
- Generated answers are fully compatible with QnABot's multi-language support - users can interact in their chosen languages and receive generated answers in the same language.

Examples:
With the new LLM QA feature enabled, QnABot can answer questions from the [AWS WhitePapers](https://catalog.us-east-1.prod.workshops.aws/workshops/df64824d-abbe-4b0d-8b31-8752bceabade/en-US/200-ingesting-documents/230-using-the-s3-connector/231-ingesting-documents) such as:
@@ -50,12 +50,10 @@ You can use disambiguation and generative question answering, as shown below:
- Run throughput testing and inference endpoint scale testing to properly estimate deployment size/costs. NOTE: we do not yet have any scale/costing guidelines, so please share your findings.


With this release, you can choose with LLM to use with QnABot:
1. An open source LLM model automtically deployed and hosted on an Amazon SageMaker endpoint - see https://huggingface.co/tiiuae/falcon-40b-instruct
With this release, you can choose which LLM to use with QnABot:
1. An open source LLM model automatically deployed and hosted on an Amazon SageMaker endpoint - see https://huggingface.co/tiiuae/falcon-40b-instruct
2. Any other LLM model or API you like via a user provided Lambda function.

_**NOTE: Optimize Kendra:** When using Kendra, we recommend requesting a larger document excerpt to be returned from queries. In the browser window you are using for AWS Management Console navigate to [Kendra Service Quota](https://console.aws.amazon.com/servicequotas/home/services/kendra/quotas/L-196E775D), choose Request quota increase, and change quota value to a number up to a max of 750._

### 1. Amazon SAGEMAKER

QnABot provisions a SageMaker endpoint running the Hugging Face [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) model
@@ -74,6 +72,8 @@ By default a 1-node ml.g5.12xlarge endpoint is automatically provisioned. For la

Use a custom Lambda function to experiment with LLMs of your choice. Provide your own lambda function that takes a *question*, *context*, and a QnABot *settings* object. Your Lambda function can invoke any LLM you choose, and return the prediction in a JSON object containing the key, `generated_text`. You provide the ARN for your Lambda function when you deploy or update QnABot.

*See [QnABot on AWS Sample Plugins](https://github.com/aws-samples/qnabot-on-aws-plugin-samples/blob/develop/README.md) for some sample customizable plugins to integrate QnABot with your choice of leading LLM providers including our own Amazon Bedrock service (in preview), Anthropic, and AI21. Note that the plugin project is listed here for reference only and is a separate project from the QnABot project.*
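
A minimal sketch of such a Lambda function in Python, assuming the event carries the `prompt`, `parameters`, and `settings` fields described in this README and that the response only needs the `generated_text` key. The `call_my_llm` helper is hypothetical and stands in for whichever LLM client you actually use; it is not part of QnABot.

```python
def call_my_llm(prompt: str, parameters: dict) -> str:
    """Hypothetical placeholder for a real LLM call.

    Replace the body with a call to your own model or provider SDK.
    Here it simply echoes part of the prompt so the sketch is runnable.
    """
    return f"(echo) {prompt[:80]}"


def lambda_handler(event, context):
    # Fields passed by QnABot, per the event format documented in this README.
    prompt = event["prompt"]
    parameters = event.get("parameters", {})  # e.g. {"temperature": 0}
    settings = event.get("settings", {})      # all QnABot settings (unused here)

    generated_text = call_my_llm(prompt, parameters)

    # QnABot reads the prediction from the `generated_text` key.
    return {"generated_text": generated_text}
```

The handler keeps the model call behind a single helper so that swapping providers does not change the event handling.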

#### Deploy Stack for Embedding models invoked by a custom Lambda Function

- *(for Kendra Fallback)* set `DefaultKendraIndexId` to the Index Id (a GUID) of your existing Kendra index containing ingested documents
@@ -88,7 +88,7 @@ Your Lambda function is passed an event of the form:
{
"prompt": "string", // prompt for the LLM
"parameters":{"temperature":0,...}, // model parameters object containing key / value pairs for the model parameters setting (defined in QnABot settings - see below)
"settings":{"key1":"value1",...} // settings object containing all default and custom QnAbot settings
"settings":{"key1":"value1",...} // settings object containing all default and custom QnABot settings
}
```
and returns a JSON structure of the form:
@@ -116,7 +116,7 @@ When QnABot stack is installed, open Content Designer **Settings** page:

- **ENABLE_DEBUG_RESPONSES** set to TRUE to add additional debug information to the QnABot response, including any language translations (if using multi language mode), question disambiguation (before and after), and inference times for your LLM model(s).

- **ES_SCORE_TEXT_ITEM_PASSAGES:** should be "true" to enable the new QnABot text passage items to be retrieved and used as input context for geneartive QA Summary answers. NOTE - 'qna' items are queried first, and in none meet the score threshold, then QnABot queries the text field of 'text' items
- **ES_SCORE_TEXT_ITEM_PASSAGES:** should be "true" to enable the new QnABot text passage items to be retrieved and used as input context for generative QA Summary answers. NOTE - 'qna' items are queried first, and if none meet the score threshold, then QnABot queries the text field of 'text' items.

- **EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD:** applies only when Embeddings are enabled (recommended) and if ES_SCORE_TEXT_ITEM_PASSAGES is true. If embedding similarity score on text item field is under threshold the match is rejected. Default is 0.80.

@@ -128,17 +128,17 @@ When QnABot stack is installed, open Content Designer **Settings** page:

- **LLM_API:** one of SAGEMAKER, LAMBDA - based on the value chosen when you last deployed or updated the QnABot Stack.
- **LLM_GENERATE_QUERY_ENABLE:** set to TRUE or FALSE to enable or disable question disambiguation.
- **LLM_GENERATE_QUERY_PROMPT_TEMPLATE:** the prompt template used to construct a prompt for the LLM to disabiguate a followup question. The template may use the placeholders:
- **LLM_GENERATE_QUERY_PROMPT_TEMPLATE:** the prompt template used to construct a prompt for the LLM to disambiguate a followup question. The template may use the placeholders:
- `{history}` - placeholder for the last `LLM_CHAT_HISTORY_MAX_MESSAGES` messages in the conversational history, to provide conversational context.
- `{input}` - placeholder for the current user utterance / question
- **LLM_GENERATE_QUERY_MODEL_PARAMS:** parameters sent to the LLM model when disambiguating follow-up questions. Default: `{"temperature":0}`. Check model documentation for additional values that your model provider accepts.
- **LLM_QA_ENABLE:** set to TRUE or FALSE to enable or disable generative answers from passages retreived via embeddings or Kendra fallback (when no FAQ match its found). NOTE LLM based generative answers are not applied when an FAQ / QID matches the question.
- **LLM_QA_USE_KENDRA_RETRIEVAL_API:** set to TRUE or FALSE to enable or disable the use of Kendra's retrieval API. When enabled, QnABot uses Kendra's Retrieve api to retrieve semantically relevant passages of up to 200 token words from the documents in your index (not FAQs). When disabled, QnAbot use the default Kendra Query API to search documents and FAQs. Takes effect only when LLM_QA_ENABLE is TRUE. The default is TRUE (recommended) when LLM QA is enabled. Note: this feature will only search the first configured index. See https://docs.aws.amazon.com/kendra/latest/APIReference/API_Retrieve.html
- **LLM_QA_ENABLE:** set to TRUE or FALSE to enable or disable generative answers from passages retrieved via embeddings or Kendra fallback (when no FAQ match is found). NOTE: LLM-based generative answers are not applied when an FAQ / QID matches the question.
- **LLM_QA_USE_KENDRA_RETRIEVAL_API:** set to TRUE or FALSE to enable or disable the use of Kendra's retrieval API. When enabled, QnABot uses Kendra's Retrieve API to retrieve semantically relevant passages of up to 200 token words from the documents in your index (not FAQs). When disabled, QnABot uses the default Kendra Query API to search documents and FAQs. Takes effect only when LLM_QA_ENABLE is TRUE. The default is TRUE (recommended) when LLM QA is enabled. Note: this feature will only search the first configured index. See https://docs.aws.amazon.com/kendra/latest/APIReference/API_Retrieve.html
- **LLM_QA_PROMPT_TEMPLATE:** the prompt template used to construct a prompt for the LLM to generate an answer from the context of retrieved passages (from Kendra or Embeddings). The template may use the placeholders:
- `{context}` - placeholder for passages retrieved from the seartch query - either a QnABot 'Text' item passage, or the Top `ALT_SEARCH_KENDRA_MAX_DOCUMENT_COUNT` Kendra passages
- `{context}` - placeholder for passages retrieved from the search query - either a QnABot 'Text' item passage, or the Top `ALT_SEARCH_KENDRA_MAX_DOCUMENT_COUNT` Kendra passages
- `{history}` - placeholder for the last `LLM_CHAT_HISTORY_MAX_MESSAGES` messages in the conversational history, to provide conversational context.
- `{input}` - placeholder for the current user utterance / question
- `{query}` - placeholder for the generated (disambiguated) query created by the generate query feature. NOTE the default prompt does not use `query` in the qa prompt, as it provides the conversation history and current user input instead, but you can change the prompt to use `query` inseatd of, or in addiotion to `input` and `history` to tune the LLM answers.
- `{query}` - placeholder for the generated (disambiguated) query created by the generate query feature. NOTE the default prompt does not use `query` in the qa prompt, as it provides the conversation history and current user input instead, but you can change the prompt to use `query` instead of, or in addition to `input` and `history` to tune the LLM answers.
- **LLM_QA_NO_HITS_REGEX:** when the pattern specified matches the response from the LLM, e.g. `Sorry, I don't know`, then the response is treated as no_hits, and the default `EMPTYMESSAGE` or Custom Don't Know ('no_hits') item is returned instead. Disabled by default, since enabling it prevents easy debugging of LLM don't know responses.
- **LLM_QA_MODEL_PARAMS:** parameters sent to the LLM model when generating answers to questions. Default: `{"temperature":0}`. Check model documentation for additional values that your model provider accepts.
- **LLM_QA_PREFIX_MESSAGE:** Message used to prefix the LLM-generated answer. May be empty.
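
The placeholder substitution and the `LLM_QA_NO_HITS_REGEX` check described in the settings above can be sketched as follows. This is an illustration only, not QnABot's actual implementation: the function names, the sample template, and the choice to serialize the history as JSON are all assumptions.

```python
import json
import re


def build_prompt(template, history, user_input, max_messages, context="", query=""):
    """Fill the {history}, {input}, {context} and {query} placeholders.

    `max_messages` plays the role of LLM_CHAT_HISTORY_MAX_MESSAGES:
    only the most recent messages are kept.
    """
    recent = history[-max_messages:] if max_messages > 0 else []
    values = {
        "history": json.dumps(recent),
        "input": user_input,
        "context": context,
        "query": query,
    }
    # Replace only the known placeholders; any other braces are left untouched.
    return re.sub(
        r"\{(history|input|context|query)\}",
        lambda match: values[match.group(1)],
        template,
    )


def is_no_hits(llm_response, no_hits_regex):
    """Return True when the LLM response should be treated as no_hits.

    An empty pattern means the check is disabled.
    """
    if not no_hits_regex:
        return False
    return re.search(no_hits_regex, llm_response) is not None


# Hypothetical template and values, for illustration only.
history = [
    {"Human": "Who was Little Bo Peep?"},
    {"AI": "She is a character from a nursery rhyme who lost her sheep."},
]
prompt = build_prompt(
    "Context: {context}\nHistory: {history}\nQuestion: {input}",
    history,
    "Did she find them again?",
    max_messages=2,
    context="Little Bo Peep has lost her sheep...",
)
print(prompt)
print(is_no_hits("Sorry, I don't know", "Sorry, I don't know"))
```

Substituting with a replacement function rather than `str.format` avoids errors when the retrieved context or chat history itself contains braces.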
@@ -163,7 +163,7 @@ In Content Designer, choose **Add**, select **text**, enter an Item ID and a Pas

QnABot saves your passage, along with the text embeddings; for best results when using native passage retrieval in QnABot, be sure to enable [Semantic Search using Text Embeddings](../semantic_matching_using_LLM_embeddings/README.md).

Test your queries match the desired text item using the TEST tab in Content Designer. To test matches for text item passages, select the appropriate drop down before choosing SEARCH. Compare scores on "qna questions" to the configured threshold setting `EMBEDDINGS_SCORE_THTRESHOLD` and for passages to the threshold setting `EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD`. You may need to adjust thresholds to get the desired behavior when using the bot with the web client.
Test that your queries match the desired text item using the TEST tab in Content Designer. To test matches for text item passages, select the appropriate drop down before choosing SEARCH. Compare scores on "qna questions" to the configured threshold setting `EMBEDDINGS_SCORE_THRESHOLD` and for passages to the threshold setting `EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD`. You may need to adjust thresholds to get the desired behavior when using the bot with the web client.

You can also import your passages from a JSON file using Content Designer import. From the Tools menu on the top left, choose **Import**, open **Examples/Extensions** and choose the LOAD button next to **TextPassage-NurseryRhymeExamples** to import two nursery rhyme text items.
To import your own passages create and import a JSON file with the structure similar to below:
27 changes: 17 additions & 10 deletions docs/semantic_matching_using_LLM_embeddings/README.md
@@ -11,7 +11,7 @@ For now this is an Experimental feature. We encourage you to try it on non-produ

With this release, QnABot can now use
1. PREFERRED: Embeddings from a Text Embedding model hosted on an Amazon SageMaker endpoint - see https://huggingface.co/intfloat/e5-large
2. CUSTOMIZABLE: Embeddings from a user provided Lambda function - explore alternate pretrained and/or fine tuned embeddings models.
2. CUSTOMIZABLE: Embeddings from a user provided Lambda function - explore alternate pre-trained and/or fine tuned embeddings models.

## 1. Amazon SageMaker (PREFERRED)

@@ -30,9 +30,11 @@ By setting the parameter `SagemakerInitialInstanceCount` to `0`, a [Serverless S
![CFN Params](./images/CF_Params_Sagemaker.png)


## 3. Lambda function
## 2. Lambda function

Use a custom Lambda function to use any Embedding API or embedding model on Sagemaker to generate embeddings.
Use a custom Lambda function to use any Embedding API or embedding model on SageMaker to generate embeddings.

*See [QnABot on AWS Sample Plugins](https://github.com/aws-samples/qnabot-on-aws-plugin-samples/blob/develop/README.md) for a plugin to integrate QnABot with our Amazon Bedrock service (in preview) for embeddings. Note that the plugin project is listed here for reference only and is a separate project from the QnABot project.*

### Deploy Stack for Embedding models invoked by a custom Lambda Function

@@ -72,20 +74,25 @@ When QnABot stack is installed, open Content Designer **Settings** page:

**EMBEDDINGS_ENABLE:** to enable / disable use of semantic search using embeddings, set `EMBEDDINGS_ENABLE` to TRUE or FALSE.
- Set to FALSE to disable the use of embeddings based queries.
- Set to TRUE to re-enble the use of embeddings based queries after previously setting it to FALSE. NOTE - Setting TRUE when the stack has `EmbeddingsAPI` set to DISABLED will cause failures, since the QnABot stack isn't provisioned to support generation of embeddings.
- Set to TRUE to re-enable the use of embeddings based queries after previously setting it to FALSE. NOTE - Setting TRUE when the stack has `EmbeddingsAPI` set to DISABLED will cause failures, since the QnABot stack isn't provisioned to support generation of embeddings.
- If you disable embeddings, you will likely also want to re-enable keyword filters by setting `ES_USE_KEYWORD_FILTERS` to TRUE.
- If you add, modify, or import any items in Content Designer while `EMBEDDINGS_ENABLE` is FALSE, then embeddings won't get created and you'll have to reimport or re-save those items after re-enabling embeddings.

**EMBEDDINGS_SCORE_THRESHOLD:** to customize the score threshold, change the value of `EMBEDDINGS_SCORE_THRESHOLD`. Unlike regular Elasticsearch queries, embeddings queries always return scores between 0 and 1, so we can apply a threshold to separate good from bad results.
- If the embedding similarity score is under the threshold, the match is rejected and QnABot reverts to:
- Trying to find a match on the answer field, only if ES_SCORE_ANSWER_FIELD is set to TRUE (see above).
- Trying to find a match on the answer field, only if ES_SCORE_ANSWER_FIELD is set to TRUE (see above).
- Text item passage query
- Kendra fallback
- or no_hits
- Use the Content Designer TEST tab to see the hits ranked by score for your query results.
- The default is 0.85 for now but you may well need to modify this based on your embedding model and your experiments.
- The default is 0.85 for now but you will likely need to modify this based on your embedding model and your experiments.

**EMBEDDINGS_SCORE_ANSWER_THRESHOLD:** to customize the answer score threshold, used only when ES_SCORE_ANSWER_FIELD is TRUE (see above), change the value of `EMBEDDINGS_SCORE_ANSWER_THRESHOLD`.
- If embedding similarity score for answer field query is under threshold the match it's rejected and QnABot reverts to Kendra fallback or no_hits
- Use the Content Designer TEST tab to see the hits ranked by score for your answer field query results. Select to "Score on answer field" checkbox to see answer field scores.
- The default is 0.80 for now but you may well need to modify this based on your embedding model and your experiments.

- If the embedding similarity score for the answer field query is under the threshold, the match is rejected and QnABot reverts to Text item passage query, Kendra fallback or no_hits
- Use the Content Designer TEST tab to see the hits ranked by score for your answer field query results. For **Match on**, choose *qna item answer* to see answer field scores.
- The default is 0.80 for now but you will likely need to modify this based on your embedding model and your experiments.

**EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD:** to customize the passage score threshold, change the value of `EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD`.
- If the embedding similarity score for the text item passage field query is under the threshold, the match is rejected and QnABot reverts to Kendra fallback or no_hits
- Use the Content Designer TEST tab to see the hits ranked by score for your passage field query results. For **Match on**, choose *text item passage* to see passage field scores.
- The default is 0.80 for now but you will likely need to modify this based on your embedding model and your experiments.
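
The fallback order described by the three threshold settings can be sketched as a simple cascade. This is a simplified illustration, not QnABot's actual code: the function name, argument names, and returned labels are assumptions, and only the default thresholds (0.85, 0.80, 0.80) come from the settings above.

```python
from typing import Optional


def pick_answer_source(
    question_score: Optional[float],
    answer_score: Optional[float],
    passage_score: Optional[float],
    *,
    score_answer_field: bool,          # mirrors ES_SCORE_ANSWER_FIELD
    score_text_passages: bool,         # mirrors ES_SCORE_TEXT_ITEM_PASSAGES
    kendra_fallback_available: bool,   # True when a Kendra index is configured
    question_threshold: float = 0.85,  # EMBEDDINGS_SCORE_THRESHOLD
    answer_threshold: float = 0.80,    # EMBEDDINGS_SCORE_ANSWER_THRESHOLD
    passage_threshold: float = 0.80,   # EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD
) -> str:
    """Return which source would answer the query, trying each stage in order."""

    def passes(score: Optional[float], threshold: float) -> bool:
        # A score under the threshold is rejected; None means no candidate.
        return score is not None and score >= threshold

    if passes(question_score, question_threshold):
        return "qna_question"
    if score_answer_field and passes(answer_score, answer_threshold):
        return "qna_answer"
    if score_text_passages and passes(passage_score, passage_threshold):
        return "text_passage"
    if kendra_fallback_available:
        return "kendra_fallback"
    return "no_hits"


# A question match below 0.85 falls through to the text passage stage.
print(pick_answer_source(
    0.70, None, 0.83,
    score_answer_field=False,
    score_text_passages=True,
    kendra_fallback_available=True,
))
```

Comparing the scores shown in the Content Designer TEST tab against a cascade like this can help when deciding which threshold to adjust.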