This is an app for reading a document and asking questions about it. The app is powered by an LLM (GPT-3.5). GPT offers no direct question-answering solution for larger documents, because its context is limited to 2049 tokens. To work around this, the document is split into paragraphs and an embedding is computed for each paragraph. When a question is asked, it is also converted to an embedding, and a semantic search over the paragraph embeddings finds the most relevant paragraph. That paragraph is then injected into the ChatGPT prompt through the API, which returns a generative answer. This approach gives very good results.
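The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the code in this repository: `toy_embed` is a hypothetical bag-of-words stand-in for the OpenAI embedding call, the sample document is invented, and the function names and prompt wording are assumptions.

```python
import math
import re
from collections import Counter


def split_paragraphs(text: str) -> list[str]:
    """Split a document on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding call: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant_paragraph(question: str, paragraphs: list[str]) -> str:
    """Return the paragraph whose embedding is closest to the question."""
    q = toy_embed(question)
    return max(paragraphs, key=lambda p: cosine_similarity(q, toy_embed(p)))


def build_prompt(question: str, context: str) -> str:
    """Inject the retrieved paragraph into the prompt sent to the model."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Invented sample document with three paragraphs.
document = """Contact details: email address jane@example.com, phone number 555-0100.

Main skills: Python, machine learning, and natural language processing.

Experience in 2020: built a chatbot for customer support."""

paragraphs = split_paragraphs(document)
question = "What is the email address?"
context = most_relevant_paragraph(question, paragraphs)
prompt = build_prompt(question, context)
print(prompt)
```

In the real app the embeddings would come from the OpenAI API and the prompt would be sent to the model; only the search and prompt-building logic is shown here.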
- [GPT-3.5 Question Answer App](#gpt-35-question-answer-app)
This app uses OpenAI embeddings and prompt engineering. The base code is taken from the OpenAI API coding examples. I have applied an object-oriented approach to make the code more structured and easier to reuse. My resume is used as example data in the data folder. The paragraph DataFrame and embedding files are also saved in the same folder. To use this code, put your own access key in the gpt_key.text file in JSON format, like {"key" : "sk-cQDfQCzd5Hxxxxxx4YkxCT3BlbkFxxxxxxxxxxxxxxxxx"}.
There are 2 major files, gpt_qa_aoo.py and gpt_class.py, in the script folder. Run gpt_qa_aoo.py to start the app. You can integrate this file into your own chatbot framework (RASA, Dialogflow, MS Chatbot, etc.).
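As an illustration of the key file format, the sketch below reads a JSON file shaped like the example above. The function name `load_api_key` is hypothetical and may not match what gpt_class.py does; the demonstration writes a throwaway file with a placeholder key rather than touching a real one.

```python
import json
import tempfile
from pathlib import Path


def load_api_key(path) -> str:
    """Read the access key from a JSON file shaped like {"key": "sk-..."}."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    key = data.get("key", "").strip()
    if not key:
        raise ValueError(f"no 'key' entry found in {path}")
    return key


# Demonstration with a temporary file and a placeholder key (not a real one).
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "gpt_key.text"
    key_file.write_text('{"key": "sk-placeholder"}', encoding="utf-8")
    demo_key = load_api_key(key_file)

print(demo_key)
```

Keeping the key in a separate file like this makes it easy to exclude it from version control.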
The following packages need to be installed with pip:
- openai
- pandas
- numpy
- textract
- tiktoken

The json module is also used, but it is part of the Python standard library and does not need to be installed.
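Before running the app, it can help to confirm that the third-party packages are importable. The helper below is a small sketch using only the standard library; `missing_packages` is a hypothetical name and is not part of this repository.

```python
from importlib.util import find_spec

# Third-party packages named in the list above.
REQUIRED = ["openai", "pandas", "numpy", "textract", "tiktoken"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install with pip:", " ".join(missing))
else:
    print("All required packages are available.")
```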
You can ask questions about the resume, such as:
- What is his email address?
- What is his phone number?
- What are the main skills?
- What are the experiences in 2020?
- What are his experiences in ML?