
Commit

Update documentation
schorndorfer committed Nov 13, 2023
1 parent 243d013 commit c11d0c8
Showing 30 changed files with 548 additions and 513 deletions.
Binary file modified _images/chatgpt-settings.png
Binary file added _images/openai-api-create-api-key.mp4
Binary file not shown.
Binary file added _images/openai-api-limits.png
95 changes: 40 additions & 55 deletions _sources/augmented-generation.ipynb
@@ -9,7 +9,7 @@
"<font color='purple'>**Retrieval Augmented Generation (RAG)**</font> is a powerful paradigm in natural language processing that combines the strengths of information retrieval and language generation. In the context of the **OpenAI API**, this approach involves retrieving relevant information from a large dataset and using that information to enhance the generation of accurate text. It can be used as an alternative to fine-tuning your models. \n",
"\n",
"### _Definition_\n",
"<font color='purple'>**RAG**</font> is a method that leverages pre-existing knowledge by retrieving pertinent information from a knowledge base and using it to inform the generation of coherent and contextually relevant text. In the OpenAI API, <font color='purple'>**RAG**</font> is exemplified by models that integrate the retrieval of information to augment the output of the language generation process. The phrase <font color='purple'>**Retrieval Augmented Generation**</font> comes from a recent paper by Lewis et al. from Facebook AI (https://research.facebook.com/publications/retrieval-augmented-generation-for-knowledge-intensive-nlp-tasks/). The idea is to use a pre-trained language model (LM) to generate text, but to use a separate retrieval system to find relevant documents to condition the LM on.\n",
"<font color='purple'>**RAG**</font> is a method that leverages pre-existing knowledge by retrieving pertinent information from a knowledge base and using it to inform the generation of coherent and contextually relevant text. The phrase <font color='purple'>**Retrieval Augmented Generation**</font> comes from a recent paper by Lewis et al. from Facebook AI (https://research.facebook.com/publications/retrieval-augmented-generation-for-knowledge-intensive-nlp-tasks/). The idea is to use a pre-trained language model (LM) to generate text, but to use a separate retrieval system to find relevant documents to condition the LM on.\n",
"\n",
"### _How it Works_\n",
"\n",
@@ -35,11 +35,11 @@
"\n",
"- **Code Generation**: In software development, RAG can assist in generating code snippets by retrieving information from programming knowledge bases, ensuring the produced code is accurate and contextually fitting. (example today)\n",
"\n",
"- **Prevent Hallucinations**: Finally, RAG can be used to bring in external knowledge to check whether a GPT response is an hallucination. (example provided)\n",
"- **Prevent Hallucinations**: Finally, RAG can be used to bring in external knowledge to check whether a GPT response is a hallucination. (example provided)\n",
"\n",
"### _Getting Started_\n",
"\n",
"Please install and import libraries."
"Please install and import these libraries."
]
},
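The retrieve-then-generate loop described above can be sketched without calling any external service. The toy example below (all names and documents are hypothetical, and the bag-of-words similarity stands in for real embeddings) ranks documents against the query and prepends the best match to the prompt, which is the essence of the RAG pattern:

```python
import math
from collections import Counter

def cosine(a, b):
    # Bag-of-words cosine similarity between two strings.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs):
    # Retrieval step: pick the document most similar to the query.
    return max(docs, key=lambda d: cosine(query, d))

def build_rag_prompt(query, docs):
    # Generation step: condition the LM on the retrieved context.
    return f"Context: {retrieve(query, docs)}\nQuestion: {query}\nAnswer:"

docs = [
    "To find an element by class name in Selenium 4 use find_element with By.CLASS_NAME.",
    "FAISS is a library for efficient similarity search over dense vectors.",
]
prompt = build_rag_prompt("How do I find an element by class name?", docs)
```

A production system would swap `cosine` for dense embeddings and a vector index (as the FAISS example later in this section does), but the control flow is the same.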
{
@@ -95,46 +95,43 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Generated Text:\n",
"\n",
"Answer: \n",
"\n",
"The latest version of Python Selenium (3.141.0) uses the same method to find elements by class name as previous versions:\n",
"You can find an element by class name using the find_element_by_class_name() method in the latest version of Python Selenium. An example of this usage is as follows:\n",
"\n",
"driver.find_element_by_class_name(\"some-class\")\n"
"element = driver.find_element_by_class_name(\"class_name\")\n"
]
}
],
"source": [
"# Ask GPT-3 how to find an element by class name in Selenium\n",
"prompt = \"How do I find an element by class name in the latest version of python selenium?\"\n",
"\n",
"# Generate response using GPT-3\n",
"response = openai.Completion.create(\n",
" engine=\"text-davinci-002\", # Choose the appropriate engine\n",
" engine=\"text-davinci-003\", # Choose the appropriate engine\n",
" prompt=prompt,\n",
" max_tokens=100, # Adjust as needed\n",
" temperature=0.7, # Adjust as needed\n",
" max_tokens=100, \n",
" temperature=0.7, \n",
")\n",
"\n",
"# Display the generated text\n",
"generated_text = response[\"choices\"][0][\"text\"]\n",
"print(\"Generated Text:\")\n",
"print(generated_text)"
"print(f\"Answer: {generated_text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code for finding elements by class name no longer works in newer version of selenium found here: https://www.selenium.dev/documentation/webdriver/troubleshooting/upgrade_to_selenium_4/"
"This code for finding elements by class name no longer works in newer versions of Selenium: the `find_element_by_*` helpers were removed in Selenium 4 in favor of `driver.find_element(By.CLASS_NAME, ...)`. See the upgrade guide: https://www.selenium.dev/documentation/webdriver/troubleshooting/upgrade_to_selenium_4/"
]
},
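As the upgrade guide explains, Selenium 4 collapses the `find_element_by_*` helpers into a single `find_element(by, value)` call. The sketch below avoids needing a live browser: the locator strings are the actual values behind Selenium's `By` constants, while the `driver` call shown in comments is the pattern the upgrade guide prescribes.

```python
# Selenium 4 pattern (requires a running WebDriver):
#   from selenium.webdriver.common.by import By
#   element = driver.find_element(By.CLASS_NAME, "some-class")
#
# The By constants are plain strings, which these stubs mirror:
BY_CLASS_NAME = "class name"
BY_CSS_SELECTOR = "css selector"

def class_name_locator(name):
    # Build the (strategy, value) pair that find_element() expects.
    return (BY_CLASS_NAME, name)

locator = class_name_locator("some-class")
```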
{
@@ -171,7 +168,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -201,7 +198,7 @@
"embeddings = OpenAIEmbeddings(openai_api_key=api_key)\n",
"metadata = [{\"source\": url} for _ in range(len(chunks))] # Metadata for each chunk\n",
"\n",
"# Create a FAISS vector store and save it to disk\n",
"# Create a FAISS vector store and save it\n",
"store = FAISS.from_texts(chunks, embeddings, metadatas=metadata)\n",
"faiss.write_index(store.index, \"selenium_docs.index\")\n",
"store.index = None\n",
@@ -218,42 +215,33 @@
},
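The chunk-embed-index step above depends on how the scraped page is split before embedding. A simple fixed-size chunker with overlap (a common default; the sizes here are arbitrary, not taken from the notebook) looks like this:

```python
def chunk_text(text, size=500, overlap=50):
    # Split text into fixed-size chunks; overlapping windows help keep
    # sentences that straddle a boundary retrievable from either chunk.
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

chunks = chunk_text("a" * 1200, size=500, overlap=50)
```

Each resulting chunk is what gets embedded and stored in the FAISS index, with per-chunk metadata (such as the source URL) carried alongside it.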
{
"cell_type": "code",
"execution_count": 60,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/ambreenchaudhri/anaconda3/lib/python3.11/site-packages/langchain/chains/qa_with_sources/vector_db.py:67: UserWarning: `VectorDBQAWithSourcesChain` is deprecated - please use `from langchain.chains import RetrievalQAWithSourcesChain`\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Answer: In the latest version of python selenium, you can find an element by class name using the following methods: driver.findElement(By.className(\"className\")); driver.findElement(By.cssSelector(\".className\")); driver.findElementsByCssSelector(\".className\");\n",
"Answer: In the latest version of python selenium, you can find an element by class name using the following syntax: driver.findElement(By.className(\"className\")) or driver.findElement(By.cssSelector(\".className\")).\n",
"\n"
]
}
],
"source": [
"# Load the FAISS index from disk for Selenium.\n",
"# Load the FAISS index\n",
"index = faiss.read_index(\"selenium_docs.index\") # Assuming the name of the index file is 'selenium_docs.index'\n",
"\n",
"# Load the vector store from disk for Selenium.\n",
"# Load the vector store\n",
"with open(\"selenium_docs.pkl\", \"rb\") as f:\n",
" store = pickle.load(f)\n",
"\n",
"# Merge the index and store for Selenium.\n",
"# Merge the index and store\n",
"store.index = index\n",
"\n",
"# Build the question answering chain for Selenium.\n",
"# Build the question answering chain\n",
"chain = VectorDBQAWithSourcesChain.from_llm(llm=OpenAI(openai_api_key=api_key, temperature=0, max_tokens=1500, model_name='text-davinci-003'), vectorstore=store)\n",
"\n",
"# Ask GPT-3 about the latest version of Selenium.\n",
"#question = \"What is the latest version of Selenium?\"\n",
"# Ask GPT-3 a question\n",
"question = \"How do I find an element by class name in the latest version of python selenium? Show an example.\"\n",
"result = chain({\"question\": question})\n",
"\n",
@@ -267,40 +255,38 @@
"source": [
"### _Example B: Preventing Hallucinations_\n",
"\n",
"Another advantage of using RAG is to feed GPT an external knowledge source to check or prevent hallucinations. An **artificial hallucination** (also called confabulation or delusion) is a response generated by an AI which contains false or misleading information presented as factual. This could be something as innocuous as saying something exists in a file that doesn't. Or it could be an instance with GPT actually provides false information. Here is an example below of Luna, the elephant that walked on the moon. "
"Another advantage of using RAG is to feed GPT an external knowledge source to check or prevent hallucinations. An <font color='purple'>**artificial hallucination**</font> is a response that contains false or misleading information presented as factual. This could be something as innocuous as saying an item exists in a file that doesn't. Or, more seriously, GPT may fabricate facts outright. Here is an example of Ellie, the elephant that walked on the moon. "
]
},
{
"cell_type": "code",
"execution_count": 63,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Generated Text:\n",
"\n",
"Answer: \n",
"\n",
"The first elephant that landed on the moon was a female elephant named Ellie. She was born in captivity in Africa and was brought to the United States when she was two years old. Ellie became the first elephant to walk on the moon when she was part of the Apollo 11 mission in 1969.\n"
"The first elephant to land on the moon was a female elephant named Ellie. She was born in captivity in Africa and was brought to the United States when she was two years old. Ellie spent the majority of her life performing in circuses and zoos. In 1962, she was sent to the National Zoo in Washington, D.C. where she lived for the rest of her life. Ellie died in 1988 at the age of 36.\n"
]
}
],
"source": [
"prompt = \"Can you tell me more about the first elephant that landed on the moon? \"\n",
"prompt = \"Can you tell me more about the first elephant that landed on the moon?\"\n",
"\n",
"# Generate response using GPT-3\n",
"response = openai.Completion.create(\n",
" engine=\"text-davinci-002\", # Choose the appropriate engine\n",
" engine=\"text-davinci-002\", \n",
" prompt=prompt,\n",
" max_tokens=100, # Adjust as needed\n",
" temperature=0.0, # Adjust as needed\n",
" max_tokens=100, \n",
" temperature=0.0, \n",
")\n",
"\n",
"# Display the generated text\n",
"generated_text = response[\"choices\"][0][\"text\"]\n",
"print(\"Generated Text:\")\n",
"print(generated_text)\n"
"print(f\"Answer: {generated_text}\")\n"
]
},
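Before reaching for a full RAG chain, one lightweight way to flag a possible hallucination is to check how much of the model's answer is actually supported by trusted retrieved text. This token-overlap heuristic (the threshold and example strings are illustrative, not part of the notebook) sketches the idea:

```python
def support_ratio(answer, context):
    # Fraction of answer tokens that also appear in the context.
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def looks_grounded(answer, context, threshold=0.5):
    # Flag answers with little lexical support in the retrieved context.
    return support_ratio(answer, context) >= threshold

context = "No elephant has ever been to the moon. Apollo crews were human astronauts."
answer = "The first elephant to land on the moon was named Ellie."
grounded = looks_grounded(answer, context)
```

A real pipeline would use embedding similarity or an entailment model rather than raw token overlap, but the structure — compare the generation against retrieved evidence — is the same one the RAG chain below applies.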
{
@@ -331,8 +317,8 @@
" {\"role\": \"system\", \"content\": \"Answer the following question the best you can.\"},\n",
" {\"role\": \"user\", \"content\": \"Can you tell me more about the first elephant that landed on the moon?\"}\n",
" ],\n",
" max_tokens=100, # Adjust as needed\n",
" temperature=0.0, # Adjust as needed\n",
" max_tokens=100, \n",
" temperature=0.0, \n",
")\n",
"\n",
"# Display the generated text\n",
@@ -350,7 +336,7 @@
},
{
"cell_type": "code",
"execution_count": 72,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -390,21 +376,20 @@
}
],
"source": [
"# Load the FAISS index from disk for Selenium.\n",
"index = faiss.read_index(\"elephant_docs.index\") # Assuming the name of the index file is 'selenium_docs.index'\n",
"# Load the FAISS index\n",
"index = faiss.read_index(\"elephant_docs.index\") \n",
"\n",
"# Load the vector store from disk for Selenium.\n",
"# Load the vector store\n",
"with open(\"elephant_docs.pkl\", \"rb\") as f:\n",
" store = pickle.load(f)\n",
"\n",
"# Merge the index and store for Selenium.\n",
"# Merge the index and store\n",
"store.index = index\n",
"\n",
"# Build the question answering chain for Selenium.\n",
"chain = VectorDBQAWithSourcesChain.from_llm(llm=OpenAI(openai_api_key=api_key, temperature=1.0, max_tokens=100, model_name='text-davinci-002'), vectorstore=store)\n",
"# Build the question answering chain\n",
"chain = VectorDBQAWithSourcesChain.from_llm(llm=OpenAI(openai_api_key=api_key, temperature=0, max_tokens=100, model_name='text-davinci-002'), vectorstore=store)\n",
"\n",
"# Ask GPT-3 about the latest version of Selenium.\n",
"#question = \"What is the latest version of Selenium?\"\n",
"# Ask GPT-3 a question\n",
"question = \"Can you tell me more about the first elephant that landed on the moon?\"\n",
"result = chain({\"question\": question})\n",
"\n",
27 changes: 15 additions & 12 deletions _sources/function-calling.ipynb
@@ -36,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
@@ -58,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
@@ -68,6 +68,9 @@
"\n",
"# Python code for flight status is adapted from \n",
"# https://www.tutorialspoint.com/get-flight-status-using-python\n",
"#\n",
"# Note: Tracking is available for flights scheduled 3 days before or after today.\n",
"#\n",
"def get_flight_status(airline_code, flight_number, day, month, year):\n",
" def get_data(url):\n",
" response = requests.get(url)\n",
@@ -85,7 +88,7 @@
" item.get_text() for item in soup.find_all(\"div\", class_=\"text-helper__TextHelper-sc-8bko4a-0 kbHzdx\")\n",
" ]\n",
"\n",
" return statuses[0] + \"; Departing at \" + time_statuses[0] + \"; Arriving at \" + time_statuses[2]"
" return str(statuses[0] + \"; Departing at \" + time_statuses[0] + \"; Arriving at \" + time_statuses[2])"
]
},
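When the model decides to call `get_flight_status`, the API returns the function name plus a JSON-encoded `arguments` string; your code must parse that string and dispatch to the real function. This local sketch mimics that round trip with a stubbed flight lookup (the stub's return string is invented for illustration; the real notebook scrapes live flight data):

```python
import json

def get_flight_status(airline_code, flight_number, day, month, year):
    # Stand-in for the real scraper defined above.
    return f"{airline_code}{flight_number} on {year}-{month:02d}-{day:02d}: On time"

# Registry mapping function names the model may request to local callables.
FUNCTIONS = {"get_flight_status": get_flight_status}

def dispatch(function_call):
    # function_call mimics the API shape: a name plus JSON-encoded arguments.
    fn = FUNCTIONS[function_call["name"]]
    args = json.loads(function_call["arguments"])
    return fn(**args)

call = {
    "name": "get_flight_status",
    "arguments": '{"airline_code": "UA", "flight_number": 792, "day": 12, "month": 11, "year": 2023}',
}
result = dispatch(call)
```

The string returned by `dispatch` is what gets fed back to the model in a `"role": "function"` message, as the next cells show.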
{
@@ -97,14 +100,14 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ChatCompletion(id='chatcmpl-8JSS4JgIRLXMFYAa5qUI4f5ejNd7L', choices=[Choice(finish_reason='function_call', index=0, message=ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{\\n \"airline_code\": \"UA\",\\n \"flight_number\": 792,\\n \"day\": 9,\\n \"month\": 11,\\n \"year\": 2023\\n}', name='get_flight_status'), tool_calls=None))], created=1699648292, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=48, prompt_tokens=119, total_tokens=167))\n"
"ChatCompletion(id='chatcmpl-8KWcsocsczlpDfw3VRp7DEbrhm88W', choices=[Choice(finish_reason='function_call', index=0, message=ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{\\n \"airline_code\": \"UA\",\\n \"flight_number\": 792,\\n \"day\": 12,\\n \"month\": 11,\\n \"year\": 2023\\n}', name='get_flight_status'), tool_calls=None))], created=1699902666, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=48, prompt_tokens=119, total_tokens=167))\n"
]
}
],
Expand All @@ -114,7 +117,7 @@
" messages = [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What is the flight status of UA 792 for Nov 9, 2023?\"\n",
" \"content\": \"What is the flight status of UA 792 for Nov 12, 2023?\"\n",
" }\n",
" ],\n",
" functions = [\n",
@@ -162,7 +165,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 16,
"metadata": {},
"outputs": [
{
@@ -193,14 +196,14 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ChatCompletion(id='chatcmpl-8JSS6qstcU6tVKUB3JkZ2yIUbq5KW', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='The flight status of UA 792 for Nov 9, 2023 is on time. The flight is departing at 06:00 CST and arriving at 09:06 EST.', role='assistant', function_call=None, tool_calls=None))], created=1699648294, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=39, prompt_tokens=150, total_tokens=189))\n"
"ChatCompletion(id='chatcmpl-8KWdNwNEKsRQDoXLJ6cnrYEBnnRau', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='The flight status of UA 792 for November 12, 2023, is on time. The flight is scheduled to depart at 06:00 CST and arrive at 09:06 EST.', role='assistant', function_call=None, tool_calls=None))], created=1699902697, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=42, prompt_tokens=150, total_tokens=192))\n"
]
}
],
@@ -210,7 +213,7 @@
" messages = [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What is the flight status of UA 792 for Nov 9, 2023?\"\n",
" \"content\": \"What is the flight status of UA 792 for Nov 12, 2023?\"\n",
" },\n",
" {\n",
" \"role\": \"function\",\n",
@@ -263,14 +266,14 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The flight status of UA 792 for Nov 9, 2023 is on time. The flight is departing at 06:00 CST and arriving at 09:06 EST.\n"
"The flight status of UA 792 for November 12, 2023, is on time. The flight is scheduled to depart at 06:00 CST and arrive at 09:06 EST.\n"
]
}
],
54 changes: 7 additions & 47 deletions _sources/langchain.ipynb

Large diffs are not rendered by default.

