Llama-2-chat integration #9

Closed · wants to merge 30 commits

Changes from all commits

Commits (30)
c004d7b
llama-2-chat integration [wip]
eryk-dsai Aug 31, 2023
3337d6a
fixing problem with private pipeline
eryk-dsai Aug 31, 2023
26e10fc
_generate returns ChatResult
eryk-dsai Sep 1, 2023
2aba4cd
batch calls example in notebook, handling chat messages
eryk-dsai Sep 1, 2023
82f71f1
_format_messages_as_text docstring
eryk-dsai Sep 1, 2023
933f5f4
you can pass stop words now
eryk-dsai Sep 1, 2023
36519e0
format_messages_as_text test
eryk-dsai Sep 1, 2023
9a92e08
formatter
eryk-dsai Sep 1, 2023
cc53251
fix lint issues
eryk-dsai Sep 1, 2023
6a0cd87
removal of redundant notebook cell
eryk-dsai Sep 1, 2023
cc578b5
refactor: update naming to indicate Hugging Face usage
eryk-dsai Sep 4, 2023
cb33d48
small refactor
eryk-dsai Sep 4, 2023
440571d
fix lint errors, running formatter
eryk-dsai Sep 4, 2023
f860326
moving stopping criteria class out of function, correct typing
eryk-dsai Sep 4, 2023
9b6e79d
code review suggestions
eryk-dsai Sep 4, 2023
900d55c
run formatter and lint
eryk-dsai Sep 4, 2023
3b9c030
StoppingCriteria are correctly placed on the same device as pipeline
eryk-dsai Sep 5, 2023
9715b81
Merge branch 'llama2-chat' of https://github.com/deepsense-ai/langcha…
eryk-dsai Sep 5, 2023
3885de5
run formatter, lint
eryk-dsai Sep 5, 2023
814131d
removal of the redundant notebook cell
eryk-dsai Sep 5, 2023
bd6e2fe
moved StoppingCriteria import to method
eryk-dsai Sep 5, 2023
35b5e09
fixing type annotation
eryk-dsai Sep 5, 2023
1228bfc
fixing Enum tests
eryk-dsai Sep 5, 2023
926c02f
Editing the huggingface llama 2 notebook
eryk-dsai Sep 5, 2023
8a8a03a
Merge branch 'master' into llama2-chat
eryk-dsai Sep 5, 2023
964b579
typos, better name for custom StoppingCriteria subclass
eryk-dsai Sep 13, 2023
87597a1
Generic Hugging Face Pipeline Chat Model
eryk-dsai Sep 13, 2023
7756700
Merge branch 'langchain-ai:master' into llama2-chat
eryk-dsai Oct 6, 2023
d9e9ef3
simplifying HF Chat Model, by making use of HF Chat Templates
eryk-dsai Oct 10, 2023
cf203b9
removing incorrect check from validate_environment method
eryk-dsai Oct 10, 2023
369 changes: 369 additions & 0 deletions docs/extras/integrations/chat/huggingface_pipeline.ipynb
@@ -0,0 +1,369 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hugging Face Pipelines as LangChain Chat Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to use Hugging Face models as LangChain Chat models, using the Llama 2 Chat model as an example. We use the Hugging Face tokenizer's 'apply_chat_template' method to handle different instruction tuned models with different prompting templates. If you want to change the prompt templateing behavior, you can find instructions in the Hugging Face [guide](https://huggingface.co/docs/transformers/main/en/chat_templating)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Hugging Face imports:\n",
"import torch\n",
"from transformers import (\n",
" AutoModelForCausalLM,\n",
" AutoTokenizer,\n",
" BitsAndBytesConfig,\n",
" pipeline,\n",
")\n",
"\n",
"# LangChain imports:\n",
"from langchain.chat_models import ChatHuggingFacePipeline\n",
"from langchain.schema import AIMessage, HumanMessage, SystemMessage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook assumes that you were granted with access to the Llama 2 models in the Hugging Face models hub. To use the model locally, you need to be [logged in](https://huggingface.co/docs/huggingface_hub/quick-start#login) with a Hugging Face account. \n",
"\n",
"To log in using CLI run the following command in your terminal:\n",
"```\n",
"huggingface-cli login\n",
"```\n",
"or using an environment variable\n",
"```\n",
"huggingface-cli login --token $HUGGINGFACE_TOKEN\n",
"```"
]
},
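{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can log in from Python with `huggingface_hub.login`. The sketch below is optional and assumes that a token is available in the `HUGGINGFACE_TOKEN` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import os\n",
"# from huggingface_hub import login\n",
"#\n",
"# # Assumes HUGGINGFACE_TOKEN is set in the environment:\n",
"# login(token=os.environ[\"HUGGINGFACE_TOKEN\"])"
]
},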
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Hugging Face Pipeline instance:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following section loads the 7b version of the Llama 2 Chat model and uses the `bitsandbytes` library to load a model in 4bit using NF4 quantization with double quantization and compute dtype bfloat16, which speeds up the underlying matrix multiplications.\n",
"\n",
"More information about these techniques can be found at: [link](https://huggingface.co/blog/4bit-transformers-bitsandbytes)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"model_name = \"meta-llama/Llama-2-7b-chat-hf\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To load the model in 4bit, make sure that the `accelerate`, `transformers` and `bitsandbytes` libraries are installed:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# !pip install -q -U bitsandbytes\n",
"# !pip install -q -U git+https://github.com/huggingface/transformers.git\n",
"# !pip install -q -U git+https://github.com/huggingface/peft.git\n",
"# !pip install -q -U git+https://github.com/huggingface/accelerate.git"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"bnb_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_compute_dtype=torch.bfloat16,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8f5fbc7100b445f98d363702e53692fd",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"# disabling the default System Message of the Llama model \n",
"tokenizer.use_default_system_prompt = False\n",
"\n",
"model_4bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map=\"auto\")"
]
},
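{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, the chat model relies on the tokenizer's `apply_chat_template` method to turn a list of messages into a single prompt string. The sketch below shows what the Llama 2 chat template produces for a short conversation; the message-dict format is the one expected by `apply_chat_template`, and the exact output depends on your `transformers` version and the model's template:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the prompt string that the chat template produces.\n",
"chat_messages = [\n",
"    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
"    {\"role\": \"user\", \"content\": \"Tell me a joke.\"},\n",
"]\n",
"prompt = tokenizer.apply_chat_template(\n",
"    chat_messages, tokenize=False, add_generation_prompt=True\n",
")\n",
"print(prompt)"
]
},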
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"pipe = pipeline(\n",
" \"text-generation\",\n",
" model=model_4bit,\n",
" tokenizer=tokenizer,\n",
" torch_dtype=torch.float16,\n",
" device_map=\"auto\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initializing the Chat Model instance"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"chat = ChatHuggingFacePipeline(pipeline=pipe)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides defining arguments for `Pipeline` initialization, we can also control the generation process, by enabling sampling, chaning temperature or defining maximum length of single generation."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Generation kwargs:\n",
"pipeline_kwargs = {\n",
" \"do_sample\": True,\n",
" \"top_p\": 0.95,\n",
" \"temperature\": 0.7,\n",
" \"eos_token_id\": tokenizer.eos_token_id,\n",
" \"max_length\": 512, \n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Single calls:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get chat completions by passing one or more messages to the chat model. The response will be a message:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Sure, I'd be happy to help! Here's the translation of \"I love programming\" from English to French:\n",
"Je adore le programming.\n",
"\n",
"I hope that helps! Let me know if you have any other sentences you'd like me to translate.\n"
]
}
],
"source": [
"messages = [\n",
" SystemMessage(\n",
" content=\"You are a helpful assistant that translates English to French.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Translate this sentence from English to French. I love programming.\"\n",
" ),\n",
"]\n",
"result = chat(messages, **pipeline_kwargs)\n",
"print(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Single calls with stop words"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By utilizing Hugging Face [Stopping Criteria](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.StoppingCriteria) under the hood, we can provide phrases that, if generated by the model, will cause the generation process to stop."
]
},
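{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the hood, stop words can be implemented with a custom `StoppingCriteria` subclass. The sketch below is illustrative rather than the exact class used by `ChatHuggingFacePipeline`: it stops generation as soon as any stop phrase appears in the newly generated text. Criteria like this are passed to a `transformers` pipeline call via `stopping_criteria=StoppingCriteriaList([...])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import StoppingCriteria\n",
"\n",
"\n",
"class StopOnWords(StoppingCriteria):\n",
"    \"\"\"Illustrative sketch: stop once any stop word appears in the generation.\"\"\"\n",
"\n",
"    def __init__(self, stop_words, tokenizer, prompt_length):\n",
"        self.stop_words = stop_words\n",
"        self.tokenizer = tokenizer\n",
"        # Number of prompt tokens to skip when decoding:\n",
"        self.prompt_length = prompt_length\n",
"\n",
"    def __call__(self, input_ids, scores, **kwargs):\n",
"        # Decode only the newly generated tokens, not the prompt:\n",
"        text = self.tokenizer.decode(input_ids[0][self.prompt_length:])\n",
"        return any(word in text for word in self.stop_words)"
]
},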
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Of course! Artificial\n"
]
}
],
"source": [
"messages = [\n",
" SystemMessage(\n",
" content=\"You are a helpful assistant.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Tell me the history of AI.\"\n",
" ),\n",
"]\n",
"result = chat(messages, stop=[\"Artificial\", \"Inteligence\"], **pipeline_kwargs)\n",
"print(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Batch calls:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also go one step further and generate completions for multiple sets of messages:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"batch_messages = [\n",
" [\n",
" SystemMessage(content=\"You are a helpful assistant that translates English to French.\"),\n",
" HumanMessage(content=\"I love programming.\")\n",
" ],\n",
" [\n",
" SystemMessage(content=\"You are a helpful assistant that translates English to French.\"),\n",
" HumanMessage(content=\"I love artificial intelligence.\")\n",
" ],\n",
"]\n",
"result = chat.generate(batch_messages)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Response #0:\n",
" Great! \"Programmation\" is the French word for \"programming\".\n",
"\n",
"So, you love programmation? (programme)\n",
"\n",
"Response #1:\n",
" \"Je suis heureux que vous aimiez l'intelligence artificielle.\" (I am happy that you love artificial intelligence.)\n",
"\n"
]
}
],
"source": [
"for i, generation in enumerate(result.generations):\n",
" print(f\"Response #{i}:\\n{generation[0].text}\", end=\"\\n\\n\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.12 ('langchain_venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "d1d3a3c58a58885896c5459933a599607cdbb9917d7e1ad7516c8786c51f2dd2"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
2 changes: 2 additions & 0 deletions libs/langchain/langchain/chat_models/__init__.py
@@ -27,6 +27,7 @@
from langchain.chat_models.fake import FakeListChatModel
from langchain.chat_models.fireworks import ChatFireworks
from langchain.chat_models.google_palm import ChatGooglePalm
from langchain.chat_models.huggingface_pipeline import ChatHuggingFacePipeline
from langchain.chat_models.human import HumanInputChatModel
from langchain.chat_models.javelin_ai_gateway import ChatJavelinAIGateway
from langchain.chat_models.jinachat import JinaChat
@@ -50,6 +51,7 @@
"ChatGooglePalm",
"ChatMLflowAIGateway",
"ChatOllama",
"ChatHuggingFacePipeline",
"ChatVertexAI",
"JinaChat",
"HumanInputChatModel",