- 3.1 Local Mode
- 3.2 Server Mode
  - 3.2.1 Launch Server
  - 3.2.2 Access Server
NeuralChat is a customizable chat framework designed to let users easily create their own chatbots and efficiently deploy them across multiple architectures (e.g., Intel® Xeon® Scalable processors, Habana® Gaudi® AI processors). NeuralChat is built on top of large language models (LLMs) and provides a set of strong capabilities including LLM fine-tuning, optimization, and inference, together with a rich set of plugins such as knowledge retrieval, query caching, etc. With NeuralChat, you can create a text-based or audio-based chatbot within minutes and rapidly deploy it on your favorite platform.
NeuralChat is under active development with some experimental features (APIs are subject to change).
NeuralChat is seamlessly integrated into the Intel Extension for Transformers. Please refer to the Installation page for step-by-step instructions.
NeuralChat can be deployed in two modes: local mode and server mode.
After installation, NeuralChat can be deployed on a local machine, and users can access it through the command line or Python code:
```shell
# Command line
neuralchat predict --query "Tell me about Intel Xeon Scalable Processors."
```

```python
# Python code
from intel_extension_for_transformers.neural_chat import build_chatbot

chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
```
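`build_chatbot` can also take a `PipelineConfig` (used in the plugin and optimization examples later in this document). As a minimal sketch, assuming `PipelineConfig` exposes a `model_name_or_path` parameter, you can select which base model backs the local chatbot; the model name is taken from the validated model table below:

```python
# A minimal sketch: pick a specific base model for the local chatbot.
# Assumes PipelineConfig accepts a model_name_or_path parameter; check the
# installed version of intel_extension_for_transformers for the exact name.
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig

config = PipelineConfig(model_name_or_path="Intel/neural-chat-7b-v1-1")
chatbot = build_chatbot(config)
print(chatbot.predict("Tell me about Intel Xeon Scalable Processors."))
```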
NeuralChat can be deployed on a remote machine as a service, and users can access it through curl with a RESTful API.
Executing the command below launches the chatbot service:

```shell
neuralchat_server start --config_file ./server/config/neuralchat.yaml
```
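The YAML file passed via `--config_file` configures the service. The snippet below is an illustrative sketch only, not the authoritative schema: every field name in it is an assumption, chosen to show the kind of settings such a file typically carries (the port matches the curl example below, and the model matches the fine-tuning example later in this document). Consult the shipped `./server/config/neuralchat.yaml` for the real keys.

```yaml
# Illustrative sketch of a neuralchat.yaml; all keys are assumptions.
host: 0.0.0.0
port: 80                                               # matches the curl example below
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"    # base model for the service
device: "cpu"                                          # target hardware for inference
tasks_list: ["textchat"]                               # which endpoints to serve
```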
Use a curl command like the one below to send a request to the chatbot service:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Tell me about Intel Xeon Scalable Processors."}' http://127.0.0.1:80/v1/chat/completions
```
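The same endpoint can be called from Python. The sketch below simply restates the curl request above using the `requests` library; the response body is whatever JSON the service returns and is printed verbatim:

```python
# Send the same request as the curl example above, using the requests library.
import requests

url = "http://127.0.0.1:80/v1/chat/completions"
payload = {"prompt": "Tell me about Intel Xeon Scalable Processors."}

resp = requests.post(url, json=payload, timeout=60)  # json= sets the body and header
resp.raise_for_status()                              # fail loudly on HTTP errors
print(resp.json())                                   # print the service's JSON response
```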
NeuralChat introduces plugins, which offer a rich set of useful LLM utilities and features to augment the chatbot's capabilities. These plugins are applied in the chatbot pipeline during inference. The supported plugins are listed below:
- Knowledge retrieval: document indexing for efficient retrieval of relevant information, including Dense Indexing based on LangChain and Sparse Indexing based on fastRAG, plus document rankers to prioritize the most relevant responses.
- Query caching: enables a fast path to return responses without LLM inference, improving chat response time.
- Prompt optimization: supports automatic prompt engineering to improve user prompts.
- Memory controller: enables efficient memory utilization.
- Safety checker: enables sensitive content checks on the inputs and outputs of the chatbot.
Users can enable, disable, and even change the default behavior of all supported plugins as shown below:
```python
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig, plugins

# Enable the knowledge retrieval plugin and point it at a local document folder
plugins.retrieval.enable = True
plugins.retrieval.args["input_path"] = "./assets/docs/"

conf = PipelineConfig(plugins=plugins)
chatbot = build_chatbot(conf)
```
NeuralChat supports fine-tuning a pretrained large language model (LLM) for text generation, summarization, and code generation tasks, and even a TTS model, so that users can create a customized chatbot.
```shell
# Command line
neuralchat finetune --base_model "meta-llama/Llama-2-7b-chat-hf" --config pipeline/finetuning/config/finetuning.yaml
```

```python
# Python code
from intel_extension_for_transformers.neural_chat import finetune_model, TextGenerationFinetuningConfig

finetune_cfg = TextGenerationFinetuningConfig()  # other fine-tuning configs are supported
finetuned_model = finetune_model(finetune_cfg)
```
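Once fine-tuning completes, the resulting model can back a chatbot. The sketch below is hypothetical: `./finetuned_model` is an assumed output path, and `model_name_or_path` is the same assumed `PipelineConfig` parameter used in the earlier local-mode sketch. Verify both against your installation.

```python
# A hedged sketch: serve a fine-tuned model through the chatbot pipeline.
# "./finetuned_model" is a hypothetical output path; model_name_or_path is an
# assumed PipelineConfig parameter.
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig

config = PipelineConfig(model_name_or_path="./finetuned_model")
chatbot = build_chatbot(config)
print(chatbot.predict("Summarize the benefits of fine-tuning."))
```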
NeuralChat provides several model optimization technologies, such as AMP (advanced mixed precision) and weight-only quantization, to allow users to define a customized chatbot.
```shell
# Command line
neuralchat optimize --base_model "meta-llama/Llama-2-7b-chat-hf" --config pipeline/optimization/config/optimization.yaml
```

```python
# Python code
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig, AMPConfig

pipeline_cfg = PipelineConfig(optimization_config=AMPConfig())
chatbot = build_chatbot(pipeline_cfg)
```
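Since both plugins and optimization settings are carried by `PipelineConfig`, they can plausibly be combined in a single pipeline. The sketch below merges the two earlier examples, under the assumption that `PipelineConfig` accepts `plugins` and `optimization_config` together:

```python
# A sketch combining the plugin and optimization examples above, assuming
# PipelineConfig accepts plugins and optimization_config together.
from intel_extension_for_transformers.neural_chat import (
    build_chatbot,
    PipelineConfig,
    AMPConfig,
    plugins,
)

plugins.retrieval.enable = True                       # retrieval-augmented answers
plugins.retrieval.args["input_path"] = "./assets/docs/"

pipeline_cfg = PipelineConfig(
    plugins=plugins,
    optimization_config=AMPConfig(),                  # mixed-precision inference
)
chatbot = build_chatbot(pipeline_cfg)
```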
The table below lists the models validated in NeuralChat for both inference and fine-tuning.
| Pretrained model | Text Generation (Instruction) | Text Generation (ChatBot) | Summarization | Code Generation |
|---|---|---|---|---|
| Intel/neural-chat-7b-v1-1 | ✅ | ✅ | ✅ | ✅ |
| LLaMA series | ✅ | ✅ | ✅ | ✅ |
| LLaMA2 series | ✅ | ✅ | ✅ | ✅ |
| MPT series | ✅ | ✅ | ✅ | ✅ |
| FLAN-T5 series | ✅ | WIP | WIP | WIP |
You are welcome to use the Jupyter notebooks to explore how to run, deploy, and customize chatbots across multiple architectures, including Intel Xeon Scalable Processors (SPR, ICX), Intel Xeon CPU Max Series, Intel Habana Gaudi1/Gaudi2, and others. Selected notebooks are shown below; the full set of notebooks is available here.
| Notebook | Title | Description | Link |
|---|---|---|---|
| #1 | Getting Started on Intel CPU SPR | Learn how to run chatbot on SPR | Notebook |
| #2 | Getting Started on Habana Gaudi1/Gaudi2 | Learn how to run chatbot on Habana Gaudi1/Gaudi2 | Notebook |
| #3 | Deploying Chatbot Service on Intel CPU SPR | Learn how to deploy chatbot service on SPR | Notebook |
| #4 | Deploying Chatbot Service on Habana Gaudi1/Gaudi2 | Learn how to deploy chatbot service on Intel Habana Gaudi1/Gaudi2 | Notebook |
| #5 | Deploying Chatbot Service with Load Balance on Intel CPU SPR | Learn how to deploy chatbot service with load balancing on SPR | Notebook |