llmlite is a library that helps you communicate with all kinds of LLMs through a consistent interface.
- Support for state-of-the-art LLMs
- Continuous Batching via vLLM
- Quantization (issue#37)
- Loading specific adapters (issue#51)
- Streaming (issue#52)
Supported models:

| Model | State | System Prompt | Note |
| --- | --- | --- | --- |
| ChatGPT | Done ✅ | Yes | |
| Llama-2 | Done ✅ | Yes | |
| CodeLlama | Done ✅ | Yes | |
| ChatGLM2 | Done ✅ | No | |
| Baichuan2 | Done ✅ | Yes | |
| ChatGLM3 | WIP ⏳ | Yes | |
| Claude-2 | Roadmap | | issue#7 |
| Falcon | Roadmap | | issue#8 |
| StableLM | Roadmap | | issue#11 |
Supported backends:

| Backend | State |
| --- | --- |
| huggingface | Done ✅ |
| vLLM | Done ✅ |
Install with pip:

```bash
pip install llmlite==0.0.15
```
A basic chat completion looks like this:

```python
from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",  # required
    task="text-generation",
)

result = chat.completion(
    messages=[
        ChatMessage(role="system", content="You're an honest assistant."),
        ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
    ]
)

# Output: Oh my goodness, a llama in your garden?! 😱 That's quite a surprise!
# As an honest assistant, I must inform you that llamas are not typically known
# for their gardening skills, so it's possible that the llama in your garden may
# have wandered there accidentally or is seeking shelter. 😮 ...
```
Continuous batching is mostly supported by vLLM; you can enable it by configuring the backend:
```python
from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    backend="vllm",
)

results = chat.completion(
    messages=[
        [
            ChatMessage(role="system", content="You're an honest assistant."),
            ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
        ],
        [
            ChatMessage(role="user", content="What's the population of the world?"),
        ],
    ],
    max_tokens=2048,
)

for result in results:
    print(f"RESULT: \n{result}\n\n")
```
`llmlite` also supports other parameters like `temperature`, `max_length`, `do_sample`, `top_k`, and `top_p` to help control the length, randomness, and diversity of the generated text. See the examples for reference.
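A minimal sketch of passing these parameters, assuming `completion` accepts them as keyword arguments in the same way it accepts `max_tokens` above:

```python
from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")

# Assumption: these generation parameters are forwarded as keyword
# arguments, just like max_tokens in the batching example above.
result = chat.completion(
    messages=[
        ChatMessage(role="user", content="Name three uses for a llama."),
    ],
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # lower values give more deterministic output
    top_k=50,          # only consider the 50 most likely next tokens
    top_p=0.9,         # nucleus sampling: smallest token set with 90% mass
    max_length=512,    # cap the total length of the generated text
)
print(result)
```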
You can use `llmlite` to help you generate full prompts, for instance:
```python
from llmlite import ChatLLM, ChatMessage

messages = [
    ChatMessage(role="system", content="You're an honest assistant."),
    ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
]

print(ChatLLM.prompt("meta-llama/Llama-2-7b-chat-hf", messages))

# Output:
# <s>[INST] <<SYS>>
# You're an honest assistant.
# <</SYS>>
# There's a llama in my garden, what should I do? [/INST]
```
Set the environment variable `LOG_LEVEL` to configure logging. It defaults to `INFO`; other standard levels such as `DEBUG` and `WARNING` are also supported.
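For example (a sketch; it assumes `llmlite` reads `LOG_LEVEL` from the environment when it is imported):

```python
import os

# Assumption: LOG_LEVEL is read at import time, so set it before importing llmlite.
os.environ["LOG_LEVEL"] = "DEBUG"

from llmlite import ChatLLM  # imported after configuring the log level
```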
All kinds of contributions are welcome! Please follow Contributing.
Thanks to all these contributors.