llmlite is a library that helps you communicate with all kinds of LLMs through a consistent interface.
- Support for state-of-the-art LLMs
- Continuous Batching via vLLM
- Quantization (issue#37)
- Loading specific adapters (issue#51)
- Streaming (issue#52)
Supported models:

| Model | State | System Prompt | Note |
| --- | --- | --- | --- |
| ChatGPT | Done ✅ | Yes | |
| Llama-2 | Done ✅ | Yes | |
| CodeLlama | Done ✅ | Yes | |
| ChatGLM2 | Done ✅ | No | |
| Baichuan2 | Done ✅ | Yes | |
| ChatGLM3 | WIP ⏳ | Yes | |
| Claude-2 | Roadmap | | issue#7 |
| Falcon | Roadmap | | issue#8 |
| StableLM | Roadmap | | issue#11 |
Supported backends:

| Backend | State |
| --- | --- |
| huggingface | Done ✅ |
| vLLM | Done ✅ |
Install with pip:

```bash
pip install llmlite==0.0.15
```
A basic chat completion looks like this:

```python
from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",  # required
    task="text-generation",
)

result = chat.completion(
    messages=[
        ChatMessage(role="system", content="You're an honest assistant."),
        ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
    ]
)

# Output: Oh my goodness, a llama in your garden?! 😱 That's quite a surprise!
# As an honest assistant, I must inform you that llamas are not typically known
# for their gardening skills, so it's possible that the llama in your garden may
# have wandered there accidentally or is seeking shelter. 😮 ...
```
Continuous batching is mostly supported by vLLM; you can enable it by configuring the backend:
```python
from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    backend="vllm",
)

results = chat.completion(
    messages=[
        [
            ChatMessage(role="system", content="You're an honest assistant."),
            ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
        ],
        [
            ChatMessage(role="user", content="What's the population of the world?"),
        ],
    ],
    max_tokens=2048,
)

for result in results:
    print(f"RESULT: \n{result}\n\n")
```
`llmlite` also supports other parameters like `temperature`, `max_length`, `do_sample`, `top_k`, and `top_p` to help control the length, randomness, and diversity of the generated text. See the examples for reference.
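A minimal sketch of passing these parameters, assuming `completion` accepts them as keyword arguments in the same way it accepts `max_tokens` above:

```python
from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")

# Assumption: these generation parameters are forwarded as keyword
# arguments, just like max_tokens in the batching example above.
result = chat.completion(
    messages=[
        ChatMessage(role="user", content="Name three uses for a llama."),
    ],
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # lower values give more deterministic output
    top_k=50,          # only consider the 50 most likely next tokens
    top_p=0.9,         # nucleus sampling: smallest token set with 90% mass
    max_length=512,    # cap the total length of the generated text
)
print(result)
```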
You can use `llmlite` to help you generate full prompts, for instance:
```python
from llmlite import ChatLLM, ChatMessage

messages = [
    ChatMessage(role="system", content="You're an honest assistant."),
    ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
]

print(ChatLLM.prompt("meta-llama/Llama-2-7b-chat-hf", messages))

# Output:
# <s>[INST] <<SYS>>
# You're an honest assistant.
# <</SYS>>
# There's a llama in my garden, what should I do? [/INST]
```
Set the environment variable `LOG_LEVEL` to configure logging. It defaults to `INFO`; other standard levels such as `DEBUG` and `WARNING` are also supported.
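For example (a sketch; it assumes `llmlite` reads `LOG_LEVEL` from the environment when it is imported):

```python
import os

# Assumption: LOG_LEVEL is read at import time, so set it before importing llmlite.
os.environ["LOG_LEVEL"] = "DEBUG"

from llmlite import ChatLLM  # imported after configuring the log level
```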
All kinds of contributions are welcome! Please follow Contributing.
Thanks to all these contributors.