Pure C++ LLM inference engine. No third-party dependencies.
*Under construction; currently for education and learning purposes only. Hopefully it will eventually evolve into something useful in production.*
License: MIT
./scripts/build.sh [--debug] [--test] [--nolog] [--clear]
# arguments:
# --debug: build the debug version for debugging
# --test: run all unit tests
# --nolog: build without logging
# --clear: clean and rebuild
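For example, a debug build that also runs the unit tests (assuming the flags can be combined):

./scripts/build.sh --debug --test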
This project mainly focuses on LLM inference on a single machine, with or without an accelerator (NVIDIA GPU, Apple MPS, ...).
As the project proceeds, the plan is to:
- support CPU inference in FP32/FP16 without vectorization intrinsics, compatible with the GGUF format
- support mainstream LLM models such as LLaMA, Phi, Qwen, Gemma, ...
- support CPU inference in FP32/FP16 with vectorization intrinsics
- support CPU inference with Q8_0, Q4_0, and Q4_1 quantization (see the sketch after this list)
- support streaming output
- support NVIDIA GPU inference
- support NVIDIA GPU inference with CPU offloading
- support Apple MPS
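The quantization formats follow the ggml/GGUF block layout. As a rough illustration, here is a minimal sketch of Q8_0 dequantization, assuming the standard layout of 32 int8 weights per block with one FP16 scale; the struct and function names are illustrative, not this project's actual API.

```cpp
// Minimal sketch of Q8_0 block dequantization (ggml/GGUF layout):
// each block holds 32 int8 weights and one FP16 scale d, and the
// original weight is recovered as x = d * q.
// Struct and function names are illustrative, not this project's API.
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t QK8_0 = 32;      // weights per block

struct BlockQ8_0 {
    std::uint16_t d;                   // FP16 scale, stored as raw bits
    std::int8_t   qs[QK8_0];           // quantized weights
};

// Convert raw FP16 bits to float (subnormals treated as zero for brevity).
static float fp16_to_fp32(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;
    const std::uint32_t mant = h & 0x3FFu;
    std::uint32_t bits;
    if (exp == 0) {
        bits = sign;                                   // zero / subnormal
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13);      // inf / NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Dequantize nblocks consecutive Q8_0 blocks into a flat float buffer.
void dequantize_q8_0(const BlockQ8_0* blocks, float* out, std::size_t nblocks) {
    for (std::size_t i = 0; i < nblocks; ++i) {
        const float d = fp16_to_fp32(blocks[i].d);
        for (std::size_t j = 0; j < QK8_0; ++j) {
            out[i * QK8_0 + j] = d * static_cast<float>(blocks[i].qs[j]);
        }
    }
}
```

Q4_0 and Q4_1 follow the same block idea, packing two 4-bit weights per byte with a scale (and, for Q4_1, an offset).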
This project is greatly inspired by ggml, fastllm, and llama.cpp. Many thanks to all of their contributors.
Please buy me a ☕ if you find this project useful.