The model is pretty amazing, and thanks a lot for open-sourcing it. Is there a way to size it down and run it on hardware like Apple silicon using ggml?
Would this improve the inference times? For me, on an Apple M2, it takes 12 seconds to translate one sentence. If you can guide me on how to do this, I would be willing to help!
Yes, this should improve the inference time. However, it would require writing the model definitions in C++, similar to llama.cpp, and converting the weights to the ggml format.
Currently, we don't have the bandwidth, experience, or hardware resources to help you port the models to ggml. Please let us know if there is any progress on this thread.
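For anyone who wants to attempt the port, the first step is usually a weight-export script on the Python side, which the hand-written C++ loader then reads. The sketch below is only an illustration of that step under assumed names: the checkpoint path, output path, and the binary layout (tensor count, name, rank, shape, fp16 data) are placeholders and not the actual ggml/GGUF format; the conversion scripts in llama.cpp are the reference to follow for the real layout.

```python
# Hypothetical sketch: dump a PyTorch checkpoint's tensors to a flat binary
# file that a hand-written C++ loader could read. The layout used here is an
# illustration only, NOT the real ggml/GGUF format.
import struct
import torch

def export_checkpoint(ckpt_path: str, out_path: str) -> None:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # Keep only tensor entries; checkpoints sometimes carry optimizer state etc.
    tensors = {k: v for k, v in state_dict.items() if torch.is_tensor(v)}
    with open(out_path, "wb") as f:
        f.write(struct.pack("<i", len(tensors)))          # tensor count
        for name, tensor in tensors.items():
            data = tensor.to(torch.float16).contiguous().numpy()
            name_bytes = name.encode("utf-8")
            f.write(struct.pack("<i", len(name_bytes)))   # name length
            f.write(name_bytes)                           # name
            f.write(struct.pack("<i", data.ndim))         # rank
            f.write(struct.pack(f"<{data.ndim}i", *data.shape))  # shape
            f.write(data.tobytes())                       # fp16 payload

if __name__ == "__main__":
    # Placeholder paths; point these at the actual checkpoint.
    export_checkpoint("model.pt", "model.bin")
```

Casting to fp16 already halves the file size; the further "size it down" gains on Apple silicon would come from ggml's integer quantization (e.g. 4-bit), which has to be done on the C++ side following the llama.cpp quantization code.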