The model is pretty amazing, and thanks a lot for open-sourcing it. Is there a way to size it down and run it on hardware like Apple silicon using ggml?
Would this improve the inference times? For me, on an Apple M2, it takes 12 seconds to translate one sentence. If you can guide me on how to do this, I would be willing to help!
Yes, this should improve the inference time. However, it would require writing the model definitions in C++, similar to llama.cpp, and converting the weights to the ggml format.
Currently, we don't have the bandwidth, experience, or hardware resources to help you port the models to ggml. Please let us know if there is any progress on this thread.
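For anyone who wants to attempt the port, the first step is usually a weight-export script on the Python side, which the hand-written C++ loader then reads. The sketch below is only an illustration of that step under assumed names: the checkpoint path, output path, and the binary layout (tensor count, name, rank, shape, fp16 data) are placeholders and not the actual ggml/GGUF format; the conversion scripts in llama.cpp are the reference to follow for the real layout.

```python
# Hypothetical sketch: dump a PyTorch checkpoint's tensors to a flat binary
# file that a hand-written C++ loader could read. The layout used here is an
# illustration only, NOT the real ggml/GGUF format.
import struct
import torch

def export_checkpoint(ckpt_path: str, out_path: str) -> None:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # Keep only tensor entries; checkpoints sometimes carry optimizer state etc.
    tensors = {k: v for k, v in state_dict.items() if torch.is_tensor(v)}
    with open(out_path, "wb") as f:
        f.write(struct.pack("<i", len(tensors)))          # tensor count
        for name, tensor in tensors.items():
            data = tensor.to(torch.float16).contiguous().numpy()
            name_bytes = name.encode("utf-8")
            f.write(struct.pack("<i", len(name_bytes)))   # name length
            f.write(name_bytes)                           # name
            f.write(struct.pack("<i", data.ndim))         # rank
            f.write(struct.pack(f"<{data.ndim}i", *data.shape))  # shape
            f.write(data.tobytes())                       # fp16 payload

if __name__ == "__main__":
    # Placeholder paths; point these at the actual checkpoint.
    export_checkpoint("model.pt", "model.bin")
```

Casting to fp16 already halves the file size; the further "size it down" gains on Apple silicon would come from ggml's integer quantization (e.g. 4-bit), which has to be done on the C++ side following the llama.cpp quantization code.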