
question: is gzip applied? #126

Open
qinst64 opened this issue Aug 2, 2024 · 1 comment

qinst64 commented Aug 2, 2024

I have a very long prompt, and Ollama is running on a remote server.
When sending requests over HTTP with ollama-js, is compression (e.g. gzip) already applied so that transfer speed is optimal?

hopperelec (Contributor) commented Aug 6, 2024

HTTP does allow a request body to be compressed (by setting a Content-Encoding header on the request), but there is no standard mechanism for a client to find out in advance whether the server can decompress it — unlike responses, where the client advertises support via Accept-Encoding. So request compression can only be used safely when the API explicitly implements and documents it, which would have to happen in the Ollama server first. The Ollama server is written in Go, which I am not familiar with, so I can't confirm for certain that it doesn't already accept compressed requests, but I doubt it.

Compressing responses would be much easier to support (at least when the response isn't being streamed), but again, this would require changes to the Ollama server rather than ollama-js. I tested this, and the Ollama API does not currently appear to compress its responses, though there may be specific circumstances where it does; I'm not sure.
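One way to check this yourself: even though `fetch` (Node 18+) transparently decompresses gzip bodies, the response's original `Content-Encoding` header is still visible, so inspecting it tells you what the server actually sent. A small helper, assuming only standard header names:

```javascript
// Returns true if a response's headers indicate the body was sent
// compressed. Works with both a Headers object (from fetch) and a
// plain object of header key/value pairs.
function wasCompressed(headers) {
  const enc = typeof headers.get === 'function'
    ? headers.get('content-encoding')
    : headers['content-encoding'];
  return /\b(gzip|br|deflate)\b/.test(enc || '');
}

// Uncompressed response headers (what the Ollama API appears to send):
console.log(wasCompressed({ 'content-type': 'application/json' }));
// A gzip-compressed response would instead carry:
console.log(wasCompressed({ 'content-encoding': 'gzip' }));
```

Pointing this at a real `/api/generate` response (e.g. `wasCompressed(res.headers)` after a `fetch`) is how I would verify the behaviour against a running server.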

As for whether implementing it would be a good idea: generation speed is likely to dominate overall latency far more than request transfer speed. Even with an upload speed of just 1 Mb/s, sending a prompt that fills Llama 3.1's entire 128K-token context window would take only a few seconds. Even on hardware dedicated to AI inference (e.g. Groq), generating a response to a request of that size takes much longer than that, and Ollama is intended for consumer hardware. So, while compression could be beneficial, I don't think it's really a concern at the moment.
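The estimate above can be sketched as back-of-envelope arithmetic. The ~4 bytes-per-token figure and the 3x gzip ratio are assumptions for illustration, not numbers from the thread:

```javascript
// Back-of-envelope: how long does a maximal prompt take to upload?
const contextTokens = 128_000;       // Llama 3.1 context window
const bytesPerToken = 4;             // rough average for English text (assumption)
const uploadBitsPerSec = 1_000_000;  // a modest 1 Mb/s uplink

const promptBytes = contextTokens * bytesPerToken;            // ~512 KB
const transferSeconds = (promptBytes * 8) / uploadBitsPerSec; // ~4.1 s

// Even a generous 3x gzip ratio (assumption) only trims a few seconds,
// while generating a response to a 128K-token prompt takes far longer
// than that on consumer hardware.
const gzipRatio = 3;
const compressedSeconds = transferSeconds / gzipRatio;
console.log({ transferSeconds, compressedSeconds });
```

Under these assumptions, compression saves seconds on a transfer that is already a small fraction of end-to-end latency, which supports the conclusion that it isn't a pressing concern.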
