
question: is gzip applied? #126

Open
qinst64 opened this issue Aug 2, 2024 · 1 comment

qinst64 commented Aug 2, 2024

I have a very long prompt, and Ollama is running on a remote server.
When sending requests over HTTP with ollama-js, is compression (e.g. gzip) already applied so that transfer speed is optimal?

hopperelec (Contributor) commented Aug 6, 2024

HTTP does allow a request body to be compressed (by setting a Content-Encoding header on the request), but there is no standard mechanism for a client to find out in advance whether the server can decompress it — unlike responses, where the client advertises support via Accept-Encoding. So request compression can only be used safely when the API explicitly implements and documents it, which would have to happen in the Ollama server first. The Ollama server is written in Go, which I am not familiar with, so I can't confirm for certain that it doesn't already accept compressed requests, but I doubt it.

Compressing responses would be much easier to support (at least when the response isn't being streamed), but again, this would require changes to the Ollama server rather than ollama-js. I tested this, and the Ollama API does not currently appear to compress its responses, though there may be specific circumstances where it does; I'm not sure.
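One way to check this yourself: even though `fetch` (Node 18+) transparently decompresses gzip bodies, the response's original `Content-Encoding` header is still visible, so inspecting it tells you what the server actually sent. A small helper, assuming only standard header names:

```javascript
// Returns true if a response's headers indicate the body was sent
// compressed. Works with both a Headers object (from fetch) and a
// plain object of header key/value pairs.
function wasCompressed(headers) {
  const enc = typeof headers.get === 'function'
    ? headers.get('content-encoding')
    : headers['content-encoding'];
  return /\b(gzip|br|deflate)\b/.test(enc || '');
}

// Uncompressed response headers (what the Ollama API appears to send):
console.log(wasCompressed({ 'content-type': 'application/json' }));
// A gzip-compressed response would instead carry:
console.log(wasCompressed({ 'content-encoding': 'gzip' }));
```

Pointing this at a real `/api/generate` response (e.g. `wasCompressed(res.headers)` after a `fetch`) is how I would verify the behaviour against a running server.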

As for whether implementing it would be a good idea: generation speed is likely to dominate overall latency far more than request transfer speed. Even with an upload speed of just 1 Mb/s, sending a prompt that fills Llama 3.1's entire 128K-token context window would take only a few seconds. Even on hardware dedicated to AI inference (e.g. Groq), generating a response to a request of that size takes much longer than that, and Ollama is intended for consumer hardware. So, while compression could be beneficial, I don't think it's really a concern at the moment.
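The estimate above can be sketched as back-of-envelope arithmetic. The ~4 bytes-per-token figure and the 3x gzip ratio are assumptions for illustration, not numbers from the thread:

```javascript
// Back-of-envelope: how long does a maximal prompt take to upload?
const contextTokens = 128_000;       // Llama 3.1 context window
const bytesPerToken = 4;             // rough average for English text (assumption)
const uploadBitsPerSec = 1_000_000;  // a modest 1 Mb/s uplink

const promptBytes = contextTokens * bytesPerToken;            // ~512 KB
const transferSeconds = (promptBytes * 8) / uploadBitsPerSec; // ~4.1 s

// Even a generous 3x gzip ratio (assumption) only trims a few seconds,
// while generating a response to a 128K-token prompt takes far longer
// than that on consumer hardware.
const gzipRatio = 3;
const compressedSeconds = transferSeconds / gzipRatio;
console.log({ transferSeconds, compressedSeconds });
```

Under these assumptions, compression saves seconds on a transfer that is already a small fraction of end-to-end latency, which supports the conclusion that it isn't a pressing concern.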
