Hello, I have some questions regarding the Android app.
Currently, I am using q4f16_0 quantization, but its prefill tokens per second differ significantly from q4f16_1. I am basing my work on the phi-3.5-mini model, and when testing q4f16_1 the device (a Galaxy S24 Ultra) even shuts down entirely. Since I understand that q4f16_1 generally offers better performance, I would like to ask whether there are any ways to improve this.
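For context, both modes are 4-bit group quantization with fp16 scales; the sketch below shows the general idea of group quantization only (it is not MLC-LLM's exact q4f16_0/q4f16_1 layouts, which differ mainly in how the packed weights are laid out for the GPU kernels):

```python
# Generic sketch of symmetric 4-bit group quantization: each small group of
# weights shares one fp16-like scale, and each weight is stored as a 4-bit
# signed integer. Illustrative only, not MLC-LLM's actual packing.
def quantize_group(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax
    # Clamp to the representable signed range [-8, 7] for 4 bits.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    # The kernel reconstructs an approximation of the original weights.
    return [v * scale for v in q]

group = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_group(group)
approx = dequantize_group(q, s)
```

The quantization mode mostly changes how these packed groups are read by the prefill and decode kernels, which is why the two modes can show very different prefill throughput on the same device.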
Is the repetition penalty working correctly? I couldn't find a parameter for it in the ChatCompletionRequest within the app, so I'm unsure whether it functions as expected. When reviewing the generated sentences, the model keeps producing output in a repetitive, similar style, which suggests the penalty may not be applied properly.
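For reference, a common formulation of repetition penalty (from the CTRL paper) rescales the logits of already-generated tokens before sampling; a minimal sketch, with illustrative names rather than the app's actual ChatCompletionRequest fields:

```python
# Hypothetical sketch of a CTRL-style repetition penalty. Tokens that have
# already been generated get their logits scaled so they become less likely.
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    out = list(logits)
    for i in set(generated_ids):
        # Dividing positive logits and multiplying negative ones both
        # lower the probability of repeating token i. penalty=1.0 is a no-op.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
```

If the request object silently drops an unknown penalty field, the logits would be left untouched and the output would look exactly like the repetitive text described above.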
Thanks.
Using q4f16_0 for the sake of the prefill stage makes sense, but I recall an earlier issue stating that its decoding performance is lower. Currently, the prefill tokens per second for the phi-3.5-mini model (q4f16_1) is below 1, and I would like to get better response quality than q4f16_0 while keeping a usable prefill speed. Is there any way to do this?