[Question] Android app issue #3010

j0h0k0i0m · 2024-11-04T07:04:21Z

❓ General Questions

Hello, I have some questions regarding the Android app.

Currently, I am using q4f16_0 quantization, but there's a significant difference in prefill tokens per second compared to q4f16_1. I’m using the phi-3.5-mini model as a basis, but when testing q4f16_1, the device (Galaxy S24 Ultra) even shuts down entirely. I understand that q4f16_1 generally offers better performance, so I’d like to ask if there are any ways to improve this.
Is the repetition penalty working correctly? I couldn't find a parameter for it in the ChatCompletionRequest within the app, so I'm unsure if it functions as expected. When reviewing the generated sentences, it produces a continuous sequence in a similar style, which suggests it may not be applied properly.
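For reference when checking this, here is a minimal sketch of how a repetition penalty is commonly applied to logits before sampling. It uses the widely used divide-or-multiply formulation and is not taken from the MLC LLM source; the function name `apply_repetition_penalty` and its parameters are hypothetical and only illustrate the idea.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float,
) -> List[float]:
    """Return a copy of `logits` with already generated tokens made less likely.

    Assumption: this is the common formulation (positive logits are divided
    by the penalty, negative logits are multiplied by it). The app's actual
    implementation may differ.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")

    adjusted = list(logits)
    # Each token is penalized once, no matter how often it already appeared.
    for token_id in set(generated_ids):
        value = adjusted[token_id]
        # Both branches lower the logit when penalty > 1.
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 was not.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
```

With a penalty of 1.0 the logits are unchanged, so if the output looks the same whatever value is set, the parameter is probably not reaching the sampler.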

Thanks.


Hzfengsy · 2024-11-04T14:34:15Z

On mobile phones, I don't think q4f16_1 offers better performance. For the prefill stage, q4f16_0 provides much better performance than q4f16_1.

j0h0k0i0m · 2024-11-05T06:19:23Z

@Hzfengsy

Thank you for replying to the issue.

It's understandable to use q4f16_0 because of the prefill stage, but I recall seeing an earlier issue stating that its decoding performance is lower. Currently, the prefill speed for the phi-3.5-mini model (q4f16_1) is below 1 token per second, and I would like to get better-quality responses than with q4f16_0 at an acceptable prefill speed. Is there any way to do this?

j0h0k0i0m added the "question" (Question about the usage) label on Nov 4, 2024
