The RAG deployment creates a CPU node pool of size 2 and a GPU node pool of size 2, both with autoscaling enabled. The GPU node pool uses g2-standard-24 machines with 2 L4 GPUs each, and the CPU node pool uses n1-standard-16.
With the current default configuration, the cluster autoscaler sometimes scales up the GPU node pool to fit everything in the RAG deployment. The default node pool sizes should be able to run the full deployment without scaling up additional nodes. GPUs are expensive, so we should avoid adding a 3rd g2-standard-24 when possible; they are also scarce, so requesting an extra GPU node increases the failure rate of the RAG deployment.
I haven't thoroughly investigated why a 3rd g2-standard-24 instance is needed, but I suspect it's because various components of the stack have had their CPU/memory requests and limits increased over the past several months. We should reduce CPU/memory requests wherever possible so everything runs within the default node counts.
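For example, trimming a component's requests in its manifest would look roughly like the sketch below. This is only illustrative: the deployment/container names and the numbers are placeholders, not measured values for any actual RAG component, and the real manifests would need to be audited and right-sized per component based on observed usage.

```yaml
# Hypothetical sketch: shrink requests so pods bin-pack onto the default
# 2x n1-standard-16 CPU nodes. Names and values below are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-rag-component   # hypothetical component name
spec:
  template:
    spec:
      containers:
        - name: server           # hypothetical container name
          resources:
            requests:
              cpu: "500m"        # reduced request; size to observed usage
              memory: "1Gi"      # reduced request; size to observed usage
            limits:
              cpu: "2"
              memory: "4Gi"
```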
This will likely be fixed when we default all deployments to Autopilot as well, but even then we should try to reduce resource requests wherever possible.