The RAG deployment creates a CPU node pool of size 2 and a GPU node pool of size 2, both with autoscaling enabled. The GPU node pool uses g2-standard-24 machines with 2 L4 GPUs each, and the CPU node pool uses n1-standard-16.
With the current default configuration, the cluster autoscaler sometimes scales up the GPU node pool to fit everything in the RAG deployment. The default node pool sizes should be able to run the full deployment without scaling up additional nodes. GPUs are expensive, so we should avoid adding a 3rd g2-standard-24 when possible; they are also scarce, so requesting an extra GPU node increases the failure rate of the RAG deployment.
I haven't thoroughly investigated why a 3rd g2-standard-24 instance is needed, but I suspect it's because various components of the stack have had their CPU/memory requests and limits increased over the past several months. We should reduce CPU/memory requests wherever possible so everything runs within the default node counts.
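For example, trimming a component's requests in its manifest would look roughly like the sketch below. This is only illustrative: the deployment/container names and the numbers are placeholders, not measured values for any actual RAG component, and the real manifests would need to be audited and right-sized per component based on observed usage.

```yaml
# Hypothetical sketch: shrink requests so pods bin-pack onto the default
# 2x n1-standard-16 CPU nodes. Names and values below are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-rag-component   # hypothetical component name
spec:
  template:
    spec:
      containers:
        - name: server           # hypothetical container name
          resources:
            requests:
              cpu: "500m"        # reduced request; size to observed usage
              memory: "1Gi"      # reduced request; size to observed usage
            limits:
              cpu: "2"
              memory: "4Gi"
```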
This will likely be fixed when we default all deployments to Autopilot as well, but even then we should try to reduce resource requests wherever possible.