The Gateway API Inference Extension came out of wg-serving and is sponsored by SIG Network. This repo contains the load-balancing algorithm, ext-proc code, CRDs, and controllers of the extension.
This extension is intended to provide value to multiplexed LLM services on a shared pool of compute. See the proposal for more info.
This project is currently in development.
For more rapid testing, our PoC is in the ./examples/ directory.
Install the CRDs into the cluster:

```sh
make install
```
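As a quick sanity check you can confirm the CRDs registered with the API server. This is a minimal sketch; the grep pattern is an assumption, so adjust it to match the CRD names actually installed in your cluster.

```sh
# List CRDs and filter for the extension's resources
# (the "inference" pattern is assumed, not authoritative).
kubectl get crds | grep -i inference
```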
Delete the APIs (CRDs) from the cluster:

```sh
make uninstall
```
Deploying the ext-proc image

Refer to this README on how to deploy the ext-proc image.
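Once the ext-proc server is deployed, you can check that it is running and inspect its logs. This is only a sketch; the namespace and label selector below are hypothetical placeholders, so substitute the values from the deployment manifest you used.

```sh
# Check the ext-proc pods and tail their logs
# (namespace and "app=ext-proc" label are assumptions, not the manifest's actual values).
kubectl get pods -n default -l app=ext-proc
kubectl logs -n default -l app=ext-proc --tail=50
```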
Our community meeting is held weekly on Thursdays at 10 AM PDT; the Zoom link is here.
We currently use the #wg-serving Slack channel for communications.
Contributions are readily welcomed; thanks for joining us!
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.