Releases · kubernetes-sigs/gateway-api-inference-extension

06 Mar 02:08

kfswain

v0.2.0-rc.1

fdeec78

v0.2.0-rc Pre-release

What's Changed

Revert "Replace EndpointSlice reconciler with pod list backed by informers" by @kfswain in #301
Fixing small linter complaints by @kfswain in #302
In hermetic test, add additional test cases and move k8sClient object creation so it's called once for all tests by @BenjaminBraunDev in #278
[Metrics] Add average kv cache and waiting queue size metrics for inference pool by @JeffLuoo in #304
Move getting started guide to docs site by @kfswain in #308
site-source: Fix 'Bakcground' misspell in API concepts page by @timflannagan in #309
Mkdocs fixes by @kfswain in #314
Bump google.golang.org/protobuf from 1.36.4 to 1.36.5 by @dependabot in #315
Remove gci linter by @ahg-g in #317
fix: adds ErrorNotFound Handling for InferenceModel Reconciler by @danehans in #286
site-src: Replace k8sgateway with kgateway & fix spelling in roles-and-personas.md by @timflannagan in #311
Fix: Go Mod Imports by @danehans in #318
Updates EPP Deployment and Release Doc/Script by @danehans in #322
Delete InferenceModels from the datastore when deletionTimestamp is set by @ahg-g in #319
Actually init logging using Zap by @tchap in #267
Remove fatal log calls in executable code by @tchap in #265
feat: Adds e2e test script by @danehans in #294
Replacing endpointSlice Reconciler with a direct Pod Reconciler by @kfswain in #300
Move manager from runserver to main by @tchap in #331
feat: adds image-load and kind-load Make targets by @danehans in #288
Use structured logging by @tchap in #330
Add TLS support with self-signed certificate. by @ahg-g in #335
Lora syncer docs by @coolkp in #320
Fix cloudbuild rule for the LoRA syncer image by @ahg-g in #339
fix: Corrects release branch naming by @danehans in #333
Use contextual logging by @tchap in #337
Bump the kubernetes group with 6 updates by @dependabot in #351
Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.2 by @dependabot in #352
Fixes to the adapter rollouts guide by @ahg-g in #338
Consolidating all storage behind datastore by @ahg-g in #350
fixed a typo - close a bash markdown by @nirrozenbaum in #364
Added controller and datastore package by @hzxuzhonghu in #363
Move pkg/ext-proc -> cmd/ext-proc by @tchap in #368
added license header to all .go files by @nirrozenbaum in #370
fix inference extension not correctly scrape pod metrics by @Kuromesi in #366
Move pkg/manifests -> config/manifests by @tchap in #371
[Metrics] Add request error metrics by @JeffLuoo in #269
Rename pkg/ext-proc to pkg/epp by @tchap in #372
Move pkg/ext-proc/metrics/README.md -> site-src/guides/metrics.md by @courageJ in #373
Defining an outer metadata struct as part of the extproc endpoint picking protocol by @ahg-g in #377
Draft a revised README.md by @smarterclayton in #374
Add README.md file to the epp pkg by @ahg-g in #386
Split the proxy and model server protocols for easy reference by @ahg-g in #387
[Metric] Add inference pool and request error metrics to the dashboard by @JeffLuoo in #389
Switch to gcr.io/distroless/static:nonroot base image by @ahg-g in #384
fix context canceled recv error handling by @Kuromesi in #390
Added endpoint picker diagram by @ahg-g in #396
Added v1alpha2 api by @hzxuzhonghu in #398
Adding a roadmap to README by @kfswain in #400
Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.0 by @dependabot in #402
Bump github.com/google/go-cmp from 0.6.0 to 0.7.0 by @dependabot in #403
updated logging in inferencepool reconciler by @nirrozenbaum in #399
added inferencemodel predicate + minor changes in logging by @nirrozenbaum in #397
Syncing getting started guide all to main by @kfswain in #410
fixed typo in filepath in website guide page by @nirrozenbaum in #412
Fix InferenceModel deletion logic by @ahg-g in #393
Updated yamls to use v1alpha2 by @ahg-g in #420
Rm v1alpha1 api by @hzxuzhonghu in #405
removed the EndpointPickerNotHealthy condition form pool status by @ahg-g in #421
[Metrics] Add metrics validation in integration test by @JeffLuoo in #413
predicate follow up PR to remove the check from Reconcile func by @nirrozenbaum in #418
Mis cleanup by @hzxuzhonghu in #428
fix metric scrape port not updated when inference pool target port updated by @Kuromesi in #417
make ModelName immutable and fix model weight by @hzxuzhonghu in #427
Consistent validation for reference types by @robscott in #430
create pods during integration tests by @Kuromesi in #431
fix typos by @nirrozenbaum in #433
Adding Accepted and ResolvedRefs conditions to InferencePool by @robscott in #446
Add code for Envoy extension that supports body-to-header translation by @rramkumar1 in #355
Add Makefile + cloudbuild configs for body-based routing extension by @rramkumar1 in #442
added cpu based example by @nirrozenbaum in #436
upda...

Contributors

robscott, tchap, and 15 other contributors

06 Feb 17:47

kfswain

v0.1.0

0b6b6eb

v0.1.0 Latest

API version: v1alpha1

We are excited to announce the v0.1.0 release of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs.

Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!

Getting Started

If you'd like to jump right in, head here!

What we support

GIE v0.1.0 was developed on:

vLLM v0.7.1
Envoy Gateway v1.2.1 (or higher)
k8s v1.31

With more model servers and gateway implementations coming soon!

Note: Model servers seeking to support GIE should implement our model server protocol here. Any feedback on the protocol or adoption process is very welcome!

Note: v0.1.0 was necessary to enable Gateways to begin adopting this tooling. Any Gateway implementation that supports ext-proc & the Gateway API will be able to support GIE.

Disclaimers

Not for Production: This release is provided solely for evaluation, testing, and feedback. We advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.

What's Changed

Owners addition by @kfswain in #2
proposed repo structure + copy of initial proposal by @kfswain in #1
Repo structure by @kfswain in #3
Update OWNERS by @smarterclayton in #6
PoC implementation by @kfswain in #4
Fix build for ext-proc example by @terrytangyuan in #7
Simplify POC installation by @liu-cong in #8
docs: poc markdown improvements by @Xunzhuo in #9
fix: inconsistent secret key with deployment by @Xunzhuo in #11
Updating top level README by @kfswain in #13
API Proposal by @kfswain in #5
Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
Envoy update by @kfswain in #18
CRD implementation by @kfswain in #20
Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
Add priority based scheduling by @liu-cong in #25
Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
Add a hermetic e2e test with fake backend pods by @liu-cong in #29
Fix mutierr appending; add a unit test. by @liu-cong in #33
Some minor fixes in Envoy setup by @liu-cong in #35
Update targetModel in request body by @liu-cong in #37
Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
Simulation code for llm inference gateway by @kaushikmitr in #15
Add myself to approvers by @kfswain in #42
Dynamic lora load/unload sidecar by @coolkp in #31
LLMServerPool Implementation by @kfswain in #36
Repo cleanup by @kfswain in #46
Updating API and generating code by @kfswain in #47
Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
llmservice reconciler implementation by @kfswain in #48
Update README.md by @BenTheElder in #52
Fixing hermetic_test, small formatting changes by @kfswain in #53
Add myself to reviewers by @liu-cong in #40
Add dependency updates by @robert-cronin in #57
Bump the kubernetes group with 4 updates by @dependabot in #58
Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
Fixing Groupversion by @kfswain in #63
Integrating LLMService with weight splitting by @kfswain in #64
Fix build and test by @liu-cong in #65
Makefile fixes with generated output by @kfswain in #67
Manifest updates by @kaushikmitr in #81
Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
Proposal update for the API names and latency objective by @ahg-g in #91
Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
switch to using upstream vllm with new metric by @coolkp in #54
Updating cloudbuild to have image name by @kfswain in #106
Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
Bump the kubernetes group with 4 updates by @dependabot in https...

Contributors

robscott, tchap, and 22 other contributors

04 Feb 22:19

danehans

v0.1.0-rc.1

2cb8be2

v0.1.0-rc.1 Pre-release

API version: v1alpha1

We are excited to announce the v0.1.0-rc.1 release candidate of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs. Please note the following:

Not for Production: This release candidate is provided solely for evaluation, testing, and feedback. We strongly advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.

Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!

What's Changed

Owners addition by @kfswain in #2
proposed repo structure + copy of initial proposal by @kfswain in #1
Repo structure by @kfswain in #3
Update OWNERS by @smarterclayton in #6
PoC implementation by @kfswain in #4
Fix build for ext-proc example by @terrytangyuan in #7
Simplify POC installation by @liu-cong in #8
docs: poc markdown improvements by @Xunzhuo in #9
fix: inconsistent secret key with deployment by @Xunzhuo in #11
Updating top level README by @kfswain in #13
API Proposal by @kfswain in #5
Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
Envoy update by @kfswain in #18
CRD implementation by @kfswain in #20
Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
Add priority based scheduling by @liu-cong in #25
Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
Add a hermetic e2e test with fake backend pods by @liu-cong in #29
Fix mutierr appending; add a unit test. by @liu-cong in #33
Some minor fixes in Envoy setup by @liu-cong in #35
Update targetModel in request body by @liu-cong in #37
Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
Simulation code for llm inference gateway by @kaushikmitr in #15
Add myself to approvers by @kfswain in #42
Dynamic lora load/unload sidecar by @coolkp in #31
LLMServerPool Implementation by @kfswain in #36
Repo cleanup by @kfswain in #46
Updating API and generating code by @kfswain in #47
Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
llmservice reconciler implementation by @kfswain in #48
Update README.md by @BenTheElder in #52
Fixing hermetic_test, small formatting changes by @kfswain in #53
Add myself to reviewers by @liu-cong in #40
Add dependency updates by @robert-cronin in #57
Bump the kubernetes group with 4 updates by @dependabot in #58
Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
Fixing Groupversion by @kfswain in #63
Integrating LLMService with weight splitting by @kfswain in #64
Fix build and test by @liu-cong in #65
Makefile fixes with generated output by @kfswain in #67
Manifest updates by @kaushikmitr in #81
Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
Proposal update for the API names and latency objective by @ahg-g in #91
Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
switch to using upstream vllm with new metric by @coolkp in #54
Updating cloudbuild to have image name by @kfswain in #106
Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
Bump the kubernetes group with 4 updates by @dependabot in #101
Bump google.golang.org/protobuf from 1.35.2 to 1.36.0 by @dependabot in #104
Change from SIG Apps to SIG Network by @terrytangyuan in #92
Add response body handler by @liu-cong in #90
API Shift/Refactor by @kfswain in #93
API compliance fix and build fixes by @kfswain in #114
Added a verify rule to Makefile by @ahg-g in #122
update the linter version by @ahg-g in https://github.com/kubernetes-sigs/gatew...

Contributors

robscott, tchap, and 22 other contributors
