Releases: kubernetes-sigs/kueue

Kueue v0.5.1

28 Nov 20:01
v0.5.1
8b9b1e8

Changes since v0.5.0:

Bug or Regression

  • Fix a bug in the client-go libraries that prevented them from operating on cluster-scoped resources such as ClusterQueue and ResourceFlavor. (#1294, @tenzen-y)
  • Fixed fungibility policy whenCanPreempt: Preempt. The admission should happen in the flavor for which preemptions were issued. (#1332, @alculquicondor)
  • Fix a bug where plain Pods managed by Kueue remained in a terminating condition forever. (#1342, @tenzen-y)
  • Fix fungibility policy Preempt where it was not able to utilize the next flavor if preemption was not possible. (#1366, @alculquicondor, @KunWuLuan)
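
The two Preempt fixes above can be read as a simple flavor-selection rule. The following is a minimal sketch of that rule, not Kueue's scheduler code: the Flavor type, its fits and can_preempt fields, and pick_flavor are hypothetical names introduced for illustration, and real admission also has to account for borrowing and cohorts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flavor:
    # Hypothetical model of one flavor's state for a single workload.
    name: str
    fits: bool         # enough free quota to admit without preemption
    can_preempt: bool  # preempting other workloads would free enough quota


def pick_flavor(flavors: List[Flavor]) -> Optional[Tuple[str, str]]:
    """Return (flavor name, action) assuming whenCanPreempt: Preempt."""
    for flavor in flavors:
        if flavor.fits:
            return flavor.name, "admit"
        if flavor.can_preempt:
            # Stay on this flavor: admission should later happen in the
            # flavor for which preemptions were issued (#1332).
            return flavor.name, "preempt"
        # Preemption is not possible here, so try the next flavor (#1366).
    return None
```

For example, pick_flavor([Flavor("spot", False, False), Flavor("on-demand", True, False)]) skips the first flavor, where neither admission nor preemption is possible, and returns ("on-demand", "admit").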

Kueue v0.5.0

25 Oct 21:39
739ebb1

Changes since v0.4.0:

Highlights

  • AdmissionChecks: a mechanism for internal or external components to influence whether a Workload can be admitted.
  • Integration with cluster-autoscaler's ProvisioningRequest via AdmissionChecks.
  • Information about pending workloads in a ClusterQueue status.
  • Metrics for resource usage of ClusterQueues and LocalQueues.
  • Policy to control whether to preempt or borrow before trying the next flavors.
  • Partial admission graduated to Beta.
  • Workload priority, independent from Pod priority.
  • New integrations:
    • All Kubeflow training APIs
    • Single plain Pods

Changes by Kind

Feature

  • A mechanism for AdmissionChecks to provide labels, annotations, tolerations and node selectors to the pod templates when starting a job (#1180, @mimowo)
  • A reference standalone controller that can be used to support plain Pods using taints and tolerations, which can be used in Kubernetes versions that don't support scheduling gates. (#1111, @nstogner)
  • Add Active condition to AdmissionChecks (#1193, @trasc)
  • Add optional cluster queue resource quota and usage metrics. (#982, @trasc)
  • Add support for AdmissionChecks, a mechanism for internal or external components to influence whether a Workload can be admitted. (#1045, @trasc)
  • Add support for single plain Pods. (#1072, @achernevskii)
  • Add support for workload Priority (#1081, @Gekko0114)
  • Add tolerations to ResourceFlavor. Kueue injects these tolerations to the jobs that are assigned to the flavor when admitted. (#1248, @trasc)
  • Added pprof endpoints for profiling (#978, @stuton)
  • Allow the admission of multiple workloads within one scheduling cycle while borrowing. (#1039, @trasc)
  • An option to synchronize batch/job.completions with parallelism in case of partial admission (#971, @trasc)
  • Expose cluster queue information about pending workloads (#1069, @stuton)
  • Expose probe configurations to helm chart (#986, @yyzxw)
  • Graduate Partial admission to Beta. (#1221, @trasc)
  • Integrate with Cluster Autoscaler's ProvisioningRequest via two stage admission (#1154, @trasc)
  • Manage cluster queue active state based on admission checks life cycle. (#1079, @trasc)
  • Metrics for usage and reservations in ClusterQueues and LocalQueues. (#1206, @trasc)
  • Options to allow workloads to borrow quota or preempt other workloads before trying the next flavor in the list (#849, @KunWuLuan)
  • Support kubeflow.org/mxjob (#1183, @tenzen-y)
  • Support kubeflow.org/paddlejob (#1142, @tenzen-y)
  • Support kubeflow.org/pytorchjob (#995, @tenzen-y)
  • Support kubeflow.org/tfjob (#1068, @tenzen-y)
  • Support kubeflow.org/xgboostjob (#1114, @tenzen-y)
  • Workload objects have the label kueue.x-k8s.io/job-uid where the value matches the uid of the parent job, whether that's a Job, MPIJob, RayJob, JobSet (#1032, @achernevskii)

Bug or Regression

  • Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (#1197, @alculquicondor)
  • Ensure the ClusterQueue status is updated as the number of pending workloads changes. (#1135, @mimowo)
  • Fix resuming of a RayJob after preemption. (#1156, @kerthcet)
  • Fixed missing create verb for webhook (#1035, @stuton)
  • Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (#1023, @alculquicondor)
  • Helm: Enable the JobSet integration by default (#1184, @tenzen-y)
  • Improve job controller to be resilient to API failures during preemption (#1005, @alculquicondor)
  • Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (#1024, @alculquicondor)
  • Terminate Kueue when there is an internal failure during setup, so that it can be retried. (#1077, @alculquicondor)

Other (Cleanup or Flake)

Kueue v0.4.2

11 Oct 20:01
417b060

Changes since v0.4.1:

Bug or Regression

  • Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (#1197, @alculquicondor)
  • Fix resuming of a RayJob after preemption. (#1190, @kerthcet)

Kueue v0.4.1

15 Aug 13:40
328bb66

Bug or Regression

  • Fixed missing create verb for webhook (#1053, @stuton)
  • Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (#1029, @alculquicondor)
  • Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (#1030, @alculquicondor)

Kueue v0.4.0

07 Jul 14:41
5cc79d1

Changes since v0.3.0:

API Change

Feature

  • Add client-go libraries. (#789, @tenzen-y)
  • Add support for Kuberay's RayJobs. (#667, @trasc)
  • Add support for dynamic reclaim in the JobSet integration. (#901, @trasc)
  • Add support for partial workload admission (#771, @trasc)
  • Add support for dynamic resource reclaim. (#756, @trasc)
  • Allow the scheduler to admit more jobs when the head job has not reached the PodReady=true status. (#708, @KunWuLuan)
  • Allow specifying the manager pod and container security context instead of hardcoded values (#878, @bh-tt)
  • Introduce feature gates for alpha/experimental features in the Kueue project. (#788, @kerthcet)
  • Ignore an integration if its CRD isn't installed; otherwise, all integrations are enabled by default (#883, @stuton)
  • Integrate JobSet into kueue (#762, @mcariatm)

Bug or Regression

  • Add permission to update frameworkjob status. (#797, @tenzen-y)
  • Fix a bug where update events for ClusterQueues were created endlessly. (#907, @tenzen-y)
  • Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (#835, @tenzen-y)
  • Fix panic in cluster queue if resources and coveredResources do not have the same length. (#787, @kannon92)
  • Fix: Enforce borrowed=0 if ClusterQueue doesn't belong to a cohort. (#759, @tenzen-y)
  • Fix: Potential over-admission within cohort when borrowing. (#805, @trasc)
  • Fixed preemption to prefer preempting workloads that were more recently admitted. (#843, @stuton)
  • Fixed a bug where the suspend=true added to the job/mpijob by the default webhook did not take effect. (#758, @fjding)

Other (Cleanup or Flake)

  • Add validation for child jobs without ownerReference. (#865, @tenzen-y)

Kueue v0.3.2

13 Jun 14:51
ff63c63

Changes since v0.3.1:

Bug or Regression

  • Add permission to update frameworkjob status. (#798, @tenzen-y)
  • Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (#839, @tenzen-y)
  • Fix panic in cluster queue if resources and coveredResources do not have the same length. (#799, @kannon92)
  • Fix: Potential over-admission within cohort when borrowing. (#822, @trasc)
  • Fixed preemption to prefer preempting workloads that were more recently admitted. (#845, @stuton)

Kueue v0.3.1

16 May 18:55
50f628a

Changes since v0.3.0:

Bug fixes

  • Fix a bug where the validation webhook didn't validate the queue name set as a label when creating an MPIJob. #711
  • Fix a bug where the queue name in workloads was updated to an empty value when using framework jobs that use batch/job internally, such as MPIJob. #713
  • Fix a bug in which borrowed values are set to a non-zero value even though the ClusterQueue doesn't belong to a cohort. #761
  • Fixed adding suspend=true to job/mpijob by the default webhook. #765

Kueue v0.3.0

06 Apr 21:07
0e5db01

Changes since v0.2.1:

Features

  • Support for kubeflow's MPIJob (v2beta1)
  • Upgrade the config.kueue.x-k8s.io API version from v1alpha1 to v1beta1. v1alpha1 is no longer supported.
    v1beta1 includes the following changes:
    • Add namespace to propagate the namespace where kueue is deployed to the webhook certificate.
    • Add internalCertManagement with fields enable, webhookServiceName and webhookSecretName.
    • Remove enableInternalCertManagement. Use internalCertManagement.enable instead.
  • Upgrade the kueue.x-k8s.io API version from v1alpha2 to v1beta1.
    v1alpha2 is no longer supported.
    v1beta1 includes the following changes:
    • ClusterQueue:
      • Immutability of spec.queueingStrategy.
      • Refactor quota.min and quota.max into nominalQuota and borrowingLimit.
      • Swap hierarchy between resources and flavors.
      • Group flavors and resources into spec.resourceGroups to make
        co-dependent resources explicit.
      • Move admission from spec to status.
      • Add conditions field to status.
    • LocalQueue:
      • Add admitted field in status.
      • Add conditions field to status.
    • Workload:
      • Add metadata to podSet templates.
      • Move admission into status.
    • ResourceFlavor:
      • Introduce spec to hold all fields.
      • Rename labels to nodeLabels.
      • Rename taints to nodeTaints.
  • Reduce API calls by setting .status.admission and updating the Admitted condition in the same API call.
  • Obtain queue names from label kueue.x-k8s.io/queue-name. The annotation with
    the same name is still supported, but it's now deprecated.
  • Multiplatform support for linux/amd64 and linux/arm64.
  • Validating webhook for batch/v1.Job validates kueue-specific labels and
    annotations.
  • Sequential admission of jobs https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/
  • Preemption within ClusterQueue and cohort https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption
  • Support for LimitRanges when calculating jobs usage.
  • Library for integrating job-like CRDs (controller and webhooks) https://sigs.k8s.io/kueue/pkg/controller/jobframework
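
The move from the queue-name annotation to the label of the same name can be illustrated with a small lookup. This is a minimal sketch and not Kueue's implementation: the queue_name helper is hypothetical, and the precedence shown (label first, then the deprecated annotation) is an assumption based on the deprecation note above.

```python
from typing import Mapping, Optional

# Key taken from the release note; used for both the label and the annotation.
QUEUE_NAME_KEY = "kueue.x-k8s.io/queue-name"


def queue_name(metadata: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return the queue name from object metadata, or None if unset."""
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}
    # Assumed precedence: prefer the label, fall back to the deprecated annotation.
    return labels.get(QUEUE_NAME_KEY) or annotations.get(QUEUE_NAME_KEY)
```

With this sketch, a job that carries only the annotation still resolves to its queue, while a job that carries both resolves to the label's value.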

Production Readiness

Bug fixes

  • Fix job controller ClusterRole for clusters that enable OwnerReferencesPermissionEnforcement admission control validation #392
  • Fix race condition when admission attempt and requeuing happen at the same time #427
  • Atomically release quota and requeue previously inadmissible workloads #512
  • Fix support for leader election #580
  • Fix support for RuntimeClass when calculating jobs usage #565

Acknowledgments

Thanks to our contributors in this release, in no particular order:
@tenzen-y @mcariatm @moficodes @mwielgus @trasc @mimowo @alculquicondor @fjding @kerthcet @ArangoGutierrez @Fish-pro @rbarberop @cortespao @rptaylor @kannon92 @noryev @oginskis @charlieyu1996 @kincl @ahg-g

Kueue v0.2.1

25 Aug 23:43

Changes since v0.1.0:

Features

  • Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
    v1alpha2 includes the following changes:
    • Rename Queue to LocalQueue.
    • Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
  • Add webhooks to validate and to add defaults to all kueue APIs.
  • Add internal cert manager to serve webhooks with TLS.
  • Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being
    deleted prematurely.
  • Support codependent resources
    by assigning the same flavor to codependent resources in a pod set.
  • Support pod overhead
    in Workload pod sets.
  • Set requests to limits if requests are not set in a Workload pod set,
    matching internal defaulting for k8s Pods.
  • Add prometheus metrics to monitor health of
    the system and the status of ClusterQueues.
  • Use Server Side Apply for Workload admission to reduce API conflicts.
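
The requests-from-limits defaulting mentioned above can be sketched as follows. This is an illustration under stated assumptions, not Kueue's code: default_requests is a hypothetical helper, and resource quantities are kept as plain strings rather than parsed Kubernetes quantities.

```python
from typing import Dict


def default_requests(resources: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return requests, using the limit for any resource without a request."""
    requests = dict(resources.get("requests") or {})
    for name, quantity in (resources.get("limits") or {}).items():
        # An explicitly set request is kept; only missing requests are filled in.
        requests.setdefault(name, quantity)
    return requests
```

For a container with a cpu request of "1" and limits of cpu "2" and memory "1Gi", the sketch keeps the cpu request and sets the memory request to "1Gi".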

Bug fixes

  • Fix bug that caused Workloads that don't match the ClusterQueue's
    namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
  • Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
  • Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be
    retried after a transient error.
  • Fix requeuing of an out-of-date workload after a failed admission attempt.
  • Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads
    were not removed from the ClusterQueue when removing the corresponding Queue.

Thanks to all our contributors!

In no particular order: @ahg-g @alculquicondor @ArangoGutierrez @cmssczy @denkensk @kerthcet @knight42 @cortespao @shuheiktgw @thisisprasad

Full Changelog: v0.1.0...v0.2.1

Kueue v0.2.0

25 Aug 22:49
Pre-release

Do not use. The published container image doesn't match the release.