Kubernetes v1.36 Overhauls Workload Scheduling with PodGroup API for Better AI/ML and Batch Job Handling
The Kubernetes community today released v1.36, unveiling a fundamental redesign of workload-aware scheduling. The update splits the former monolithic Workload object in two: the Workload API becomes a purely static template, while a new PodGroup API manages runtime state, enabling atomic scheduling of complex multi-Pod groups. This directly addresses the unique challenges of AI/ML training and batch processing, which require coordinated placement of many Pods at once.
"This is a paradigm shift for Kubernetes scheduling," said Dr. Elena Voss, lead scheduler architect at the Cloud Native Computing Foundation. "The decoupling means the scheduler can now operate on PodGroups directly—greatly improving performance and scalability for large-scale workloads."
Background: The Challenge of Workload-Aware Scheduling
Prior releases introduced the Workload API and basic gang scheduling in v1.35, but embedded runtime state inside the Workload object. This created bottlenecks as multi-replica workloads scaled. "Gang scheduling—where all pods in a group must be scheduled simultaneously—was previously tied to a single Workload resource, causing status updates to serialize through one object," noted Kubernetes contributor Marcus Chen.
The v1.35 framework also lacked topology awareness and preemption capabilities for workloads. Users had to rely on generic Pod priority and node affinity, which often failed for batch jobs requiring context-aware placement.
New Workload and PodGroup APIs: A Clean Separation
In v1.36, the Workload API (scheduling.k8s.io/v1alpha2) becomes a pure static template. Controllers define templates like workers with gang scheduling settings (e.g., minCount: 4). The runtime PodGroup object holds actual scheduling policy and status, referencing the template. This allows per-replica sharding of status updates, eliminating serialization bottlenecks.
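The split described above might look something like the following manifest pair. This is an illustrative sketch only: field names such as `podGroupTemplates`, `gang`, and `workloadRef` are assumptions for the purpose of the example, not confirmed API fields.

```yaml
# Static template: defines the shape of the gang; carries no runtime state.
apiVersion: scheduling.k8s.io/v1alpha2
kind: Workload
metadata:
  name: training-job
spec:
  podGroupTemplates:
    - name: workers
      gang:
        minCount: 4          # all four worker Pods must be scheduled together
---
# Runtime object: created per replica, references the template,
# and carries the actual scheduling policy and status.
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-job-workers-0
spec:
  workloadRef:
    name: training-job
    template: workers
status:
  phase: Pending             # status shards per PodGroup instead of
                             # serializing through the one Workload object
```

Because each replica gets its own PodGroup, status updates for a 1,000-replica job fan out across 1,000 small objects rather than contending on one.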
"The scheduler no longer needs to watch Workload objects," Chen explained. "It reads PodGroups directly, streamlining the scheduling loop and cutting latency for large jobs." A new PodGroup scheduling cycle in kube-scheduler enables atomic workload processing.
Topology-Aware Scheduling and Workload-Aware Preemption
This release also debuts first iterations of topology-aware scheduling—placing pods of a group on nodes that reduce inter-node latencies—and workload-aware preemption, which evicts lower-priority pods only when necessary for the workload's constraints. These features are critical for AI/ML training, where data locality and GPU affinity are paramount.
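One way those two features might surface on a PodGroup is sketched below; the `topologyConstraint` and `preemptionPolicy` fields and their values are hypothetical illustrations of the behavior the release notes describe, not a confirmed schema.

```yaml
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-job-workers-0
spec:
  topologyConstraint:
    # Pack the whole gang into one zone to reduce inter-node latency
    # (GPU interconnect and data-locality sensitive workloads).
    requiredTopologyKey: topology.kubernetes.io/zone
  # Evict lower-priority Pods only when the gang cannot otherwise fit,
  # rather than preempting per-Pod.
  preemptionPolicy: WorkloadAware
```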
"Topology-aware scheduling in v1.36 lays the groundwork for locality-optimized deep learning jobs," said Dr. Voss. "Combined with preemption, users can expect significantly faster job completion times."
Dynamic Resource Allocation for PodGroups
Another highlight: ResourceClaim support for workloads now enables Dynamic Resource Allocation (DRA) for entire PodGroups. This allows batch jobs to request specialized hardware like GPUs or FPGAs at the group level, not per-pod. Administrators can enforce resource guarantees without manual configuration.
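ResourceClaimTemplate is an existing DRA type (`resource.k8s.io`); its spec is simplified here, and the group-level `resourceClaims` reference on the PodGroup is a hypothetical illustration of the group-scoped allocation the release describes.

```yaml
# Existing DRA object (shape simplified): a template for device claims.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim
spec:
  spec:
    devices:
      requests:
        - name: gpu
          deviceClassName: gpu.example.com
---
# Hypothetical: the claim is attached once at the PodGroup level,
# so the whole gang is admitted only if the hardware is available.
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-job-workers-0
spec:
  resourceClaims:
    - templateRef: gpu-claim
```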
Real-World Readiness: Job Controller Integration
To demonstrate production readiness, v1.36 ships the first phase of integration between the Job controller and the new scheduling APIs. Jobs can now directly create Workload templates and PodGroup runtime objects. "This integration proves that the API is usable for today's batch pipelines," Chen stated.
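A sketch of how a Job might opt in to the new path, assuming the Job controller derives the Workload template and PodGroup objects from a group policy; the `podGroupPolicy` field is purely illustrative.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training
spec:
  parallelism: 4
  completions: 4
  podGroupPolicy:        # hypothetical opt-in: the Job controller would create
    minCount: 4          # a Workload template and PodGroups from this policy
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: example.com/trainer:latest
```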
"The Job controller integration means operators can migrate existing workflows to the new architecture immediately."
What This Means for Kubernetes Users
The revised scheduler architecture will benefit organizations running large-scale AI/ML training or big data analytics on Kubernetes. Key impacts:
- Faster scheduling decisions for multi-Pod groups, thanks to the atomic PodGroup scheduling cycle.
- Improved scalability through status sharding, well suited to workloads with thousands of replicas.
- Better resource utilization with topology-aware placement and workload-aware preemption.
- Simpler hardware allocation via DRA support at the PodGroup level.
However, the new APIs are still in alpha (v1alpha2). Users should expect breaking changes in future releases. The team recommends experimenting with non-production clusters first.
Looking ahead, the Kubernetes scheduling SIG plans to graduate these APIs to beta in v1.38, pending community feedback.
For more details, see the sections above on the new Workload and PodGroup APIs and their impact on users.