A Practical Guide to Modifying Pod Resources in Suspended Kubernetes Jobs (Beta)
Introduction
Batch processing and machine learning workloads often require dynamic resource adjustments based on cluster availability. Kubernetes v1.36 brings a much-anticipated beta feature: the ability to modify container resource requests and limits in the pod template of a suspended Job. This capability, first introduced as alpha in v1.35, gives queue controllers and cluster administrators the flexibility to adjust CPU, memory, GPU, and extended resource specifications on a Job while it is suspended—before it starts or resumes running. In this guide, we'll walk you through how to leverage this feature to make your batch operations more resilient and efficient.
What You Need
- A Kubernetes cluster running version v1.36 or later (beta) with the MutableJobPodTemplateResources feature gate enabled (it is enabled by default in v1.36); a quick check follows this list.
- The kubectl command-line tool, configured to communicate with your cluster.
- A basic understanding of Kubernetes Jobs and resource requests/limits.
- Optional: A queue controller (like Kueue) that can manage Job resource adjustments automatically.
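Before proceeding, you can confirm the server version and, if your RBAC settings allow reading the API server's /metrics endpoint, the feature gate state via the kubernetes_feature_enabled metric. A minimal check (the grep pattern assumes the gate name above):

# Confirm the control plane is v1.36 or later
kubectl version

# Check the feature gate via the API server metrics endpoint
# (requires permission to read the non-resource URL /metrics)
kubectl get --raw /metrics | grep MutableJobPodTemplateResources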
Step-by-Step Guide
Step 1: Create a Suspended Job with Initial Resource Requirements
Start by defining a Job that is paused from the beginning. Set spec.suspend: true to keep it from spawning pods until you decide the right resource configuration. Here's an example YAML for a machine learning training job that initially requests 4 GPUs:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never
Save the file as job-suspended.yaml and create it with kubectl apply -f job-suspended.yaml. The Job is now registered but no pods are running.
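Before moving on, you can confirm the Job really was admitted in a suspended state:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.suspend}'
# Expected output: true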
Step 2: Assess Cluster Capacity and Decide on Resource Changes
Once the Job is suspended, you or your queue controller can evaluate the current cluster state. For instance, if only 2 GPUs are available instead of the requested 4, you can adjust the resource specifications without losing the Job's metadata, status, or history. This is a major improvement over the previous immutable behavior, which would have required deleting and recreating the Job.
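How you measure free capacity depends on your monitoring setup, but as a rough manual check you can list each node's allocatable count for the extended resource. A sketch using the placeholder vendor resource name from the example (note the escaped dots in the column expression):

kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.example-hardware-vendor\.com/gpu'

Keep in mind that allocatable reflects what the node advertises, not what is currently free; subtract the requests of already-running pods (for example via kubectl describe node) to estimate actual availability.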
Step 3: Modify Pod Resource Requests and Limits in the Suspended Job
To update the resource values, use kubectl patch or edit the Job resource directly. For example, to reduce GPU count from 4 to 2 (and adjust CPU and memory accordingly), run:
kubectl patch job training-job-example-abcd123 --type='json' -p='[
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "4"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "16Gi"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16Gi"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"}
]'
Note: The ~1 sequence in the JSON patch paths is the JSON Pointer escape (RFC 6901) for the forward slash in the extended resource name. Alternatively, you can run kubectl edit job training-job-example-abcd123 and modify the YAML directly.
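If the JSON Pointer escaping feels error-prone, a strategic merge patch expresses the same change as plain YAML; entries under containers are merged by name, so only the fields you specify are replaced. An equivalent sketch:

kubectl patch job training-job-example-abcd123 --type=strategic --patch '
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
'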
Step 4: Verify the Resource Modifications
Check that the changes were applied correctly by describing the Job:
kubectl describe job training-job-example-abcd123
You should see the updated resource values under the Pod Template section. The Job remains suspended at this point, so no pods have been created yet.
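For a machine-readable check, you can also pull the container's resources block directly:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.template.spec.containers[0].resources}'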
Step 5: Resume the Job to Launch Pods with the Adjusted Resources
Once the resource fields match the current cluster capacity, set spec.suspend to false to start the Job:
kubectl patch job training-job-example-abcd123 --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'
Now the Job will create its pods using the updated resource requests and limits. You can monitor pod creation with kubectl get pods -l job-name=training-job-example-abcd123.
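In scripts, you can block until the Job finishes rather than polling; kubectl wait supports Job conditions:

# Wait (up to two hours here) for the Job to report completion
kubectl wait --for=condition=complete job/training-job-example-abcd123 --timeout=2h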
Step 6 (Optional): Automate with a Queue Controller
For larger deployments, consider using a queue controller like Kueue to automatically adjust resources based on cluster state. Such controllers can integrate with the Kubernetes API to modify suspended Jobs before resuming them. The architecture remains the same: suspend the Job, modify resources, then resume. The controller handles the decision logic and API calls.
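As an illustration of the control flow only (a real controller such as Kueue does this with informers and proper capacity accounting), here is a rough shell sketch of the suspend-modify-resume loop. The resource name is the placeholder from earlier, and the capacity check naively sums allocatable rather than free GPUs:

#!/usr/bin/env bash
set -euo pipefail

JOB=training-job-example-abcd123

# Naive capacity estimate: total allocatable GPUs across all nodes.
# A real controller would subtract what running pods already request.
AVAILABLE=$(kubectl get nodes -o jsonpath='{range .items[*]}{.status.allocatable.example-hardware-vendor\.com/gpu}{"\n"}{end}' \
  | awk '{s+=$1} END {print s+0}')

REQUESTED=$(kubectl get job "$JOB" \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests.example-hardware-vendor\.com/gpu}')

if [ "$AVAILABLE" -lt "$REQUESTED" ]; then
  # Shrink the GPU request/limit to fit while the Job is still suspended
  kubectl patch job "$JOB" --type=json -p "[
    {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu\", \"value\": \"$AVAILABLE\"},
    {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu\", \"value\": \"$AVAILABLE\"}
  ]"
fi

# Resume the Job so pods are created with the (possibly adjusted) resources
kubectl patch job "$JOB" --type=json -p '[{"op": "replace", "path": "/spec/suspend", "value": false}]'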
Tips for Using Mutable Pod Resources in Suspended Jobs
- Immutable fields remain: Only resource requests and limits in the pod template can be changed while the Job is suspended. Other fields (e.g., container image, command) are still immutable after creation.
- Resource changes only take effect before pods are created: Once the Job is resumed and pods start, further modifications to the pod template will not affect already-running pods. If you need to change resources mid-execution, consider using Vertical Pod Autoscaler (VPA) in update mode.
- Extended resources work too: You can modify any resource type, including custom extended resources (like GPUs) that your cluster advertises, as long as they are properly registered.
- CronJob integration: For periodic workloads managed by a CronJob, each Job instance can be individually adjusted before it runs, letting you scale down resource usage during heavy cluster load instead of failing the run (a minimal manifest follows this list).
- Version compatibility: This feature requires Kubernetes v1.36 or later (beta). On v1.35 the feature is alpha, so you must enable the MutableJobPodTemplateResources feature gate explicitly. Check your cluster version with kubectl version.
- Monitor job history: Because you no longer have to delete and recreate Jobs, you preserve each Job's history and status, which is useful for audit trails and debugging.
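For the CronJob pattern mentioned above, the key is to have every spawned Job start suspended so a controller can size it before it runs. A minimal sketch, with a hypothetical name and schedule:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-training   # hypothetical name
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      suspend: true        # each spawned Job waits to be sized and resumed
      template:
        spec:
          containers:
          - name: trainer
            image: example-registry.example.com/training:2026-04-23T150405.678
            resources:
              requests:
                example-hardware-vendor.com/gpu: "4"
          restartPolicy: Never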
By following these steps, you can make your Kubernetes batch processing more adaptive to fluctuating cluster resources. Whether you're managing a small cluster manually or leveraging automated queue controllers, this beta feature saves time and reduces operational complexity.