A Practical Guide to Modifying Pod Resources in Suspended Kubernetes Jobs (Beta)

Introduction

Batch processing and machine learning workloads often need their resource requests adjusted to match what the cluster can actually provide. Kubernetes v1.36 promotes a much-anticipated feature to beta: the ability to modify container resource requests and limits in the pod template of a suspended Job. This capability, first introduced as alpha in v1.35, lets queue controllers and cluster administrators adjust CPU, memory, GPU, and other extended resource specifications on a Job while it is suspended, before it starts or resumes running. In this guide, we'll walk through how to use this feature to make your batch operations more resilient and efficient.

What You Need

- A Kubernetes cluster running v1.36 or later, where this feature is in beta (if the patch in Step 3 is rejected, check that the relevant feature gate is enabled on your cluster)
- kubectl configured to talk to that cluster
- Permission to create and patch Jobs in your target namespace

Step-by-Step Guide

Step 1: Create a Suspended Job with Initial Resource Requirements

Start by defining a Job that is paused from the beginning. Set spec.suspend: true to keep it from spawning pods until you decide the right resource configuration. Here's an example YAML for a machine learning training job that initially requests 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Save the file as job-suspended.yaml and create it with kubectl apply -f job-suspended.yaml. The Job is now registered but no pods are running.
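
Before moving on, it's worth confirming that the Job really was created in the suspended state and has spawned no pods. These are plain kubectl queries against the Job from Step 1; the first should print "true" and the second should return no resources:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.suspend}{"\n"}'
kubectl get pods -l job-name=training-job-example-abcd123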

Step 2: Assess Cluster Capacity and Decide on Resource Changes

Once the Job is suspended, you or your queue controller can evaluate the current cluster state. For instance, if only 2 GPUs are available instead of the requested 4, you can adjust the resource specifications without losing the Job's metadata, status, or history. This is a major improvement over the previous immutable behavior, which would have required deleting and recreating the Job.
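
How you measure spare capacity depends on your cluster, but as one example, you can list each node's allocatable count for the placeholder extended resource used above (the \. escapes the dots in the resource name for kubectl's JSONPath parser):

kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.example-hardware-vendor\.com/gpu'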

Step 3: Modify Pod Resource Requests and Limits in the Suspended Job

To update the resource values, use kubectl patch or edit the Job resource directly. For example, to reduce GPU count from 4 to 2 (and adjust CPU and memory accordingly), run:

kubectl patch job training-job-example-abcd123 --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"}
]'

Note: The ~1 sequence in the JSON Pointer paths escapes the forward slash in the extended resource name, as required by RFC 6901. Alternatively, you can run kubectl edit job training-job-example-abcd123 and modify the YAML directly.
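
If the JSON Pointer escaping feels awkward, a strategic merge patch expresses the same change as plain YAML. Containers are merged by name, so listing the trainer container with only its resources block updates just those fields:

kubectl patch job training-job-example-abcd123 --type=strategic -p '
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
'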

Step 4: Verify the Resource Modifications

Check that the changes were applied correctly by describing the Job:

kubectl describe job training-job-example-abcd123

You should see the updated resource values under the Pod Template section. The Job remains suspended at this point, so no pods have been created yet.
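
For a check that's easier to script than reading describe output, you can also pull the resources block straight out of the pod template:

kubectl get job training-job-example-abcd123 \
  -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}'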

Step 5: Resume the Job to Launch Pods with the Adjusted Resources

Once the resource fields match the current cluster capacity, set spec.suspend to false to start the Job:

kubectl patch job training-job-example-abcd123 --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'

Now the Job will create its pods using the updated resource requests and limits. You can monitor pod creation with kubectl get pods -l job-name=training-job-example-abcd123.
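
If a script needs to block until the run finishes, kubectl wait can watch for the Job's Complete condition (pick a timeout that suits your workload):

kubectl wait --for=condition=complete job/training-job-example-abcd123 --timeout=2h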

Step 6 (Optional): Automate with a Queue Controller

For larger deployments, consider using a queue controller like Kueue to automatically adjust resources based on cluster state. Such controllers can integrate with the Kubernetes API to modify suspended Jobs before resuming them. The architecture remains the same: suspend the Job, modify resources, then resume. The controller handles the decision logic and API calls.
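
To make that architecture concrete, here is a minimal shell sketch of the suspend, modify, resume loop. The hard-coded GPU count of 2 stands in for whatever your capacity check decides; a real controller such as Kueue makes these same API calls programmatically:

#!/usr/bin/env bash
# Minimal sketch of the suspend -> modify -> resume loop.
JOB=training-job-example-abcd123

# 1. Make sure the Job is suspended before touching its resources.
kubectl patch job "$JOB" --type='json' \
  -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'

# 2. Fit the pod template to current capacity (here: 2 GPUs; CPU and
#    memory would be adjusted the same way).
kubectl patch job "$JOB" --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"}
]'

# 3. Resume; pods created from here on use the updated resources.
kubectl patch job "$JOB" --type='json' \
  -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'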

Tips for Using Mutable Pod Resources in Suspended Jobs

- Make all resource edits while spec.suspend is true; the pod template's resources are mutable only in the suspended state.
- When you scale requests, scale limits to match if you want to keep the pods in the Guaranteed QoS class, as the example above does.
- Suspending a Job that is already running terminates its active pods; pods created after you resume will use the updated template.

By following these steps, you can make your Kubernetes batch processing more adaptive to fluctuating cluster resources. Whether you're managing a small cluster manually or relying on automated queue controllers, this beta feature saves time and reduces operational complexity compared with deleting and recreating Jobs.
