Alert Runbooks

KubeDeploymentRolloutStuck

Description

This alert fires when a Kubernetes Deployment rollout is not making progress and remains stuck in an updating state for longer than expected.
It indicates that the Deployment controller is unable to successfully replace old replicas with new ones, potentially leaving the application in a partially updated or degraded state.


Possible causes


Severity estimation

Medium to High severity, depending on workload impact.

Severity increases with:


Troubleshooting steps

  1. Check Deployment status

    • Command / Action:
      • Inspect rollout progress
      • kubectl get deployment <deployment-name> -n <namespace>

    • Expected result:
      • UP-TO-DATE, READY, and AVAILABLE values converge to the desired replica count
    • Additional info:
      • Lack of progress indicates a stalled rollout
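
The replica comparison in this step can also be scripted. The following is a minimal sketch, not part of the original runbook: `replicas_converged` and `deployment_counts` are hypothetical helper names, and the sketch assumes the standard Deployment fields `spec.replicas`, `status.updatedReplicas`, `status.readyReplicas`, and `status.availableReplicas`.

```shell
# Hypothetical helper: succeeds only when the updated, ready, and available
# replica counts all equal the desired count. Missing arguments count as 0.
# Usage: replicas_converged <desired> <updated> <ready> <available>
replicas_converged() {
  desired=${1:-0}; updated=${2:-0}; ready=${3:-0}; available=${4:-0}
  [ "$updated" -eq "$desired" ] &&
    [ "$ready" -eq "$desired" ] &&
    [ "$available" -eq "$desired" ]
}

# Hypothetical helper: prints "<desired> <updated> <ready> <available>" for
# one Deployment. A field that is absent from the status prints as empty.
# Usage: deployment_counts <deployment-name> <namespace>
deployment_counts() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.replicas} {.status.updatedReplicas} {.status.readyReplicas} {.status.availableReplicas}'
}

# Example (requires cluster access):
#   set -- $(deployment_counts <deployment-name> <namespace>)
#   replicas_converged "$@" && echo "converged" || echo "still rolling out"
```

Running the example a few times, some minutes apart, shows whether the counts are moving at all or have stopped changing.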

  2. Check rollout status

    • Command / Action:
      • Inspect rollout state
      • kubectl rollout status deployment <deployment-name> -n <namespace>

    • Expected result:
      • Rollout completes successfully
    • Additional info:
      • A hanging status confirms the rollout is stuck
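
Because `kubectl rollout status` can block for a long time on a stuck rollout, it may help to bound the wait. A minimal sketch, assuming the `--timeout` flag of `kubectl rollout status`; the wrapper name `rollout_check` and the 60s default are illustrative choices, not part of the original runbook.

```shell
# Hypothetical wrapper: waits up to <timeout> for the rollout to finish and
# reports the outcome instead of blocking indefinitely.
# Usage: rollout_check <deployment-name> <namespace> [timeout]
rollout_check() {
  if kubectl rollout status deployment "$1" -n "$2" --timeout="${3:-60s}"; then
    echo "rollout complete"
  else
    # Non-zero exit: the wait timed out or kubectl reported a failure.
    echo "rollout did not complete (timed out or failed)"
    return 1
  fi
}

# Example (requires cluster access):
#   rollout_check <deployment-name> <namespace> 2m
```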

  3. Describe the Deployment

    • Command / Action:
      • Review events and conditions
      • kubectl describe deployment <deployment-name> -n <namespace>

    • Expected result:
      • Events show ReplicaSet scaling and pod creation
    • Additional info:
      • Look for scheduling, image, or probe failures
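
The condition shown by `kubectl describe` can also be read directly. The sketch below extracts the reason of the Deployment's `Progressing` condition with a JSONPath filter and maps two well-known reasons to a short message. The function names `progressing_reason` and `explain_progress` are hypothetical, and the message texts are illustrative.

```shell
# Hypothetical helper: prints the reason of the Progressing condition,
# for example ProgressDeadlineExceeded or NewReplicaSetAvailable.
# Usage: progressing_reason <deployment-name> <namespace>
progressing_reason() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
}

# Hypothetical helper: turns a condition reason into a short message.
# Usage: explain_progress <reason>
explain_progress() {
  case "${1:-}" in
    ProgressDeadlineExceeded) echo "stuck: progress deadline exceeded" ;;
    NewReplicaSetAvailable)   echo "complete: new ReplicaSet is available" ;;
    *)                        echo "in progress or unknown: ${1:-none}" ;;
  esac
}

# Example (requires cluster access):
#   explain_progress "$(progressing_reason <deployment-name> <namespace>)"
```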

  4. Inspect ReplicaSets

    • Command / Action:
      • List ReplicaSets and their replica counts
      • kubectl get rs -n <namespace>

    • Expected result:
      • New ReplicaSet scales up while old ones scale down
    • Additional info:
      • New ReplicaSet stuck at 0 ready replicas indicates an issue
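
In a busy namespace the ReplicaSet list can be filtered down to the ones that want replicas but do not have them ready. A minimal sketch, assuming the default `kubectl get rs` column order (NAME, DESIRED, CURRENT, READY, AGE); the function name `unready_replicasets` is hypothetical.

```shell
# Hypothetical filter: reads `kubectl get rs` output on stdin and prints
# every ReplicaSet whose READY count is below a non-zero DESIRED count.
unready_replicasets() {
  awk 'NR > 1 && $2 + 0 > 0 && $4 + 0 < $2 + 0 {
    print $1 ": " $4 "/" $2 " ready"
  }'
}

# Example (requires cluster access):
#   kubectl get rs -n <namespace> | unready_replicasets
```

Old ReplicaSets that have already been scaled to 0 are skipped, so what remains is usually the new ReplicaSet that the rollout is waiting on.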

  5. Inspect Pods

    • Command / Action:
      • List pods and check their states
      • kubectl get pods -n <namespace>

    • Expected result:
      • Pods are Running and Ready
    • Additional info:
      • Investigate Pending, CrashLoopBackOff, or ImagePullBackOff
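
The pod list can be narrowed to the pods that need attention. A minimal sketch, assuming the default `kubectl get pods` column order (NAME, READY, STATUS, ...); the function name `problem_pods` is hypothetical.

```shell
# Hypothetical filter: reads `kubectl get pods` output on stdin and prints
# pods that are not Running, or are Running but not fully Ready.
# Pods in Completed state are ignored.
problem_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY column looks like "1/2"
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Example (requires cluster access):
#   kubectl get pods -n <namespace> | problem_pods
```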

  6. Check pod logs

    • Command / Action:
      • Review logs for failing pods
      • kubectl logs <pod-name> -n <namespace>

    • Expected result:
      • Application starts without repeated errors
    • Additional info:
      • Use --previous for restarted containers
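
For a crash-looping pod, the logs of the previous container instance usually contain the actual failure. The sketch below tries `--previous` first and falls back to the current container when no previous instance exists. The function name `pod_logs` and the 50-line tail are illustrative choices.

```shell
# Hypothetical helper: prints the last 50 log lines of the previous
# container instance, or of the current one if there is no previous instance.
# Usage: pod_logs <pod-name> <namespace>
pod_logs() {
  kubectl logs "$1" -n "$2" --previous --tail=50 2>/dev/null ||
    kubectl logs "$1" -n "$2" --tail=50
}

# Example (requires cluster access):
#   pod_logs <pod-name> <namespace>
```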

  7. Verify update strategy and PDBs

    • Command / Action:
      • Review update strategy and PodDisruptionBudgets
      • kubectl get deployment <deployment-name> -n <namespace> -o yaml

      • kubectl get pdb -n <namespace>

    • Expected result:
      • Update strategy allows progress and PDBs are not blocking
    • Additional info:
      • Overly strict settings can block rollouts
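
Instead of reading the full YAML, the relevant fields can be pulled out with JSONPath. A minimal sketch, assuming the standard fields `spec.strategy`, `spec.progressDeadlineSeconds`, and the PodDisruptionBudget status field `disruptionsAllowed`; the function names `rollout_strategy` and `zero_disruption_pdbs` are hypothetical.

```shell
# Hypothetical helper: prints the strategy type, rolling-update limits, and
# progress deadline of one Deployment on a single line.
# Usage: rollout_strategy <deployment-name> <namespace>
rollout_strategy() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.strategy.type} maxSurge={.spec.strategy.rollingUpdate.maxSurge} maxUnavailable={.spec.strategy.rollingUpdate.maxUnavailable} progressDeadlineSeconds={.spec.progressDeadlineSeconds}'
}

# Hypothetical filter: reads "<pdb-name> <disruptions-allowed>" lines on
# stdin and prints the PDBs that currently allow zero disruptions.
zero_disruption_pdbs() {
  awk 'NF >= 2 && $2 + 0 == 0 { print $1 }'
}

# Example (requires cluster access):
#   rollout_strategy <deployment-name> <namespace>
#   kubectl get pdb -n <namespace> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' |
#     zero_disruption_pdbs
```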

  8. Roll back or fix and redeploy

    • Command / Action:
      • Roll back to last stable version if needed
      • kubectl rollout undo deployment <deployment-name> -n <namespace>

    • Expected result:
      • Deployment stabilizes and rollout completes
    • Additional info:
      • Always identify root cause before retrying rollout
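
A rollback can be made more deliberate by listing the recorded revisions first and then waiting for the result. A minimal sketch, assuming `kubectl rollout history` and the `--to-revision` flag of `kubectl rollout undo`; the function name `rollback_to` and the 120s wait are illustrative choices.

```shell
# Hypothetical helper: shows the revision history, rolls back (optionally to
# a specific revision), then waits for the rollback to finish.
# Usage: rollback_to <deployment-name> <namespace> [revision]
rollback_to() {
  # Show recorded revisions first so the chosen target can be sanity-checked.
  kubectl rollout history deployment "$1" -n "$2" || return 1
  if [ -n "${3:-}" ]; then
    kubectl rollout undo deployment "$1" -n "$2" --to-revision="$3" || return 1
  else
    # Without a revision, kubectl rolls back to the previous one.
    kubectl rollout undo deployment "$1" -n "$2" || return 1
  fi
  # Bounded wait for the rolled-back revision to become available.
  kubectl rollout status deployment "$1" -n "$2" --timeout=120s
}

# Example (requires cluster access):
#   rollback_to <deployment-name> <namespace>
```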

Additional resources