KubeDeploymentRolloutStuck

Description

This alert fires when a Kubernetes Deployment rollout stops making progress. This is typically detected through the Deployment's Progressing condition becoming False once the rollout exceeds spec.progressDeadlineSeconds (600 seconds by default).
It indicates that the Deployment controller cannot replace old replicas with new ones, which can leave the application partially updated or degraded.

Possible causes:

Pods failing to start or repeatedly crashing (CrashLoopBackOff)
Image pull failures (ImagePullBackOff, ErrImagePull)
Failing readiness, liveness, or startup probes
Insufficient cluster resources (CPU, memory) or an exhausted ResourceQuota
Scheduling issues (taints, node selectors, affinity rules)
Misconfigured Deployment update strategy (maxUnavailable, maxSurge)
Node failures or nodes in NotReady state
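Deployment paused (spec.paused: true), or a PodDisruptionBudget stalling a node drain that runs alongside the rollout (PDBs do not limit the rolling update itself)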

Severity estimation

Typically Medium to High, but severity can range from Low to Critical depending on workload impact.

Low if the rollout is slow but still progressing
Medium if the rollout is stalled with some replicas unavailable
High if the Deployment serves user-facing or critical services
Critical if the rollout leaves zero available replicas

Severity increases with:

Duration of the stuck rollout
Number of unavailable replicas
Criticality of the application
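
As a rough illustration, the levels above can be encoded in a small shell function. This is only one possible reading of the list, not a rule shipped with the alert: the function name estimate_severity, its argument order, and the precedence between levels are assumptions made for this sketch.

```shell
# Hypothetical helper (not part of kubectl or the alert rules).
# Usage: estimate_severity <desired> <available> <progressing: true|false> <user_facing: true|false>
estimate_severity() {
  desired=$1
  available=$2
  progressing=$3
  user_facing=$4
  if [ "$desired" -gt 0 ] && [ "$available" -eq 0 ]; then
    # Nothing is serving traffic
    echo critical
  elif [ "$user_facing" = true ]; then
    echo high
  elif [ "$progressing" = false ] && [ "$available" -lt "$desired" ]; then
    # Stalled with some replicas unavailable
    echo medium
  else
    # Slow but progressing
    echo low
  fi
}
```

The replica counts can be read from the Deployment's spec.replicas and status.availableReplicas fields.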

Troubleshooting steps

Check Deployment status
- Command / Action:
  - Inspect rollout progress
  - kubectl get deployment <deployment-name> -n <namespace>
- Expected result:
  - READY, UP-TO-DATE, and AVAILABLE replicas converge to the desired count
- Additional info:
  - Counts that stay below the desired value across repeated checks indicate a stalled rollout
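
For scripted checks, the same counts can be read with a jsonpath template instead of the table output. A minimal sketch follows; the function name deployment_counts is a hypothetical wrapper, while the field paths are standard Deployment fields. A field that is absent from the status prints as empty, which is equivalent to zero.

```shell
# Hypothetical wrapper: print desired/updated/ready/available replica counts.
# Usage: deployment_counts <deployment-name> <namespace>
deployment_counts() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='desired={.spec.replicas} updated={.status.updatedReplicas} ready={.status.readyReplicas} available={.status.availableReplicas}{"\n"}'
}
```

Running it a few times, some seconds apart, shows whether the numbers are moving at all.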

Check rollout status
- Command / Action:
  - Inspect rollout state
  - kubectl rollout status deployment <deployment-name> -n <namespace>
- Expected result:
  - Rollout completes successfully
- Additional info:
  - If the command hangs, or exits with an error stating that the Deployment exceeded its progress deadline, the rollout is stuck
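
Because kubectl rollout status blocks until the rollout finishes, it helps to bound the wait with --timeout when the check runs from a script. The sketch below assumes a 60 second default, which is an arbitrary choice; rollout_check is a hypothetical function name.

```shell
# Hypothetical wrapper: wait for the rollout, but give up after a timeout.
# Usage: rollout_check <deployment-name> <namespace> [timeout, default 60s]
rollout_check() {
  if kubectl rollout status "deployment/$1" -n "$2" --timeout="${3:-60s}"; then
    echo "rollout complete"
  else
    # Non-zero exit: timed out, or the progress deadline was exceeded
    echo "rollout not complete within ${3:-60s}" >&2
    return 1
  fi
}
```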

Describe the Deployment
- Command / Action:
  - Review events and conditions
  - kubectl describe deployment <deployment-name> -n <namespace>
- Expected result:
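  - Events show ScalingReplicaSet entries for the new and old ReplicaSets, and the Progressing condition is True
- Additional info:
  - A Progressing condition with reason ProgressDeadlineExceeded confirms the stall; scheduling, image, and probe failures appear in the pod events (kubectl describe pod <pod-name> -n <namespace>)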
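
The reason recorded on the Progressing condition can also be read directly, which avoids scanning the full describe output. This is a sketch under the assumption that the reason string is what you want to test; is_deadline_exceeded is a hypothetical function name.

```shell
# Hypothetical helper: succeeds (exit 0) only when the Progressing condition
# reports ProgressDeadlineExceeded.
# Usage: is_deadline_exceeded <deployment-name> <namespace>
is_deadline_exceeded() {
  reason=$(kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
  echo "Progressing reason: ${reason:-<none>}"
  [ "$reason" = "ProgressDeadlineExceeded" ]
}
```

A healthy, completed rollout normally reports NewReplicaSetAvailable here instead.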

Inspect ReplicaSets
- Command / Action:
  - List ReplicaSets and their replica counts
  - kubectl get rs -n <namespace>
- Expected result:
  - New ReplicaSet scales up while old ones scale down
- Additional info:
  - A new ReplicaSet stuck at 0 ready replicas indicates that its pods cannot start or cannot pass their readiness checks
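
In a busy namespace it is easier to spot the new ReplicaSet when the list is narrowed by label and sorted by age, so the newest entry appears last. A minimal sketch; list_replicasets is a hypothetical function name, and the label selector (for example app=web) is an assumption about how the workload is labelled.

```shell
# Hypothetical wrapper: list ReplicaSets oldest first, optionally filtered by label.
# Usage: list_replicasets <namespace> [label-selector]
list_replicasets() {
  if [ -n "${2:-}" ]; then
    kubectl get rs -n "$1" -l "$2" --sort-by=.metadata.creationTimestamp
  else
    kubectl get rs -n "$1" --sort-by=.metadata.creationTimestamp
  fi
}
```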

Inspect Pods
- Command / Action:
  - List pods and check their states
  - kubectl get pods -n <namespace>
- Expected result:
  - Pods are Running and Ready
- Additional info:
  - Investigate pods in Pending, CrashLoopBackOff, or ImagePullBackOff with kubectl describe pod <pod-name> -n <namespace>
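
To pull out only the problematic pods, the phase and container waiting reason can be extracted with jsonpath and filtered with awk. This is a sketch with known gaps: unhealthy_pods is a hypothetical function name, completed Job pods (phase Succeeded) are also listed, and a pod that is Running but failing its readiness probe is not caught.

```shell
# Hypothetical helper: list pods that are not Running or have a waiting container.
# Usage: unhealthy_pods <namespace>
unhealthy_pods() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' |
    awk -F'\t' '$2 != "Running" || $3 != "" {
      # $1 = pod name, $2 = phase, $3 = waiting reason(s), empty if none
      print $1 ": phase=" $2 ", waiting=" ($3 == "" ? "-" : $3)
    }'
}
```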

Check pod logs
- Command / Action:
  - Review logs for failing pods
  - kubectl logs <pod-name> -n <namespace>
- Expected result:
  - Application starts without repeated errors
- Additional info:
  - Use --previous to read logs from the previous instance of a restarted container
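
Since --previous returns an error when the container has never restarted, a script can try it first and fall back to the current logs. A minimal sketch; pod_logs is a hypothetical function name and the 100-line tail is an arbitrary default. For multi-container pods, add -c <container-name> or --all-containers.

```shell
# Hypothetical wrapper: prefer logs from the previous container instance,
# fall back to the current one when no previous instance exists.
# Usage: pod_logs <pod-name> <namespace> [tail-lines, default 100]
pod_logs() {
  kubectl logs "$1" -n "$2" --previous --tail="${3:-100}" 2>/dev/null ||
    kubectl logs "$1" -n "$2" --tail="${3:-100}"
}
```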

Verify update strategy and PDBs
- Command / Action:
  - Review update strategy and PodDisruptionBudgets
  - kubectl get deployment <deployment-name> -n <namespace> -o yaml
  - kubectl get pdb -n <namespace>
- Expected result:
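  - The Deployment is not paused and the update strategy leaves room to create new pods or remove old ones
- Additional info:
  - With maxUnavailable: 0, old pods are removed only after new pods become Ready, so a failing new pod halts the rollout. PDBs do not limit the rolling update itself, but they can stall node drains that happen at the same time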
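
Rather than reading the full YAML, the relevant settings can be printed on one line. The sketch below uses standard Deployment spec fields; rollout_settings is a hypothetical function name. An empty paused= value means the field is unset, which is equivalent to false.

```shell
# Hypothetical wrapper: print the settings that most often hold a rollout back.
# Usage: rollout_settings <deployment-name> <namespace>
rollout_settings() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='paused={.spec.paused} strategy={.spec.strategy.type} maxSurge={.spec.strategy.rollingUpdate.maxSurge} maxUnavailable={.spec.strategy.rollingUpdate.maxUnavailable} progressDeadlineSeconds={.spec.progressDeadlineSeconds}{"\n"}'
}
```

If the output shows paused=true, the rollout can be resumed with kubectl rollout resume deployment <deployment-name> -n <namespace>.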

Roll back or fix and redeploy
- Command / Action:
  - Roll back to last stable version if needed
  - kubectl rollout undo deployment <deployment-name> -n <namespace>
- Expected result:
  - Deployment stabilizes and rollout completes
- Additional info:
  - Identify the root cause before retrying the rollout
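
Before undoing, it is worth listing the revision history so the rollback target is a deliberate choice. The sketch below prints the history and then rolls back, either to the previous revision or to a specific one via --to-revision; rollback is a hypothetical function name.

```shell
# Hypothetical wrapper: show revision history, then roll back.
# Usage: rollback <deployment-name> <namespace> [revision]
rollback() {
  kubectl rollout history "deployment/$1" -n "$2" || return 1
  if [ -n "${3:-}" ]; then
    # Roll back to an explicit revision number taken from the history output
    kubectl rollout undo "deployment/$1" -n "$2" --to-revision="$3"
  else
    # Roll back to the immediately preceding revision
    kubectl rollout undo "deployment/$1" -n "$2"
  fi
}
```

Follow the rollback with kubectl rollout status to confirm the Deployment stabilizes.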

Additional resources

Kubernetes Deployment documentation
Kubernetes rollout troubleshooting
Kubernetes Pod lifecycle and troubleshooting
Related alert: KubeDeploymentGenerationMismatch
Related alert: KubeDeploymentReplicasMismatch