KubeDeploymentRolloutStuck
Description
This alert fires when a Kubernetes Deployment rollout is not making progress and remains stuck in an updating state for longer than expected.
It indicates that the Deployment controller is unable to successfully replace old replicas with new ones, potentially leaving the application in a partially updated or degraded state.
Possible Causes:
- Pods failing to start or repeatedly crashing (CrashLoopBackOff)
- Image pull failures (ImagePullBackOff, ErrImagePull)
- Failing readiness or liveness probes
- Insufficient cluster resources (CPU, memory, quotas)
- Scheduling issues (taints, node selectors, affinity rules)
- Misconfigured Deployment update strategy (maxUnavailable, maxSurge)
- Node failures or nodes in NotReady state
- Deployment paused or blocked by a PodDisruptionBudget
Severity estimation
Typically Medium to High, depending on workload impact. The full range is:
- Low if the rollout is slow but progressing
- Medium if the rollout is stalled with some replicas unavailable
- High if the stalled Deployment serves user-facing or critical services
- Critical if the rollout leaves zero available replicas
Severity increases with:
- Duration of the stuck rollout
- Number of unavailable replicas
- Criticality of the application
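The tiers above can be approximated in a small shell helper. This is a minimal sketch, not part of the alert definition. The function name rollout_severity and its three arguments are assumptions introduced for illustration, and the order of the checks is only one possible reading of the list.

```shell
# Hypothetical helper: map rollout facts to the severity tiers listed above.
# Usage: rollout_severity AVAILABLE STALLED USER_FACING
#   AVAILABLE    number of available replicas (empty is treated as 0)
#   STALLED      "yes" if the rollout has stopped progressing, else "no"
#   USER_FACING  "yes" for user-facing or critical services, else "no"
rollout_severity() {
  available=${1:-0}
  stalled=$2
  user_facing=$3

  if [ "$available" -eq 0 ]; then
    echo critical          # zero available replicas
  elif [ "$stalled" = yes ] && [ "$user_facing" = yes ]; then
    echo high              # stalled rollout on a critical workload
  elif [ "$stalled" = yes ]; then
    echo medium            # stalled, some replicas unavailable
  else
    echo low               # slow but still progressing
  fi
}
```

The available count can be read with kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.status.availableReplicas}'. The field is normally omitted from the object when no replica is available, which is why the sketch treats an empty value as 0.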
Troubleshooting steps
- Check Deployment status
  - Command / Action:
    - Inspect rollout progress
    - kubectl get deployment <deployment-name> -n <namespace>
  - Expected result:
    - READY, UP-TO-DATE, and AVAILABLE converge to the desired replica count
  - Additional info:
    - Lack of progress indicates a stalled rollout
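For scripted checks, the same comparison can be made on the raw counters instead of the table output. A minimal sketch, assuming the counters are read from .spec.replicas, .status.updatedReplicas, .status.readyReplicas and .status.availableReplicas. The helper name rollout_converged is hypothetical.

```shell
# Hypothetical helper: succeed only when every counter equals the desired count.
# Usage: rollout_converged DESIRED UPDATED READY AVAILABLE
# Empty arguments are treated as 0, because kubectl prints nothing for
# status fields that are absent from the object.
rollout_converged() {
  desired=${1:-0}
  for count in "${2:-0}" "${3:-0}" "${4:-0}"; do
    [ "$count" -eq "$desired" ] || return 1
  done
  return 0
}

# Example wiring (commented out so the sketch runs without a cluster):
# get() { kubectl get deployment "$1" -n "$2" -o jsonpath="$3"; }
# rollout_converged \
#   "$(get my-app my-ns '{.spec.replicas}')" \
#   "$(get my-app my-ns '{.status.updatedReplicas}')" \
#   "$(get my-app my-ns '{.status.readyReplicas}')" \
#   "$(get my-app my-ns '{.status.availableReplicas}')" \
#   && echo "rollout converged" || echo "rollout not converged"
```

Running the check twice a few minutes apart and comparing the counters shows whether the rollout is merely slow or has stopped moving. The names my-app and my-ns in the commented wiring are placeholders.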
- Check rollout status
  - Command / Action:
    - Inspect rollout state
    - kubectl rollout status deployment <deployment-name> -n <namespace>
  - Expected result:
    - Rollout completes successfully
  - Additional info:
    - A hanging status confirms the rollout is stuck
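Instead of waiting on a hanging command, the rollout state can also be read from the Deployment's Progressing condition, for example with kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'. The sketch below interprets the reason string. The helper name progress_state and its output labels are assumptions for illustration, and the list of reasons covers only the ones commonly reported by the Deployment controller.

```shell
# Hypothetical helper: interpret the reason of the Progressing condition.
# Usage: progress_state REASON
progress_state() {
  case "$1" in
    ProgressDeadlineExceeded)
      echo stuck ;;          # no progress within progressDeadlineSeconds
    NewReplicaSetAvailable)
      echo complete ;;       # new ReplicaSet is fully available
    DeploymentPaused)
      echo paused ;;         # rollout was paused on purpose
    NewReplicaSetCreated|FoundNewReplicaSet|ReplicaSetUpdated)
      echo progressing ;;    # controller is still making changes
    *)
      echo unknown ;;        # empty or unrecognized reason
  esac
}
```

When an interactive check is preferred, kubectl rollout status accepts a --timeout flag (for example --timeout=120s) and exits with a non-zero status if the rollout has not finished by then.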
- Describe the Deployment
  - Command / Action:
    - Review events and conditions
    - kubectl describe deployment <deployment-name> -n <namespace>
  - Expected result:
    - Events show ReplicaSet scaling and pod creation
  - Additional info:
    - Look for scheduling, image, or probe failures
- Inspect ReplicaSets
  - Command / Action:
    - List ReplicaSets and their replica counts
    - kubectl get rs -n <namespace>
  - Expected result:
    - New ReplicaSet scales up while old ones scale down
  - Additional info:
    - New ReplicaSet stuck at 0 ready replicas indicates an issue
- Inspect Pods
  - Command / Action:
    - List pods and check their states
    - kubectl get pods -n <namespace>
  - Expected result:
    - Pods are Running and Ready
  - Additional info:
    - Investigate pods in Pending, CrashLoopBackOff, or ImagePullBackOff
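In a busy namespace the pod list can be long, so it helps to filter it down to the pods worth investigating. A minimal sketch, assuming the default column order of kubectl get pods --no-headers (NAME, READY, STATUS, followed by further columns). The helper name unhealthy_pods and the NotReady label it prints are hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print only pods that are not Running, or Running but not fully Ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")                  # READY column, e.g. "0/1"
    if ($3 != "Running" && $3 != "Completed")
      print $1, $3                         # Pending, CrashLoopBackOff, ...
    else if ($3 == "Running" && ready[1] != ready[2])
      print $1, "NotReady"                 # running, but a probe is failing
  }'
}

# Example wiring (commented out so the sketch runs without a cluster):
# kubectl get pods -n my-ns --no-headers | unhealthy_pods
```

For each pod it prints, kubectl describe pod <pod-name> -n <namespace> shows the events behind the state. The namespace my-ns in the commented wiring is a placeholder.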
- Check pod logs
  - Command / Action:
    - Review logs for failing pods
    - kubectl logs <pod-name> -n <namespace>
  - Expected result:
    - Application starts without repeated errors
  - Additional info:
    - Use --previous for restarted containers
- Verify update strategy and PDBs
  - Command / Action:
    - Review update strategy and PodDisruptionBudgets
    - kubectl get deployment <deployment-name> -n <namespace> -o yaml
    - kubectl get pdb -n <namespace>
  - Expected result:
    - Update strategy allows progress and PDBs are not blocking
  - Additional info:
    - Overly strict settings can block rollouts
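To judge whether the update strategy leaves room for progress, it helps to turn maxSurge and maxUnavailable into absolute pod counts. A minimal sketch, relying on the documented rounding rules (a maxSurge percentage rounds up, a maxUnavailable percentage rounds down). The helper names resolve_bound and rollout_bounds are hypothetical.

```shell
# Resolve an integer or percentage value against the replica count.
# Usage: resolve_bound VALUE REPLICAS up|down
resolve_bound() {
  value=$1
  replicas=$2
  mode=$3
  case "$value" in
    *%)
      pct=${value%\%}                               # strip the trailing "%"
      if [ "$mode" = up ]; then
        echo $(( (replicas * pct + 99) / 100 ))     # round up
      else
        echo $(( replicas * pct / 100 ))            # round down
      fi ;;
    *)
      echo "$value" ;;                              # already an absolute count
  esac
}

# Print the pod-count window the rolling update must stay within.
# Usage: rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
rollout_bounds() {
  surge=$(resolve_bound "$2" "$1" up)
  unavailable=$(resolve_bound "$3" "$1" down)
  echo "min_available=$(( $1 - unavailable )) max_total=$(( $1 + surge ))"
}
```

For example, rollout_bounds 10 25% 25% prints min_available=8 max_total=13. When min_available equals the replica count, no old pod can be removed until a surge pod becomes Ready. If quotas or node capacity leave no room for that extra pod, the rollout cannot move.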
- Roll back or fix and redeploy
  - Command / Action:
    - Roll back to last stable version if needed
    - kubectl rollout undo deployment <deployment-name> -n <namespace>
  - Expected result:
    - Deployment stabilizes and rollout completes
  - Additional info:
    - Always identify root cause before retrying rollout
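Before rolling back, it can be useful to see which revision the rollback would return to. kubectl rollout history deployment <deployment-name> -n <namespace> lists the recorded revisions, and kubectl rollout undo accepts --to-revision=<n> to target one explicitly. The sketch below picks the second-newest revision from that listing. It assumes the usual layout, with a header followed by one line per revision starting with the revision number in ascending order. The helper name previous_revision is hypothetical.

```shell
# Hypothetical helper: read `kubectl rollout history` output on stdin and
# print the revision number just before the newest one (nothing if there is
# only one revision).
previous_revision() {
  awk '
    $1 ~ /^[0-9]+$/ { prev = last; last = $1 }   # track the last two revisions
    END { if (prev != "") print prev }
  '
}

# Example wiring (commented out so the sketch runs without a cluster):
# rev=$(kubectl rollout history deployment my-app -n my-ns | previous_revision)
# [ -n "$rev" ] && kubectl rollout undo deployment my-app -n my-ns --to-revision="$rev"
```

The names my-app and my-ns in the commented wiring are placeholders. Inspecting the chosen revision with kubectl rollout history ... --revision=<n> before undoing helps confirm it is the last known stable version.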
Additional resources
- Kubernetes Deployment documentation
- Kubernetes rollout troubleshooting
- Kubernetes Pod lifecycle and troubleshooting
- Related alert: KubeDeploymentGenerationMismatch
- Related alert: KubeDeploymentReplicasMismatch