KubeDaemonSetRolloutStuck
Description
This alert fires when a Kubernetes DaemonSet rollout is not progressing and remains stuck in an updating state for longer than expected.
It indicates that the DaemonSet controller is unable to successfully create or update pods on all eligible nodes, leaving the cluster in a partially updated or degraded state.
DaemonSets are often used for critical node-level components (networking, logging, monitoring, security), so a stuck rollout can have widespread impact.
Possible Causes:
- One or more nodes are NotReady, cordoned, or unreachable
- Insufficient node resources (CPU, memory, disk)
- Pod scheduling blocked by taints, node selectors, or affinity rules
- Image pull failures (ImagePullBackOff, ErrImagePull)
- Pods failing to start or crashing (CrashLoopBackOff)
- Failing init containers
- Security context issues (PSA/PSP, SELinux, AppArmor)
- DaemonSet update strategy too restrictive (e.g. maxUnavailable)
- Kubelet or node-level failures
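As a rough aid for matching a pod's STATUS column to the causes listed above, the following shell sketch maps a few common status strings to a cause category. The function name likely_cause and the category wording are illustrative assumptions, and the mapping is deliberately incomplete.

```shell
# Map a pod STATUS value (as printed by `kubectl get pods`) to a likely cause.
# Hypothetical helper; the categories mirror the list of possible causes.
likely_cause() {
  case "$1" in
    ImagePullBackOff|ErrImagePull) echo "image pull failure" ;;
    Init:*)                        echo "failing init container" ;;
    CrashLoopBackOff|Error)        echo "container failing to start or crashing" ;;
    Pending)                       echo "scheduling blocked or insufficient node resources" ;;
    *)                             echo "unclassified; inspect pod events" ;;
  esac
}
```

For example, `likely_cause Init:CrashLoopBackOff` reports a failing init container, because the Init: prefix is checked before the generic crash loop case.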
Severity estimation
Typically Medium to High severity, depending on the function of the DaemonSet.
- Low if the DaemonSet is non-critical and impact is limited
- Medium if functionality is partially degraded
- High if the DaemonSet provides critical services (CNI, logging, monitoring, security agents)
- Critical if rollout failure impacts cluster networking or node stability
Severity increases with:
- Number of affected nodes
- Duration of the stuck rollout
- Criticality of the DaemonSet workload
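To put a number on "affected nodes", one option is to express the nodes still waiting for the updated pod as a percentage of the desired count. This is a minimal sketch: the function name affected_percent is hypothetical, and no severity thresholds are implied by the result.

```shell
# Percentage of eligible nodes that do not yet run the updated pod.
# $1: desired pod count, $2: up-to-date pod count
affected_percent() {
  if [ "$1" -le 0 ]; then
    echo 0            # no eligible nodes, nothing can be affected
    return
  fi
  echo $(( ( ($1 - $2) * 100 ) / $1 ))   # integer division, rounds down
}
```

Both inputs can be read from the DESIRED and UP-TO-DATE columns of kubectl get daemonset.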
Troubleshooting steps
- Check DaemonSet status
  - Command / Action:
    - Inspect desired vs. up-to-date pod counts
    - kubectl get daemonset <daemonset-name> -n <namespace>
  - Expected result:
    - DESIRED, CURRENT, READY, and UP-TO-DATE match
  - Additional info:
    - UP-TO-DATE < DESIRED for an extended period indicates a stuck rollout
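The count comparison in this step can be scripted. The sketch below assumes the default table layout of kubectl get daemonset (NAME, DESIRED, CURRENT, READY, UP-TO-DATE, AVAILABLE, NODE SELECTOR, AGE); the function name rollout_state and its messages are illustrative.

```shell
# Classify each line of `kubectl get daemonset --no-headers` output.
# Assumed columns: NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE ...
rollout_state() {
  awk '{
    if ($5 < $2)      print $1 ": rollout incomplete (" $5 "/" $2 " up to date)"
    else if ($4 < $2) print $1 ": updated but not ready (" $4 "/" $2 " ready)"
    else              print $1 ": rollout complete"
  }'
}

# Usage (placeholders as in the step above):
#   kubectl get daemonset <daemonset-name> -n <namespace> --no-headers | rollout_state
```

A single "incomplete" reading is normal while a rollout is in flight; it only points to a stuck rollout if the numbers stop changing.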
- Describe the DaemonSet
  - Command / Action:
    - Review events and update progress
    - kubectl describe daemonset <daemonset-name> -n <namespace>
  - Expected result:
    - Events show successful pod scheduling and updates
  - Additional info:
    - Look for scheduling, image, or permission errors
- Identify missing or unhealthy pods
  - Command / Action:
    - List DaemonSet pods and node placement
    - kubectl get pods -n <namespace> -l <daemonset-label> -o wide
  - Expected result:
    - One Running and Ready pod per eligible node
  - Additional info:
    - Compare against the output of kubectl get nodes
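Comparing the pod list against the node list by eye gets tedious on larger clusters. As a sketch, the helper below prints every node that has no pod from the DaemonSet. The function name nodes_without_pod and the file names in the usage comment are assumptions; the kubectl flags shown (--no-headers, -o custom-columns) are standard.

```shell
# Print nodes that have no pod from the DaemonSet.
# $1: file with one node name per line
# $2: file with one node name per line, taken from the pods' node assignment
nodes_without_pod() {
  # Read the pod file first and remember its nodes, then print every
  # node from the node file that was not seen.
  awk 'FILENAME == ARGV[1] { covered[$1] = 1; next } !($1 in covered)' "$2" "$1"
}

# Usage (placeholders as in the step above):
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > nodes.txt
#   kubectl get pods -n <namespace> -l <daemonset-label> --no-headers \
#     -o custom-columns=NODE:.spec.nodeName > pod-nodes.txt
#   nodes_without_pod nodes.txt pod-nodes.txt
```

Nodes excluded on purpose by taints, node selectors, or affinity rules will also appear in the output, so the result is a list of candidates rather than confirmed failures.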
- Describe problematic pods
  - Command / Action:
    - Inspect pod events and status
    - kubectl describe pod <pod-name> -n <namespace>
  - Expected result:
    - Pods start successfully and become Ready
  - Additional info:
    - Common issues include scheduling failures or volume mount errors
- Check container logs
  - Command / Action:
    - Review logs for failing containers
    - kubectl logs <pod-name> -n <namespace>
  - Expected result:
    - Application starts without repeated errors
  - Additional info:
    - For crash loops, check the logs of the previous container instance with --previous
- Verify node health
  - Command / Action:
    - Check node readiness and pressure conditions
    - kubectl get nodes
    - kubectl describe node <node-name>
  - Expected result:
    - Nodes are Ready with no DiskPressure or MemoryPressure
  - Additional info:
    - Node issues often block DaemonSet scheduling
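To narrow down which nodes deserve a closer look, the node list can be filtered for any status other than plain Ready. This is a minimal sketch assuming the default kubectl get nodes columns (NAME, STATUS, ROLES, AGE, VERSION); the function name unhealthy_nodes is illustrative.

```shell
# List nodes whose STATUS is anything other than exactly "Ready".
# This catches NotReady as well as cordoned nodes, which kubectl
# reports as "Ready,SchedulingDisabled".
unhealthy_nodes() {
  awk '$2 != "Ready" { print $1 " (" $2 ")" }'
}

# Usage:
#   kubectl get nodes --no-headers | unhealthy_nodes
```

Pressure conditions such as DiskPressure or MemoryPressure do not show up in the STATUS column, so kubectl describe node is still needed for the nodes this filter returns and for any node where pods fail to start.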
- Review update strategy
  - Command / Action:
    - Inspect update strategy configuration
    - kubectl get daemonset <daemonset-name> -n <namespace> -o yaml
  - Expected result:
    - maxUnavailable allows progress even with degraded nodes
  - Additional info:
    - Overly strict strategies can stall rollouts
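The maxUnavailable setting lives under .spec.updateStrategy.rollingUpdate and may be an absolute number or a percentage; Kubernetes rounds a percentage up to a whole number of pods. The sketch below converts either form into a pod count, which helps judge how much headroom the rollout has. The function name max_unavailable_pods is an assumption.

```shell
# Convert maxUnavailable into an absolute pod count.
# $1: maxUnavailable as an integer or a percentage (e.g. 1 or 10%)
# $2: desired pod count
max_unavailable_pods() {
  case "$1" in
    *%)
      pct=${1%?}                                # strip the trailing percent sign
      echo $(( (pct * $2 + 99) / 100 ))         # ceiling of pct% of desired
      ;;
    *)
      echo "$1"
      ;;
  esac
}

# Usage (placeholders as in the step above):
#   kubectl get daemonset <daemonset-name> -n <namespace> \
#     -o jsonpath='{.spec.updateStrategy.rollingUpdate.maxUnavailable}'
```

With a budget of one pod, which is the default, a single updated pod that never becomes Ready may be enough to hold up the remaining nodes.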
- Force recovery if required
  - Command / Action:
    - Delete stuck pods or roll back configuration
    - kubectl delete pod <pod-name> -n <namespace>
  - Expected result:
    - Pods are recreated and rollout resumes
  - Additional info:
    - Ensure root cause is resolved before forcing restarts
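For the rollback path mentioned in the last step, kubectl provides rollout subcommands that work on DaemonSets. The wrappers below are a sketch: the function names and the default timeout of 5m are assumptions, while kubectl rollout undo and kubectl rollout status are standard commands.

```shell
# Roll the DaemonSet back to its previous revision.
# $1: DaemonSet name, $2: namespace
rollback_daemonset() {
  kubectl rollout undo "daemonset/$1" -n "$2"
}

# Wait for the rollout to finish; exits non-zero if the timeout is hit.
# $1: DaemonSet name, $2: namespace, $3: optional timeout (assumed default 5m)
watch_rollout() {
  kubectl rollout status "daemonset/$1" -n "$2" --timeout="${3:-5m}"
}
```

Running watch_rollout after deleting a stuck pod or after a rollback gives a clear pass or fail signal instead of repeated manual checks of the pod counts.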
Additional resources
- Kubernetes DaemonSet documentation
- Kubernetes scheduling and eviction
- Kubernetes Pod lifecycle and troubleshooting
- Related alert: KubeDaemonSetNotScheduled
- Related alert: KubeNodeNotReady