KubePodNotEnoughHealthyPods
Description
This alert fires when a workload (such as a Deployment, StatefulSet, or ReplicaSet) has fewer healthy pods than the expected or configured number.
It indicates that one or more pods are not Ready, are unavailable, or are failing, which can reduce service capacity or cause a partial or full outage.
Possible Causes:
- Pods failing readiness or liveness probes
- Application crashes (CrashLoopBackOff)
- Insufficient cluster resources (CPU, memory, disk)
- Scheduling issues (taints, node selectors, affinity rules)
- Image pull failures (ImagePullBackOff, ErrImagePull)
- Node failures or nodes in NotReady state
- Ongoing rollout or deployment update
- Misconfigured health checks
- Dependency failures (databases, APIs, external services)
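A quick first pass across the cluster is to list pods that are not in the Running phase (note this will not catch pods that are Running but not Ready):
  kubectl get pods -A --field-selector=status.phase!=Running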
Severity estimation
Medium to High severity, depending on workload criticality and impact.
- Low if reduced health occurs briefly during a rollout
- Medium if some pods are unhealthy but redundancy exists
- High if user-facing services are affected
- Critical if healthy pod count drops below minimum required capacity or reaches zero
Severity increases with:
- Duration of unhealthy state
- Number of affected pods
- Criticality of the service
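If the workload defines a PodDisruptionBudget, its MIN AVAILABLE / MAX UNAVAILABLE values give a concrete minimum capacity to judge severity against:
  kubectl get pdb -n <namespace>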
Troubleshooting steps
1. Identify affected workload
- Command / Action: Check which workload is missing healthy pods
  kubectl get deployment,statefulset,replicaset -n <namespace>
- Expected result: Healthy workloads show matching desired and ready pod counts
- Additional info: Focus on workloads where READY < DESIRED
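- Optional: a custom-columns view puts desired and ready counts side by side (a sketch assuming Deployments; adjust the field paths for other workload kinds):
  kubectl get deployments -n <namespace> -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas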
2. Inspect pods
- Command / Action: List pods and check their status
  kubectl get pods -n <namespace>
- Expected result: Pods are Running and Ready
- Additional info: Investigate Pending, CrashLoopBackOff, or NotReady pods
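- Optional: sorting by restart count surfaces crash-looping pods at the end of the list (a sketch; the field path assumes the first container is the one restarting):
  kubectl get pods -n <namespace> --sort-by=.status.containerStatuses[0].restartCount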
3. Describe unhealthy pods
- Command / Action: Inspect pod details and events
  kubectl describe pod <pod-name> -n <namespace>
- Expected result: Events show normal scheduling and startup
- Additional info: Look for probe failures, image issues, or resource errors
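- Optional: namespace-wide events, sorted by time, show the same signal across all pods at once:
  kubectl get events -n <namespace> --sort-by=.lastTimestamp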
4. Check container logs
- Command / Action: Review logs for failing containers
  kubectl logs <pod-name> -n <namespace>
- Expected result: Application runs without repeated errors
- Additional info: For restarts, check previous logs with --previous
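- Optional: for restarted or multi-container pods, these variants help (<container-name> is a placeholder):
  kubectl logs <pod-name> -n <namespace> --previous
  kubectl logs <pod-name> -n <namespace> -c <container-name> --tail=100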
5. Verify readiness and liveness probes
- Command / Action: Review probe configuration in the workload spec
  kubectl get <resource> <name> -n <namespace> -o yaml
- Expected result: Probes reflect realistic startup and response times
- Additional info: Overly strict probes can cause healthy apps to appear unhealthy
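- Optional: jsonpath extracts just the probe configuration from the full spec (a sketch assuming a Deployment):
  kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.template.spec.containers[*].readinessProbe}'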
6. Check node health
- Command / Action: Ensure nodes are healthy and schedulable
  kubectl get nodes
- Expected result: Nodes are in Ready state
- Additional info: Node pressure or failures can affect pod health
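- Optional: for a NotReady or pressured node, describe it to see conditions and taints; kubectl top requires metrics-server to be installed:
  kubectl describe node <node-name>
  kubectl top nodes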
7. Review recent changes
- Command / Action: Check recent deployments or configuration changes
  kubectl rollout history deployment <deployment-name> -n <namespace>
- Expected result: Recent changes are expected and valid
- Additional info: Consider rollback if a recent change caused the issue
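- Optional: inspect a specific revision's pod template before deciding to roll back:
  kubectl rollout history deployment <deployment-name> -n <namespace> --revision=<revision>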
8. Scale or roll back if required
- Command / Action: Scale workload or roll back to a stable version
  kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>
  kubectl rollout undo deployment <deployment-name> -n <namespace>
- Expected result: Healthy pod count meets or exceeds required minimum
- Additional info: Always address root cause before scaling permanently
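- Optional: target a known-good revision and watch the rollout converge:
  kubectl rollout undo deployment <deployment-name> -n <namespace> --to-revision=<revision>
  kubectl rollout status deployment <deployment-name> -n <namespace>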
Additional resources
- Kubernetes Pods documentation
- Kubernetes Pod lifecycle and troubleshooting
- Kubernetes Deployment documentation
- Related alert: KubeDeploymentReplicasMismatch
- Related alert: KubePodCrashLooping