Alert Runbooks

KubePodNotEnoughHealthyPods

Description

This alert fires when a workload (such as a Deployment, StatefulSet, or ReplicaSet) has fewer healthy pods than the expected or configured number.
It indicates that one or more pods are not Ready, are unavailable, or are failing, which may reduce service capacity or cause a partial or full outage.


Possible Causes:


Severity estimation

Medium to High severity, depending on workload criticality and impact.

Severity increases with:


Troubleshooting steps

  1. Identify affected workload

    • Command / Action:
      • Check which workload is missing healthy pods
      • kubectl get deployment,statefulset,replicaset -n <namespace>

    • Expected result:
      • Healthy workloads show matching desired and ready pod counts
    • Additional info:
      • Focus on workloads where READY < DESIRED
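
The comparison above can be automated with a small filter. The following is a minimal sketch, assuming the default column layout of kubectl get (READY shown as ready/desired for Deployments and StatefulSets, and DESIRED CURRENT READY columns for ReplicaSets). The function name not_enough_ready is hypothetical and not part of kubectl.

```shell
# Print workloads whose ready count is below the desired count.
# Reads the table output of `kubectl get deployment,statefulset,replicaset` on stdin.
not_enough_ready() {
  awk '
    # Deployment / StatefulSet rows: column 2 is "ready/desired"
    $2 ~ /^[0-9]+\/[0-9]+$/ {
      split($2, r, "/")
      if (r[1] + 0 < r[2] + 0) print $1, $2
      next
    }
    # ReplicaSet rows: columns are NAME DESIRED CURRENT READY AGE
    $2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
      if ($4 + 0 < $2 + 0) print $1, $4 "/" $2
    }
  '
}

# Usage (assumes the default output columns):
#   kubectl get deployment,statefulset,replicaset -n <namespace> | not_enough_ready
```

Header rows and blank separator lines match neither pattern, so they are skipped without special handling.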

  2. Inspect pods

    • Command / Action:
      • List pods and check their status
      • kubectl get pods -n <namespace>

    • Expected result:
      • Pods are Running and Ready
    • Additional info:
      • Investigate Pending, CrashLoopBackOff, or NotReady pods
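
In a busy namespace it can help to show only the pods that need attention. This is a minimal sketch, assuming the default kubectl get pods columns (NAME READY STATUS RESTARTS AGE) for a single namespace. The function name unhealthy_pods is hypothetical, and skipping Completed pods is an assumption that suits finished Jobs.

```shell
# Print pods that are not Running or that have containers not yet Ready.
# Reads the table output of `kubectl get pods -n <namespace>` on stdin.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                 # READY column is "ready/total"
    if ($3 == "Completed") next       # assumption: finished Job pods are not a problem
    if ($3 != "Running" || r[1] + 0 < r[2] + 0) print $1, $2, $3
  }'
}

# Usage (assumes the default output columns):
#   kubectl get pods -n <namespace> | unhealthy_pods
```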

  3. Describe unhealthy pods

    • Command / Action:
      • Inspect pod details and events
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Events show normal scheduling and startup
    • Additional info:
      • Look for probe failures, image issues, or resource errors

  4. Check container logs

    • Command / Action:
      • Review logs for failing containers
      • kubectl logs <pod-name> -n <namespace>

    • Expected result:
      • Application runs without repeated errors
    • Additional info:
      • If a container has restarted, check the logs of its previous instance with --previous

  5. Verify readiness and liveness probes

    • Command / Action:
      • Review probe configuration in the workload spec
      • kubectl get <resource> <name> -n <namespace> -o yaml

    • Expected result:
      • Probes reflect realistic startup and response times
    • Additional info:
      • Overly strict probes can cause healthy apps to appear unhealthy
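
To judge whether a probe is too strict, it can help to estimate how long a container has before the probe is declared failed. The sketch below is a rough upper estimate built from the standard probe fields initialDelaySeconds, periodSeconds, and failureThreshold. It ignores timeoutSeconds, and the function name probe_failure_budget is hypothetical.

```shell
# Rough time budget (in seconds) before a probe is considered failed.
#   $1 = initialDelaySeconds
#   $2 = periodSeconds
#   $3 = failureThreshold
probe_failure_budget() {
  echo $(( $1 + $2 * $3 ))
}

# Example: initialDelaySeconds=10, periodSeconds=10, failureThreshold=3
#   probe_failure_budget 10 10 3
```

If the application regularly needs longer than this budget to start or respond, the probe settings are a likely cause of the unhealthy pods.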

  6. Check node health

    • Command / Action:
      • Ensure nodes are healthy and schedulable
      • kubectl get nodes

    • Expected result:
      • Nodes are in Ready state
    • Additional info:
      • Node pressure or failures can affect pod health
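
On larger clusters, the node list can be narrowed to nodes that are not plainly Ready. This is a minimal sketch, assuming the default kubectl get nodes columns (NAME STATUS ROLES AGE VERSION). The function name unready_nodes is hypothetical.

```shell
# Print nodes whose STATUS is anything other than exactly "Ready".
# This also catches cordoned nodes, shown as "Ready,SchedulingDisabled".
# Reads the table output of `kubectl get nodes` on stdin.
unready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage (assumes the default output columns):
#   kubectl get nodes | unready_nodes
```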

  7. Review recent changes

    • Command / Action:
      • Check recent deployments or configuration changes
      • kubectl rollout history deployment <deployment-name> -n <namespace>

    • Expected result:
      • Recent changes are expected and valid
    • Additional info:
      • Consider rollback if a recent change caused the issue

  8. Scale or roll back if required

    • Command / Action:
      • Scale workload or roll back to a stable version
      • kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>
      • kubectl rollout undo deployment <deployment-name> -n <namespace>

    • Expected result:
      • Healthy pod count meets or exceeds required minimum
    • Additional info:
      • Always address root cause before scaling permanently

Additional resources