Alert Runbooks

KubePodNotReady

Description

This alert fires when a Kubernetes Pod has been running but not Ready for longer than the alert's configured threshold.
A Pod that is not Ready is removed from the endpoints of its Services and receives no traffic through them, which can cause partial or complete service disruption depending on replica count and workload type.


Possible Causes:


Severity estimation

Medium to High severity, depending on impact.

Severity increases with:


Troubleshooting steps

  1. Check pod readiness status

    • Command / Action:
      • Inspect pod readiness conditions
      • kubectl get pod <pod-name> -n <namespace>

    • Expected result:
      • Pod shows READY 1/1 (or expected container count)
    • Additional info:
      • READY 0/1 indicates a failing readiness probe or a container that has not finished starting
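
The READY column only gives a count. As a minimal sketch, the pod's conditions can be listed one per line to see which stage is failing (typically PodScheduled, Initialized, ContainersReady, and Ready). The helper name pod_conditions is hypothetical and not part of kubectl; the JSONPath fields are the standard pod status fields.

```shell
# Hypothetical helper: print each pod condition as TYPE=STATUS<TAB>REASON.
# $1 = pod name, $2 = namespace
pod_conditions() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Usage: pod_conditions <pod-name> <namespace>
```

A line such as Ready=False with a reason narrows the search before moving to the next step.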

  1. Describe the pod

    • Command / Action:
      • Review events, conditions, and probe results
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Events show successful container startup and readiness
    • Additional info:
      • Look for probe failures, mount errors, or scheduling issues
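
When the describe output is long, the events for a single pod can also be listed on their own, oldest first. This is a sketch under the assumption that the events are still retained by the cluster; the helper name pod_events is hypothetical.

```shell
# Hypothetical helper: list events that reference one pod, sorted by time.
# $1 = pod name, $2 = namespace
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Usage: pod_events <pod-name> <namespace>
```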

  1. Check readiness probe configuration

    • Command / Action:
      • Inspect readiness probe definition
      • kubectl get pod <pod-name> -n <namespace> -o yaml

    • Expected result:
      • Probe matches actual application health endpoint
    • Additional info:
      • Overly strict probes (for example short timeouts or low failure thresholds) can keep pods NotReady
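
Rather than scanning the full YAML, the readiness probe of each container can be extracted directly. A possible sketch follows; pod_readiness_probes is a hypothetical helper name, and how the nested probe object is printed may vary with the kubectl version.

```shell
# Hypothetical helper: print each container name followed by its readinessProbe.
# A container without a readiness probe prints only its name.
# $1 = pod name, $2 = namespace
pod_readiness_probes() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.readinessProbe}{"\n"}{end}'
}

# Usage: pod_readiness_probes <pod-name> <namespace>
```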

  1. Inspect container logs

    • Command / Action:
      • Review application logs
      • kubectl logs <pod-name> -n <namespace>

    • Expected result:
      • Application starts successfully
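    • Additional info:
      • Use --previous to read logs from the prior instance if the container has restarted
      • For multi-container pods, select the container with -c <container-name>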
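
For a container that has restarted, the flags above can be combined as in this sketch. The helper name previous_logs and the 100-line tail are illustrative choices, not requirements.

```shell
# Hypothetical helper: last 100 log lines from the previous instance of a container.
# $1 = pod name, $2 = namespace, $3 = container name
previous_logs() {
  kubectl logs "$1" -n "$2" -c "$3" --previous --tail=100
}

# Usage: previous_logs <pod-name> <namespace> <container-name>
```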

  1. Check init containers

    • Command / Action:
      • Verify init containers completed successfully
      • kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.initContainerStatuses}'

    • Expected result:
      • All init containers show terminated with exit code 0
    • Additional info:
      • Init containers run in order, so a stuck init container blocks the application containers from starting and the pod from becoming Ready
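
The raw initContainerStatuses output can be hard to read. One way to approach it, sketched here with hypothetical helper names, is to print one line per init container and then pull the logs of whichever one has not terminated cleanly.

```shell
# Hypothetical helper: print each init container name and its current state.
# $1 = pod name, $2 = namespace
init_container_states() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'
}

# Hypothetical helper: logs of one init container (selected with -c like any container).
# $1 = pod name, $2 = namespace, $3 = init container name
init_container_logs() {
  kubectl logs "$1" -n "$2" -c "$3"
}

# Usage: init_container_states <pod-name> <namespace>
```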

  1. Check node health

    • Command / Action:
      • Verify node status and pressure conditions
      • kubectl get node <node-name>
      • kubectl describe node <node-name>

    • Expected result:
      • Node is Ready with no pressure conditions
    • Additional info:
      • Node problems such as a NotReady status or MemoryPressure, DiskPressure, or PIDPressure can delay or prevent pod readiness
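
The node name used in this step comes from the pod spec. A minimal sketch of looking it up and then listing that node's conditions follows; both helper names are hypothetical.

```shell
# Hypothetical helper: name of the node the pod is scheduled on.
# $1 = pod name, $2 = namespace
pod_node() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.nodeName}'
}

# Hypothetical helper: print each node condition as TYPE=STATUS.
# $1 = node name
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}'
}

# Usage: node_conditions "$(pod_node <pod-name> <namespace>)"
```

On a healthy node, Ready is True and the pressure conditions are False.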

  1. Verify resource requests and limits

    • Command / Action:
      • Inspect pod resource configuration
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Resources are sufficient for application startup
    • Additional info:
      • CPU throttling or OOM kills (shown under Last State as Terminated with Reason OOMKilled) can prevent readiness
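
To check for OOM kills without reading the whole describe output, the restart count and last termination reason of each container can be printed together. This is a sketch; container_last_termination is a hypothetical helper name, and the reason column is empty for containers that have never terminated.

```shell
# Hypothetical helper: print NAME<TAB>RESTARTS<TAB>LAST_TERMINATION_REASON per container.
# $1 = pod name, $2 = namespace
container_last_termination() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Usage: container_last_termination <pod-name> <namespace>
```

A reason of OOMKilled suggests the memory limit is too low for the workload.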

  1. Restart pod if appropriate

    • Command / Action:
      • Restart pod after fixing root cause
      • kubectl delete pod <pod-name> -n <namespace>

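    • Expected result:
      • The pod's controller (for example a Deployment or StatefulSet) creates a replacement pod, which becomes Ready
    • Additional info:
      • Avoid restarts without understanding the cause
      • A bare pod with no controller is not recreated after deletion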
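
Before deleting, it can help to confirm that the pod has an owner that will recreate it, and afterwards to wait for the replacement to become Ready. The sketch below assumes the workload's pods share a label selector such as app=web (a hypothetical label); both helper names are hypothetical, and the 120-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: kind of the object that owns the pod.
# Prints, for example, ReplicaSet for Deployment pods; prints nothing for a bare pod.
# $1 = pod name, $2 = namespace
pod_owner_kind() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.metadata.ownerReferences[*].kind}'
}

# Hypothetical helper: block until pods matching a label selector are Ready.
# A selector is used because a replacement pod usually gets a new name.
# $1 = label selector, $2 = namespace
wait_for_ready() {
  kubectl wait --for=condition=Ready pod -l "$1" -n "$2" --timeout=120s
}

# Usage: pod_owner_kind <pod-name> <namespace>
#        wait_for_ready app=web <namespace>
```

For pods owned by a Deployment, kubectl rollout restart deployment/<deployment-name> -n <namespace> is an alternative that replaces all pods gradually.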

Additional resources