KubeStatefulSetReplicaMismatch

Description

This alert fires when a Kubernetes StatefulSet does not have the expected number of replicas running and Ready.
It indicates that the actual number of Ready pods differs from the desired replica count, potentially impacting stateful workloads such as databases, queues, or clustered applications that rely on stable identities and storage.


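Possible causes

  • PersistentVolumeClaims for one or more pods are not Bound, which blocks pod startup
  • Pods cannot be scheduled onto a node
  • Readiness or liveness probes fail, so pods run but never become Ready
  • Application-level failures cause containers to crash or restart repeatedly
  • Overly strict PodDisruptionBudgets stall recovery
  • An unhealthy pod blocks creation of later ordinals, because StatefulSets create pods sequentially by default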

Severity estimation

Medium to High severity, depending on the workload and replica count.

Severity increases with:


Troubleshooting steps

  1. Check StatefulSet status

    • Command / Action:
      • Inspect desired vs ready replicas
      • kubectl get statefulset <statefulset-name> -n <namespace>

    • Expected result:
      • READY equals desired replica count
    • Additional info:
      • A mismatch confirms the alert condition
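
The same comparison can be scripted. The sketch below assumes the standard .spec.replicas and .status.readyReplicas fields of a StatefulSet; replica_gap and sts_replica_gap are hypothetical helper names, not part of kubectl. An empty ready count is treated as 0, since the status field may be absent when no pod is Ready.

```shell
# Hypothetical helper: print how many replicas are missing (desired - ready).
# An empty or missing ready count is treated as 0.
replica_gap() {
  desired=$1
  ready=${2:-0}
  echo $((desired - ready))
}

# Hypothetical wrapper: read both counts from the cluster, then compare.
# Usage: sts_replica_gap <statefulset-name> <namespace>
sts_replica_gap() {
  desired=$(kubectl get statefulset "$1" -n "$2" -o jsonpath='{.spec.replicas}')
  ready=$(kubectl get statefulset "$1" -n "$2" -o jsonpath='{.status.readyReplicas}')
  replica_gap "$desired" "$ready"
}
```

A result of 0 means the StatefulSet has caught up; any positive number is the count of replicas that are still missing or not Ready.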

  2. Describe the StatefulSet

    • Command / Action:
      • Review events and pod management behavior
      • kubectl describe statefulset <statefulset-name> -n <namespace>

    • Expected result:
      • Events show normal pod creation and updates
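    • Additional info:
      • StatefulSets create pods sequentially in ordinal order by default (podManagementPolicy: OrderedReady), so one pod that never becomes Ready blocks creation of the pods after it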

  3. Inspect pods

    • Command / Action:
      • List StatefulSet pods and their status
      • kubectl get pods -n <namespace> -l <statefulset-label> -o wide

    • Expected result:
      • Pods are Running and Ready
    • Additional info:
      • Identify which ordinal pod is missing or unhealthy
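
To spot a missing ordinal quickly, the pod list can be compared against the names the StatefulSet is expected to own. This is a minimal sketch: missing_ordinals is a hypothetical helper, and it assumes pods are named <statefulset-name>-<ordinal> with ordinals starting at 0, which is the default but can be changed on newer clusters.

```shell
# Hypothetical helper: read "pod/<name>" lines on stdin (the format printed
# by `kubectl get pods -o name`) and print each expected pod that is absent.
# Assumes ordinals run from 0 to replicas-1.
# Usage: ... | missing_ordinals <statefulset-name> <replicas>
missing_ordinals() {
  sts_name=$1
  replicas=$2
  pods=$(cat)
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$pods" | grep -qxF "pod/${sts_name}-${i}"; then
      echo "${sts_name}-${i}"
    fi
    i=$((i + 1))
  done
}

# Example against a cluster (all values are placeholders):
#   kubectl get pods -n <namespace> -l <statefulset-label> -o name \
#     | missing_ordinals <statefulset-name> <replicas>
```

The helper only reports pods that do not exist at all; pods that exist but are not Ready still need the STATUS and READY columns from the command above.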

  4. Describe problematic pods

    • Command / Action:
      • Inspect events and container status
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Pods start successfully without repeated failures
    • Additional info:
      • Look for PVC, scheduling, or probe-related errors

  5. Check PersistentVolumeClaims

    • Command / Action:
      • Verify PVCs are bound and healthy
      • kubectl get pvc -n <namespace>

    • Expected result:
      • All PVCs are in Bound state
    • Additional info:
      • Unbound PVCs block pod startup
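
In a namespace with many claims, it can help to list only the ones that are not Bound. The sketch below is one possible way to do that; unbound_pvcs is a hypothetical helper that expects two whitespace-separated columns (claim name and phase), produced here with kubectl's custom-columns output.

```shell
# Hypothetical filter: read "NAME PHASE" lines on stdin and print every
# claim whose phase is anything other than Bound. Blank lines are skipped.
unbound_pvcs() {
  awk 'NF >= 2 && $2 != "Bound" { print $1 " (" $2 ")" }'
}

# Example against a cluster (namespace is a placeholder):
#   kubectl get pvc -n <namespace> --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | unbound_pvcs
```

Empty output means every claim in the namespace is Bound. Claims created from a StatefulSet's volumeClaimTemplates are named <template-name>-<pod-name>, which makes it possible to map a Pending claim back to the affected pod.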

  6. Check pod logs

    • Command / Action:
      • Review logs for application-level failures
      • kubectl logs <pod-name> -n <namespace>

    • Expected result:
      • Application starts and runs normally
    • Additional info:
      • Use --previous for restarted containers

  7. Verify PodDisruptionBudgets

    • Command / Action:
      • Inspect PDBs that may block recovery
      • kubectl get pdb -n <namespace>

    • Expected result:
      • PDBs allow at least one pod to be unavailable
    • Additional info:
      • PDBs limit voluntary evictions, such as node drains; an overly strict PDB can stall maintenance and delay StatefulSet recovery
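
A PDB that currently permits no disruptions is the usual suspect. The following sketch filters for those; blocking_pdbs is a hypothetical helper, and it assumes the .status.disruptionsAllowed field of the policy/v1 PodDisruptionBudget API.

```shell
# Hypothetical filter: read "NAME ALLOWED" lines on stdin and print the
# budgets that currently allow zero voluntary disruptions.
blocking_pdbs() {
  awk 'NF >= 2 && $2 == 0 { print $1 }'
}

# Example against a cluster (namespace is a placeholder):
#   kubectl get pdb -n <namespace> --no-headers \
#     -o custom-columns=NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
#     | blocking_pdbs
```

A budget listed here is not necessarily misconfigured: it also reports zero allowed disruptions while the pods it covers are already unhealthy.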

  8. Recover missing replicas

    • Command / Action:
      • Fix root cause and allow pod recreation
      • kubectl delete pod <pod-name> -n <namespace>

    • Expected result:
      • StatefulSet recreates pod with same identity
    • Additional info:
      • Avoid deleting PVCs unless data loss is acceptable
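
After deleting a pod, it is worth waiting until the StatefulSet reports all replicas Ready again before closing the alert. The sketch below pairs the delete with kubectl rollout status; recreate_pod is a hypothetical helper, the 300s timeout is an arbitrary choice, and the approach assumes the StatefulSet uses the default RollingUpdate update strategy.

```shell
# Hypothetical helper: delete one pod, then block until the StatefulSet
# reports its replicas as Ready or the timeout expires.
# Usage: recreate_pod <pod-name> <statefulset-name> <namespace>
recreate_pod() {
  kubectl delete pod "$1" -n "$3" &&
    kubectl rollout status "statefulset/$2" -n "$3" --timeout=300s
}
```

If the wait times out, the root cause has probably not been fixed, and the earlier steps should be repeated for the recreated pod.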

Additional resources