KubeStatefulSetReplicaMismatch

Description

This alert fires when a Kubernetes StatefulSet does not have the expected number of replicas running and Ready.
It indicates that the actual number of Ready pods differs from the desired replica count, potentially impacting stateful workloads such as databases, queues, or clustered applications that rely on stable identities and storage.

Possible causes

Pods failing to start or repeatedly crashing (CrashLoopBackOff)
Pending pods due to insufficient CPU, memory, or storage
PersistentVolumeClaim (PVC) provisioning or binding failures
Volume mount or permission issues
Node failures or nodes in NotReady state
OrderedReady pod management policy (the default) holding back later ordinals until earlier pods are Running and Ready
Failing readiness or liveness probes
PodDisruptionBudget constraints
Manual pod deletion where the replacement pod has not yet become Ready

Severity estimation

Low to Critical severity, depending on the workload and how many replicas are missing.

Low if a non-critical replica is missing and redundancy remains
Medium if reduced capacity or quorum risk exists
High if critical replicas are missing (e.g. primary database pod)
Critical if quorum is lost or all replicas are unavailable

Severity increases with:

Number of missing replicas
Role of the missing pod(s) in the application
Duration of the mismatch
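
The quorum risk mentioned above can be estimated with simple arithmetic. The following is a minimal sketch, assuming the workload uses strict-majority quorum (as etcd or ZooKeeper style clusters typically do); the function name quorum_check is hypothetical, and workloads with other replication models need a different rule.

```shell
# quorum_check <desired-replicas> <ready-replicas>
# Assumption: the application needs a strict majority of the desired replicas.
quorum_check() {
  local desired="$1" ready="$2" quorum
  quorum=$((desired / 2 + 1))   # smallest strict majority of the desired count
  if [ "$ready" -ge "$quorum" ]; then
    # spare = how many more Ready pods can be lost before quorum is gone
    echo "quorum=$quorum ready=$ready status=held spare=$((ready - quorum))"
  else
    echo "quorum=$quorum ready=$ready status=lost"
    return 1
  fi
}
```

For example, quorum_check 3 2 reports that quorum is held with no spare replica, which matches the Medium case above, while quorum_check 5 2 reports quorum lost.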

Troubleshooting steps

Check StatefulSet status
- Command / Action:
  - Inspect desired vs ready replicas
  - kubectl get statefulset <statefulset-name> -n <namespace>
- Expected result:
  - READY column shows all replicas ready (for example 3/3)
- Additional info:
  - A mismatch confirms the alert condition
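
For scripting, the same comparison can be made from the StatefulSet object itself. This is a sketch rather than a definitive check: the function name sts_replica_gap is hypothetical, and it reads .spec.replicas and .status.readyReplicas through kubectl's jsonpath output.

```shell
# sts_replica_gap <statefulset-name> <namespace>
# Prints desired, Ready, and missing replica counts; returns non-zero on a mismatch.
sts_replica_gap() {
  local name="$1" ns="$2" desired ready
  desired=$(kubectl get statefulset "$name" -n "$ns" -o jsonpath='{.spec.replicas}') || return 2
  ready=$(kubectl get statefulset "$name" -n "$ns" -o jsonpath='{.status.readyReplicas}') || return 2
  desired=${desired:-1}   # spec.replicas defaults to 1 when unset
  ready=${ready:-0}       # readyReplicas is omitted from the status when it is zero
  echo "desired=$desired ready=$ready missing=$((desired - ready))"
  [ "$desired" -eq "$ready" ]
}
```

Usage: sts_replica_gap <statefulset-name> <namespace>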

Describe the StatefulSet
- Command / Action:
  - Review events and pod management behavior
  - kubectl describe statefulset <statefulset-name> -n <namespace>
- Expected result:
  - Events show normal pod creation and updates
- Additional info:
  - StatefulSets create pods sequentially in ordinal order by default, so one pod that never becomes Ready blocks every later ordinal

Inspect pods
- Command / Action:
  - List StatefulSet pods and their status
  - kubectl get pods -n <namespace> -l <statefulset-label> -o wide
- Expected result:
  - Pods are Running and Ready
- Additional info:
  - Identify which ordinal pod is missing or unhealthy
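
StatefulSet pods are named <statefulset-name>-<ordinal>, so the affected ordinals can be found by probing each expected name. The sketch below assumes ordinals start at 0 (the default); the function name unhealthy_ordinals is hypothetical.

```shell
# unhealthy_ordinals <statefulset-name> <namespace> <desired-replicas>
# Prints one line per expected pod that is not Ready.
unhealthy_ordinals() {
  local name="$1" ns="$2" replicas="$3" i ready
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Read the status of the pod's Ready condition; empty if the pod is absent
    ready=$(kubectl get pod "$name-$i" -n "$ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null) || ready=""
    # "none" means the pod does not exist or has no Ready condition yet
    [ "$ready" = "True" ] || echo "$name-$i ready=${ready:-none}"
    i=$((i + 1))
  done
}
```

Usage: unhealthy_ordinals <statefulset-name> <namespace> <desired-replicas>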

Describe problematic pods
- Command / Action:
  - Inspect events and container status
  - kubectl describe pod <pod-name> -n <namespace>
- Expected result:
  - Pods start successfully without repeated failures
- Additional info:
  - Look for PVC, scheduling, or probe-related errors
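
When the describe output is long, the events for a single pod can be pulled out on their own, oldest first. This is a small sketch using kubectl's event field selectors; the function name pod_events is hypothetical.

```shell
# pod_events <pod-name> <namespace>
# Lists events whose involved object is the given pod, sorted by last timestamp.
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Usage: pod_events <pod-name> <namespace>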

Check PersistentVolumeClaims
- Command / Action:
  - Verify PVCs are bound and healthy
  - kubectl get pvc -n <namespace>
- Expected result:
  - All PVCs are in Bound state
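- Additional info:
  - A PVC that stays Pending keeps its pod Pending. With a WaitForFirstConsumer storage class the PVC is only bound once its pod is scheduled, so check the pod's scheduling events as well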
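
In a namespace with many claims, it can help to print only those that are not Bound. A minimal sketch follows; the function name unbound_pvcs is hypothetical. PVCs created from a volumeClaimTemplate are named <template-name>-<statefulset-name>-<ordinal>, which maps each claim back to its pod.

```shell
# unbound_pvcs <namespace>
# Prints name, phase, and storage class of every PVC that is not Bound.
unbound_pvcs() {
  kubectl get pvc -n "$1" --no-headers \
    -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName' |
    awk '$2 != "Bound" { print $1, $2, $3 }'
}
```

Usage: unbound_pvcs <namespace>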

Check pod logs
- Command / Action:
  - Review logs for application-level failures
  - kubectl logs <pod-name> -n <namespace>
- Expected result:
  - Application starts and runs normally
- Additional info:
  - Add --previous to read logs from the last terminated container after a restart, and -c <container-name> for multi-container pods

Verify PodDisruptionBudgets
- Command / Action:
  - Inspect PDBs that may block recovery
  - kubectl get pdb -n <namespace>
- Expected result:
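  - Each PDB allows at least one pod to be unavailable (minAvailable below the replica count, or maxUnavailable of at least 1)
- Additional info:
  - PDBs limit voluntary evictions such as node drains; they do not stop the StatefulSet controller from creating pods. An overly strict PDB can stall a drain or node upgrade while a replica is already down, which delays recovery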
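
To see which budgets currently permit no evictions, the allowed-disruption count can be read from each PDB's status. This is a sketch, assuming the .status.disruptionsAllowed field is populated by the disruption controller; the function name blocking_pdbs is hypothetical.

```shell
# blocking_pdbs <namespace>
# Prints the names of PDBs that currently allow zero voluntary disruptions.
blocking_pdbs() {
  kubectl get pdb -n "$1" --no-headers \
    -o custom-columns='NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' |
    awk '$2 == "0" || $2 == "<none>" { print $1 }'   # <none> = field not set
}
```

A PDB listed here is not necessarily misconfigured: the count also drops to zero while the pods it covers are unhealthy.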

Recover missing replicas
- Command / Action:
  - Fix root cause and allow pod recreation
  - kubectl delete pod <pod-name> -n <namespace>
- Expected result:
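  - The StatefulSet controller recreates the pod with the same name and ordinal
- Additional info:
  - Deleting a pod leaves its PVC in place, so the recreated pod reattaches to the same volume. Avoid deleting PVCs unless data loss is acceptable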
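
Once the root cause is fixed, deletion and verification can be combined. The following sketch assumes the StatefulSet uses the RollingUpdate update strategy, since kubectl rollout status does not support OnDelete; the function name recreate_pod and the 5m default timeout are illustrative choices.

```shell
# recreate_pod <pod-name> <statefulset-name> <namespace> [timeout]
# Deletes one pod, then waits until the StatefulSet reports all replicas Ready.
recreate_pod() {
  local pod="$1" sts="$2" ns="$3" timeout="${4:-5m}"
  kubectl delete pod "$pod" -n "$ns" || return 1
  # Returns non-zero if the replicas are not all Ready before the timeout
  kubectl rollout status "statefulset/$sts" -n "$ns" --timeout="$timeout"
}
```

Usage: recreate_pod <pod-name> <statefulset-name> <namespace>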

Additional resources

Kubernetes StatefulSet documentation
Kubernetes Persistent Volumes
Kubernetes Pod lifecycle and troubleshooting
Related alert: KubePodNotReady
Related alert: KubePodCrashLooping