KubePodCrashLooping
Description
This alert fires when a Kubernetes Pod is repeatedly crashing and restarting, entering a CrashLoopBackOff state.
It indicates that the container starts but fails shortly after, preventing the application from running normally and potentially impacting service availability.
Possible Causes:
- Application runtime errors or unhandled exceptions
- Incorrect command or entrypoint configuration
- Missing or invalid environment variables or secrets
- Dependency failures (databases, APIs, external services)
- Failing liveness or startup probes
- Insufficient resources (memory limits that are too low causing OOMKills, or CPU throttling causing slow startup and probe timeouts)
- Configuration or image changes introduced during a recent deployment
- File system or permission issues
Severity estimation
Typically Medium to High, depending on impact and scope, but it can range from Low to Critical:
- Low if the pod is non-critical or has redundancy
- Medium if some replicas are crash looping but service remains available
- High if crash looping affects user-facing or critical services
- Critical if all replicas of a workload are crash looping
Severity increases with:
- Duration of the crash loop
- Number of affected pods
- Criticality of the application
Troubleshooting steps
- Identify crash looping pods
  - Command / Action:
    - List pods and check restart counts
    - kubectl get pods -n <namespace>
  - Expected result:
    - Crash looping pods show CrashLoopBackOff and high restart counts
  - Additional info:
    - Focus on pods with frequent restarts
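As a rough sketch, the pod listing can be narrowed to the pods that look like they are crash looping. The helper name crashlooping_pods and the restart threshold of 5 are illustrative assumptions, and the parsing assumes the default column order of kubectl get pods --no-headers (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints "name status restarts" for pods that are in CrashLoopBackOff
# or whose restart count reaches the threshold.
# The default threshold of 5 is an example value, not a Kubernetes default.
crashlooping_pods() {
  threshold="${1:-5}"
  awk -v t="$threshold" \
    '$3 == "CrashLoopBackOff" || ($4 + 0) >= t { print $1, $3, $4 + 0 }'
}

# Example usage against a live cluster (replace <namespace>):
#   kubectl get pods -n <namespace> --no-headers | crashlooping_pods 5
```

Pods that restart often but are currently Running are included on purpose, since they may be between two crashes.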
- Describe the Pod
  - Command / Action:
    - Inspect pod events and restart reasons
    - kubectl describe pod <pod-name> -n <namespace>
  - Expected result:
    - Events indicate the reason for container restarts
  - Additional info:
    - Look for probe failures, OOMKilled, or config errors
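The describe output also reports the exit code of the last terminated container, which often narrows down the cause. The following is a minimal sketch of how such a code might be interpreted; the function name explain_exit_code is hypothetical, and the mapping relies on the common shell convention that codes above 128 mean "128 + signal number".

```shell
# Hypothetical helper: maps a container exit code to a likely cause.
# The wording of each hint is illustrative; always confirm with logs and events.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly (check whether the process is meant to keep running)" ;;
    126) echo "command found but not executable (permissions or entrypoint issue)" ;;
    127) echo "command not found (wrong command or entrypoint)" ;;
    137) echo "killed by SIGKILL (often OOMKilled or a forced kill)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM (for example after a failed liveness probe)" ;;
    *)   echo "application-defined exit code, check the logs" ;;
  esac
}

# Example usage: read the exit code of the first container's last termination.
#   code="$(kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
#   explain_exit_code "$code"
```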
- Check previous container logs
  - Command / Action:
    - Review logs from the last failed container instance
    - kubectl logs <pod-name> -n <namespace> --previous
  - Expected result:
    - Logs reveal the error that caused the crash
  - Additional info:
    - Logs from the current container instance may be empty if the container crashes quickly, which is why --previous is needed
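For pods with more than one container, the previous logs have to be requested per container. A possible sketch is shown below; the function name previous_logs and the 50-line tail are assumptions made for illustration, and init containers are not covered.

```shell
# Hypothetical helper: prints the tail of the previous (crashed) instance
# of every regular container in a pod.
# Arguments: <pod-name> <namespace>
previous_logs() {
  pod="$1"; ns="$2"
  # List container names from the pod spec, separated by spaces.
  for c in $(kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.spec.containers[*].name}'); do
    echo "=== $c ==="
    # --previous fails if the container has not restarted yet.
    kubectl logs "$pod" -n "$ns" -c "$c" --previous --tail=50 \
      || echo "(no previous logs for $c)"
  done
}

# Example usage:
#   previous_logs <pod-name> <namespace>
```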
- Verify resource limits
  - Command / Action:
    - Check CPU and memory limits
    - kubectl get pod <pod-name> -n <namespace> -o yaml
  - Expected result:
    - Resource limits are sufficient for the workload
  - Additional info:
    - OOMKilled events indicate insufficient memory
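Instead of reading the full YAML, the relevant fields can be pulled out with a JSONPath query. The sketch below is one way to do that; the function names container_limits and was_oomkilled are hypothetical.

```shell
# Hypothetical helper: prints name, memory request, memory limit and CPU limit
# for each container (tab separated; empty when a value is not set).
container_limits() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\t"}{.resources.limits.memory}{"\t"}{.resources.limits.cpu}{"\n"}{end}'
}

# Hypothetical helper: succeeds (exit status 0) if any container's last
# termination reason was OOMKilled.
was_oomkilled() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' \
    | grep -q 'OOMKilled'
}

# Example usage:
#   container_limits <pod-name> <namespace>
#   was_oomkilled <pod-name> <namespace> && echo "memory limit is likely too low"
```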
- Check probes configuration
  - Command / Action:
    - Review liveness and startup probes
    - kubectl get <resource> <name> -n <namespace> -o yaml
  - Expected result:
    - Probes allow enough startup and recovery time
  - Additional info:
    - Overly aggressive probes can cause crash loops
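To judge whether a probe is too aggressive, it helps to work out how long the container has before the kubelet gives up. A simple approximation, ignoring timeoutSeconds, is initialDelaySeconds + periodSeconds * failureThreshold. The helper name probe_budget_seconds below is hypothetical.

```shell
# Hypothetical helper: approximate number of seconds a container has before
# a probe is considered failed and the container is restarted.
# Arguments: <initialDelaySeconds> <periodSeconds> <failureThreshold>
probe_budget_seconds() {
  initial_delay="$1"; period="$2"; failure_threshold="$3"
  echo $(( initial_delay + period * failure_threshold ))
}

# Example: a startup probe with periodSeconds=10 and failureThreshold=30
# gives the application about 300 seconds to start.
#   probe_budget_seconds 0 10 30
#
# The probe settings of a Deployment can be read with, for example:
#   kubectl get deployment <name> -n <namespace> \
#     -o jsonpath='{.spec.template.spec.containers[*].livenessProbe}'
```

If the application's measured startup time is close to or above this budget, the probe is a likely cause of the crash loop.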
- Verify configuration and secrets
  - Command / Action:
    - Check ConfigMaps and Secrets used by the pod
    - kubectl describe pod <pod-name> -n <namespace>
  - Expected result:
    - Required configuration is present and mounted correctly
  - Additional info:
    - Missing secrets often cause immediate container exits
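One way to check for missing Secrets is to collect the names the pod references and test whether each one exists. The sketch below assumes Secrets are referenced through plain secret volumes, env valueFrom.secretKeyRef, or envFrom.secretRef; projected volumes and init containers are not covered, and the function names are hypothetical. The same pattern could be adapted for ConfigMaps.

```shell
# Hypothetical helper: prints the unique Secret names referenced by a pod.
# Arguments: <pod-name> <namespace>
referenced_secrets() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.volumes[*].secret.secretName} {.spec.containers[*].env[*].valueFrom.secretKeyRef.name} {.spec.containers[*].envFrom[*].secretRef.name}' \
    | tr ' ' '\n' | sort -u | sed '/^$/d'
}

# Hypothetical helper: reports referenced Secrets that do not exist
# in the pod's namespace.
check_secrets_exist() {
  for s in $(referenced_secrets "$1" "$2"); do
    kubectl get secret "$s" -n "$2" >/dev/null 2>&1 \
      || echo "missing secret: $s"
  done
}

# Example usage:
#   check_secrets_exist <pod-name> <namespace>
```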
- Check recent changes
  - Command / Action:
    - Review recent deployments or configuration updates
    - kubectl rollout history deployment <deployment-name> -n <namespace>
  - Expected result:
    - Recent changes are expected and valid
  - Additional info:
    - Roll back if a recent change introduced the crash loop
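To see what actually changed, the two most recent revisions can be inspected individually. The sketch below extracts their numbers from the history listing; the helper name last_two_revisions is hypothetical, and it assumes the usual output layout where each revision line starts with its number.

```shell
# Hypothetical helper: reads `kubectl rollout history` output on stdin and
# prints the numbers of the last two revisions, one per line.
last_two_revisions() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }' | tail -n 2
}

# Example usage: show the pod template of each of the last two revisions
# so they can be compared (image, env, probes, resources).
#   for rev in $(kubectl rollout history deployment <deployment-name> \
#       -n <namespace> | last_two_revisions); do
#     kubectl rollout history deployment <deployment-name> \
#       -n <namespace> --revision="$rev"
#   done
```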
- Roll back or fix and redeploy
  - Command / Action:
    - Roll back to last known good version or apply a fix
    - kubectl rollout undo deployment <deployment-name> -n <namespace>
  - Expected result:
    - Pods stabilize and remain in Running state
  - Additional info:
    - Avoid repeated restarts without addressing the root cause
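A rollback is only useful if the pods actually stabilize afterwards, so it can be paired with a wait on the rollout status. The following is a minimal sketch; the function name rollback_and_wait and the default timeout of 120s are assumptions chosen for illustration.

```shell
# Hypothetical helper: rolls a Deployment back to its previous revision and
# waits until the rollout completes or the timeout expires.
# Arguments: <deployment-name> <namespace> [timeout, default 120s]
rollback_and_wait() {
  kubectl rollout undo "deployment/$1" -n "$2" \
    && kubectl rollout status "deployment/$1" -n "$2" --timeout="${3:-120s}"
}

# Example usage:
#   rollback_and_wait <deployment-name> <namespace> 180s \
#     || echo "rollback did not complete, investigate further"
```

A non-zero exit status means either the undo itself failed or the pods did not become ready in time, in which case the earlier revision may not be healthy either.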
Additional resources
- Kubernetes Pods documentation
- Kubernetes Pod lifecycle and troubleshooting
- Kubernetes debugging applications
- Related alert: KubePodNotEnoughHealthyPods
- Related alert: KubeDeploymentReplicasMismatch