KubeMemoryOvercommit
Description
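This alert fires when the total memory requested by pods exceeds the allocatable memory the cluster can safely provide. In the upstream kubernetes-mixin version of this alert, requests are compared against the cluster's total allocatable memory minus one node's worth (the largest node, in recent versions), so a firing alert means the cluster could not reschedule every pod if a node failed.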
Memory overcommitment increases the risk of pod evictions, OOMKilled containers, application crashes, and cluster instability, especially during traffic spikes or memory leaks.
Possible Causes:
- Memory requests set too high relative to node capacity
- Excessive number of memory-intensive workloads scheduled on the same node
- Cluster scaling lagging behind workload growth
- Misconfigured resource requests and limits
- Nodes with reduced allocatable memory due to system or kubelet reservations
- Sudden increase in replicas or batch workloads
- Inadequate cluster autoscaler configuration
- Memory leaks in applications causing sustained high usage
Severity estimation
Default severity is High. Adjust it based on workload impact:
- Low if workloads tolerate memory pressure with minimal evictions
- Medium if occasional pod evictions occur but services remain stable
- High if critical workloads are being evicted or restarted frequently
- Critical if multiple services are affected or data loss risk exists
Severity increases with:
- Degree of overcommitment
- Duration of the condition
- Frequency of pod evictions
- Criticality of affected workloads
Troubleshooting steps
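- Identify affected nodes
  - Command / Action:
    - Check each node's allocatable memory and its "Allocated resources" summary
    - kubectl describe node <node-name>
  - Expected result:
    - Memory requests are below allocatable memory
  - Additional info:
    - The scheduler normally keeps a node's total requests within its allocatable memory, but limits can exceed it; kubectl marks this with "Total limits may be over 100 percent, i.e., overcommitted."
    - Focus on nodes reporting the MemoryPressure condition or recent evictions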
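To scan many nodes quickly, the output of kubectl describe node can be reduced to one line per node. The sketch below is a hypothetical helper, node_memory_summary, written for POSIX sh and awk. It assumes the default describe layout, where the "Allocated resources" row starts with memory (no colon) and the condition row starts with MemoryPressure, and it flags nodes whose memory limits exceed 100 percent.

```shell
# Hypothetical helper (not part of kubectl): condense "kubectl describe node"
# output into one line with the memory request/limit percentages from the
# "Allocated resources" table and the MemoryPressure condition status.
node_memory_summary() {
  awk '
    # Allocated resources row, e.g. "memory   240Mi (3%)   340Mi (4%)".
    # Capacity/Allocatable rows use "memory:" with a colon and are skipped.
    $1 == "memory" && NF >= 5 {
      req = $3; lim = $5
      gsub(/[()%]/, "", req)          # "(23%)" -> "23"
      gsub(/[()%]/, "", lim)
    }
    $1 == "MemoryPressure" { pressure = $2 }
    END {
      printf "requests=%s%% limits=%s%% MemoryPressure=%s", req, lim, pressure
      if (lim + 0 > 100) printf " OVERCOMMITTED"
      printf "\n"
    }
  '
}

# Usage against a live cluster (one line per node):
#   for node in $(kubectl get nodes -o name); do
#     printf '%s ' "$node"; kubectl describe "$node" | node_memory_summary
#   done
```

Percentages are read exactly as kubectl prints them, so the helper inherits kubectl's rounding.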
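- Review memory requests per namespace
  - Command / Action:
    - List the memory requests of every pod, grouped by namespace
    - kubectl get pods -A -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,MEMORY:.spec.containers[*].resources.requests.memory'
  - Expected result:
    - Requests align with actual workload needs
  - Additional info:
    - This command lists requests per pod, not per-namespace totals; sum them per namespace to find the largest consumers
    - Quote the column specification so the shell does not try to expand [*]
    - Overestimated requests increase the risk of overcommitment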
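The custom-columns command in this step lists requests per pod rather than per-namespace totals. The following sketch sums them per namespace. sum_requests_by_namespace is a hypothetical helper: it reads the custom-columns output (namespace in the first column, comma-separated container requests in the last), converts the common Kubernetes quantity suffixes to bytes, and prints totals in MiB. Only regular containers are counted, as in the command; init containers are ignored.

```shell
# Hypothetical helper: sum pod memory requests per namespace.
# Input (stdin): lines whose first field is the namespace and whose last field
# holds the container requests, comma-separated, or "<none>".
# Output: "<namespace> <total MiB>", one line per namespace, sorted by name.
sum_requests_by_namespace() {
  awk '
    # Convert a Kubernetes quantity such as "128Mi" or "1G" to bytes.
    function to_bytes(q,    n, unit) {
      n = q + 0                      # numeric prefix; awk ignores the suffix
      unit = q
      sub(/^[0-9.]+/, "", unit)      # keep only the suffix
      if (unit == "Ki") return n * 1024
      if (unit == "Mi") return n * 1024 * 1024
      if (unit == "Gi") return n * 1024 * 1024 * 1024
      if (unit == "Ti") return n * 1024 * 1024 * 1024 * 1024
      if (unit == "k")  return n * 1000
      if (unit == "M")  return n * 1000 * 1000
      if (unit == "G")  return n * 1000 * 1000 * 1000
      if (unit == "T")  return n * 1000 * 1000 * 1000 * 1000
      return n                       # no suffix (or an unrecognised one): bytes
    }
    NF >= 2 {
      count = split($NF, reqs, ",")
      for (i = 1; i <= count; i++)
        if (reqs[i] != "<none>" && reqs[i] != "")
          total[$1] += to_bytes(reqs[i])
    }
    END {
      for (ns in total) printf "%s %.0f\n", ns, total[ns] / (1024 * 1024)
    }
  ' | sort
}

# Usage against a live cluster, largest namespaces first:
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,MEMORY:.spec.containers[*].resources.requests.memory' \
#     | sum_requests_by_namespace | sort -k2,2nr
```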
- Compare requests with actual usage
  - Command / Action:
    - Inspect real memory usage (kubectl top requires metrics-server)
    - kubectl top pod -A
  - Expected result:
    - Memory usage roughly matches requests
  - Additional info:
    - Large gaps between requests and usage indicate inefficient request sizing
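To compare requests and usage pod by pod rather than by eye, the two listings can be joined on namespace and pod name. This is a minimal sketch, assuming metrics-server is installed. compare_usage_to_requests is a hypothetical helper that takes the saved output of kubectl top pod -A --no-headers and of a custom-columns request listing, then prints usage as a percentage of the request, lowest first, so the most over-requested pods appear at the top.

```shell
# Hypothetical helper: compare each pod's memory usage with its total request.
#   $1: saved output of "kubectl top pod -A --no-headers"  (namespace, pod, cpu, memory)
#   $2: saved request listing with --no-headers            (namespace, pod, requests)
# Prints "<namespace> <pod> <usage MiB> <request MiB> <usage % of request>"
# for pods present in both files with a non-zero request, lowest ratio first.
compare_usage_to_requests() {
  awk '
    function to_mib(q,    n, unit) {
      n = q + 0
      unit = q
      sub(/^[0-9.]+/, "", unit)
      if (unit == "Ki") return n / 1024
      if (unit == "Mi") return n
      if (unit == "Gi") return n * 1024
      if (unit == "Ti") return n * 1024 * 1024
      if (unit == "k")  return n * 1000 / 1048576
      if (unit == "M")  return n * 1000000 / 1048576
      if (unit == "G")  return n * 1000000000 / 1048576
      return n / 1048576             # plain bytes
    }
    # First file (kubectl top): remember usage per namespace and pod.
    NR == FNR { usage[$1 " " $2] = to_mib($4); next }
    # Second file (requests): sum the container requests and compare.
    {
      req = 0
      count = split($3, parts, ",")
      for (i = 1; i <= count; i++)
        if (parts[i] != "<none>" && parts[i] != "") req += to_mib(parts[i])
      key = $1 " " $2
      if (req > 0 && (key in usage))
        printf "%s %s %.0f %.0f %.0f\n", $1, $2, usage[key], req, 100 * usage[key] / req
    }
  ' "$1" "$2" | sort -k5,5n
}

# Usage against a live cluster:
#   kubectl top pod -A --no-headers > /tmp/usage.txt
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,MEMORY:.spec.containers[*].resources.requests.memory' \
#     > /tmp/requests.txt
#   compare_usage_to_requests /tmp/usage.txt /tmp/requests.txt | head -20
```

The NR == FNR idiom assumes the usage file is not empty; with an empty first file every line would be treated as usage data.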
- Check memory limits and OOMKill events
  - Command / Action:
    - Review each container's memory limit and its Last State
    - kubectl describe pod <pod-name> -n <namespace>
  - Expected result:
    - Memory limits are reasonable and no container shows Last State: Terminated with Reason: OOMKilled
  - Additional info:
    - OOMKilled means a container was killed for exceeding its memory limit or because the node ran out of memory
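Rather than describing pods one at a time, the last termination reason of every container can be pulled with a jsonpath query and filtered. list_oomkilled_pods below is a hypothetical filter; it expects one line per pod in the form namespace, pod name, then the lastState.terminated.reason of each container, which is what the jsonpath query in its usage comment prints.

```shell
# Hypothetical filter: print namespace/pod for every pod where at least one
# container last terminated with reason OOMKilled.
# Input (stdin): "<namespace> <pod> [<reason> ...]", one pod per line.
list_oomkilled_pods() {
  awk '{
    for (i = 3; i <= NF; i++)
      if ($i == "OOMKilled") { print $1 "/" $2; break }
  }'
}

# Usage against a live cluster:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#     | list_oomkilled_pods
```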
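- Check for memory leaks
  - Command / Action:
    - Sample memory usage over time; logs do not report memory usage, so use kubectl top or your monitoring system
    - kubectl top pod <pod-name> -n <namespace> --containers
  - Expected result:
    - Memory usage stays stable or grows only slowly
  - Additional info:
    - Sustained memory growth suggests a memory leak
    - kubectl logs <pod-name> -n <namespace> can still reveal application-level out-of-memory errors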
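Because a single kubectl top reading cannot show a trend, one option is to sample the pod at a fixed interval and summarise the series. memory_trend is a hypothetical helper: it reads one memory quantity per line (Ki, Mi or Gi), reports the total growth in MiB, and says whether every sample was at least as high as the previous one. A monotonic rise over a long window is consistent with a leak but does not prove one, since caches and warm-up can look similar.

```shell
# Hypothetical helper: summarise memory samples taken at a fixed interval.
# Input (stdin): one quantity per line, e.g. "210Mi".
# Output: "<growth MiB> <monotonic|fluctuating>"; exits 1 with fewer than two samples.
memory_trend() {
  awk '
    function to_mib(q,    n, unit) {
      n = q + 0
      unit = q
      sub(/^[0-9.]+/, "", unit)
      if (unit == "Ki") return n / 1024
      if (unit == "Gi") return n * 1024
      return n                        # "Mi", the unit kubectl top normally prints
    }
    BEGIN { monotonic = 1 }
    NF == 0 { next }                  # ignore blank lines
    {
      v = to_mib($1)
      if (count == 0) first = v
      else if (v < prev) monotonic = 0
      prev = v
      count++
    }
    END {
      if (count < 2) { print "need at least two samples"; exit 1 }
      printf "%.0f %s\n", prev - first, (monotonic ? "monotonic" : "fluctuating")
    }
  '
}

# Usage against a live cluster (10 samples, one per minute):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#     kubectl top pod <pod-name> -n <namespace> --no-headers | awk '{ print $3 }'
#     sleep 60
#   done | memory_trend
```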
- Identify memory-heavy pods
  - Command / Action:
    - Find the pods with the highest memory consumption
    - kubectl top pod -A --sort-by=memory
  - Expected result:
    - No single pod consumes a disproportionate share of memory
  - Additional info:
    - A few memory-hungry pods can starve others on the same node
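To judge whether a few memory-hungry pods dominate, it helps to compute what share of total usage the heaviest pods account for. top_pods_share is a hypothetical helper that reads kubectl top pod -A --no-headers output (namespace, pod, CPU, memory) and prints the percentage of total memory used by the top N pods, defaulting to 5.

```shell
# Hypothetical helper: percentage of total memory usage taken by the top N pods.
#   $1 (optional): N, default 5
#   stdin: output of "kubectl top pod -A --no-headers"
top_pods_share() {
  top_n="${1:-5}"
  awk '
    function to_mib(q,    n, unit) {
      n = q + 0
      unit = q
      sub(/^[0-9.]+/, "", unit)
      if (unit == "Ki") return n / 1024
      if (unit == "Gi") return n * 1024
      return n                       # "Mi", the unit kubectl top normally prints
    }
    NF >= 4 { printf "%.3f\n", to_mib($4) }
  ' | sort -rn | awk -v top_n="$top_n" '
    { total += $1; if (NR <= top_n) top += $1 }
    END { if (total > 0) printf "%.0f\n", 100 * top / total; else print 0 }
  '
}

# Usage against a live cluster:
#   kubectl top pod -A --no-headers | top_pods_share 5
```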
- Reduce memory requests where possible
  - Command / Action:
    - Tune memory requests to realistic values
    - kubectl set resources deployment <deployment-name> --requests=memory=<value> -n <namespace>
  - Expected result:
    - Lower total memory requests on nodes
  - Additional info:
    - Changing requests updates the pod template and triggers a rolling restart of the deployment
    - Always validate changes under load
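Choosing <value> is a judgment call. One common heuristic, sketched here as an assumption rather than a rule, is observed peak usage plus a safety margin. suggest_request is a hypothetical helper that applies a percentage headroom to a peak measured in MiB and rounds up to a multiple of 64Mi; both the 20 percent default and the 64Mi granularity are arbitrary choices to adjust.

```shell
# Hypothetical helper: suggest a memory request from observed peak usage.
#   $1: observed peak usage in MiB (integer)
#   $2 (optional): headroom in percent, default 20
# Prints a Kubernetes quantity such as "384Mi".
suggest_request() {
  peak_mib="$1"
  headroom_pct="${2:-20}"
  # Integer ceiling of peak * (100 + headroom) / 100.
  needed=$(( (peak_mib * (100 + headroom_pct) + 99) / 100 ))
  # Round up to the next multiple of 64Mi (arbitrary granularity).
  rounded=$(( (needed + 63) / 64 * 64 ))
  printf '%sMi\n' "$rounded"
}

# Usage against a live cluster:
#   kubectl set resources deployment <deployment-name> -n <namespace> \
#     --requests=memory="$(suggest_request 300 20)"
```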
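- Scale the cluster
  - Command / Action:
    - Add nodes through your node pool or cloud provider, or enable the Cluster Autoscaler, then confirm the new nodes are Ready
    - kubectl get nodes
  - Expected result:
    - New nodes are Ready and memory pressure is reduced across nodes
  - Additional info:
    - Ensure the Cluster Autoscaler is properly configured; it adds nodes only when pods cannot be scheduled, not when requests approach capacity, so it may not react to this alert on its own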
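The size of a scale-up can be estimated with simple arithmetic. The sketch below assumes the N-1 condition used by the upstream kubernetes-mixin version of this alert: the cluster's allocatable memory, minus its largest node, must cover all pod requests. nodes_to_add is a hypothetical helper; it takes per-node allocatable memory in MiB on stdin, plus the total requests and the allocatable memory of the node type you would add, and prints how many such nodes satisfy the condition.

```shell
# Hypothetical helper: number of nodes to add so that the cluster can still
# hold all memory requests after losing its largest node.
#   stdin: allocatable memory of each current node, in MiB, one per line
#   $1:    total memory requests across all pods, in MiB
#   $2:    allocatable memory of the node type to add, in MiB
nodes_to_add() {
  awk -v requests="$1" -v new_node="$2" '
    BEGIN { sum = 0; max = 0 }
    NF > 0 { sum += $1; if ($1 > max) max = $1 }
    END {
      if (new_node <= 0) { print "node size must be positive"; exit 1 }
      added = 0
      # Capacity that survives the loss of the largest node must cover all requests.
      while (sum - max < requests) {
        sum += new_node
        if (new_node > max) max = new_node
        added++
      }
      print added
    }
  '
}

# Usage against a live cluster, assuming allocatable memory is reported in Ki:
#   kubectl get nodes --no-headers -o custom-columns=MEM:.status.allocatable.memory \
#     | sed 's/Ki$//' | awk '{ printf "%d\n", $1 / 1024 }' \
#     | nodes_to_add <total-requests-MiB> <new-node-allocatable-MiB>
```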
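- Reschedule workloads
  - Command / Action:
    - Cordon and drain an overloaded node so its pods reschedule onto less loaded nodes
    - kubectl drain <node-name> --ignore-daemonsets
  - Expected result:
    - Pods redistribute to nodes with more available memory
  - Additional info:
    - Use carefully in production: draining removes the node's capacity, so evicted pods stay Pending if the remaining nodes lack room
    - Pods using emptyDir volumes require --delete-emptydir-data, which deletes that data
    - Run kubectl uncordon <node-name> afterwards to make the node schedulable again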
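Before draining, a quick feasibility check can show whether the remaining nodes have room for the evicted pods. can_absorb_drain is a hypothetical helper working on request figures in MiB: it sums the free requestable memory (allocatable minus requested) of the remaining nodes and, optionally, checks that at least one node can fit the largest pod. Passing this check is necessary but not sufficient, because each pod must fit on a single node, and affinity rules, taints and PodDisruptionBudgets can still block the move.

```shell
# Hypothetical helper: rough check that the remaining nodes can absorb a drain.
#   stdin: "<node> <allocatable MiB> <requested MiB>" for every node except the one to drain
#   $1:    total memory requests of the non-DaemonSet pods on the node to drain, in MiB
#   $2 (optional): largest single pod request on that node, in MiB
# Prints "ok free=<MiB>" or "insufficient free=<MiB>".
can_absorb_drain() {
  awk -v needed="$1" -v largest="${2:-0}" '
    BEGIN { free_total = 0; best_fit = 0 }
    NF >= 3 {
      free = $2 - $3
      if (free > 0) free_total += free
      if (free > best_fit) best_fit = free
    }
    END {
      verdict = (free_total >= needed && best_fit >= largest) ? "ok" : "insufficient"
      printf "%s free=%.0f\n", verdict, free_total
    }
  '
}

# Usage: fill stdin from the "Allocated resources" section of
# kubectl describe node <node-name> for each remaining node, converted to MiB.
```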
Additional resources
- Kubernetes resource management
- Memory requests and limits
- Related alerts: KubeMemoryQuotaOvercommit