KubeMemoryQuotaOvercommit
Description
This alert fires when the total memory limits enforced by Kubernetes exceed the allocatable memory on one or more nodes, meaning the memory quota is overcommitted.
Memory quota overcommitment can result in pod evictions, OOMKilled containers, increased latency, and potential data loss, especially under peak load conditions.
Possible Causes:
- Memory limits are set too high across multiple workloads
- Excessive number of memory-limited pods scheduled on the same node
- Node allocatable memory reduced due to system or kubelet reservations
- Misconfigured resource limits and requests
- Cluster scaling lagging behind workload growth
- Batch workloads or sudden replica increases
- Autoscaler not reacting quickly enough
- Memory leaks causing sustained consumption
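A quick way to see how much memory each node can actually offer to pods is to compare its capacity with its allocatable memory (capacity minus system and kubelet reservations); a minimal sketch, assuming kubectl access to the cluster:

  # Capacity vs allocatable memory per node
  kubectl get nodes -o custom-columns=NAME:.metadata.name,CAPACITY_MEM:.status.capacity.memory,ALLOCATABLE_MEM:.status.allocatable.memory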
Severity estimation
Severity depends on the degree of overcommitment and workload criticality:
- Low: Occasional evictions with minimal user impact
- Medium: Sustained evictions causing service degradation or restarts
- High: Critical services affected, repeated pod evictions
- Critical: Multiple services affected, data loss risk, cluster instability
Severity increases with:
- Level of overcommitment
- Number of affected workloads
- Frequency of pod evictions
- Importance of affected workloads
Troubleshooting steps
- Confirm memory pressure and evictions
  - Command / Action: Check for pod evictions and memory pressure on nodes
    kubectl describe node <node-name>
  - Expected result: No MemoryPressure condition or minimal evictions
  - Additional info: Sustained high evictions confirm memory quota overcommitment
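  - Optional check: recent eviction events and the node's MemoryPressure condition can also be queried directly; a minimal sketch, assuming cluster-wide read access:
    # List recent Evicted events across all namespaces, newest last
    kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp
    # Show only the MemoryPressure condition of the node
    kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'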
- Check memory limits on affected pods
  - Command / Action: Review memory limits and requests
    kubectl describe pod <pod-name> -n <namespace>
  - Expected result: Memory limits match realistic workload requirements
  - Additional info: Overly high limits contribute directly to quota overcommitment
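  - Optional check: requests and limits can be listed for a whole namespace at once; a minimal sketch using custom columns (multi-container pods show comma-separated values):
    kubectl get pods -n <namespace> -o custom-columns=NAME:.metadata.name,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIM:.spec.containers[*].resources.limits.memory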
- Compare total memory limits vs node allocatable
  - Command / Action: Inspect node memory capacity
    kubectl describe node <node-name>
  - Expected result: Total memory limits do not significantly exceed node allocatable capacity
  - Additional info: Significant overcommitment increases eviction risk
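  - Optional check: the "Allocated resources" section of the node description already summarizes committed requests and limits as a percentage of allocatable memory; a minimal sketch:
    kubectl describe node <node-name> | grep -A 8 "Allocated resources"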
- Identify heavily loaded pods
  - Command / Action: Correlate high memory usage with pods
    kubectl top pod -n <namespace> --sort-by=memory
  - Expected result: Memory usage evenly distributed or minimal
  - Additional info: “Noisy neighbor” pods may dominate memory usage
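  - Optional check: the same view can be taken cluster-wide to spot the heaviest consumers regardless of namespace; a minimal sketch, assuming metrics-server is installed:
    kubectl top pod -A --sort-by=memory | head -n 20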
- Check for OOMKilled containers
  - Command / Action: Identify pods that were terminated due to OOM
    kubectl get pods -A -o json | grep -i oomkill
  - Expected result: No or very few OOMKilled containers
  - Additional info: OOMKilled containers confirm memory exhaustion
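  - Optional check: the termination reason can be extracted explicitly instead of grepping raw JSON; a minimal sketch using jsonpath:
    kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | grep -i oomkilled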
- Adjust memory limits
  - Command / Action: Reduce excessive memory limits for pods or deployments
    kubectl set resources deployment <deployment-name> --limits=memory=<value> -n <namespace>
  - Expected result: Eviction rate decreases
  - Additional info: Validate changes under load
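  - Example: a concrete invocation with an illustrative value (512Mi is a placeholder, not a recommendation) followed by a rollout check:
    kubectl set resources deployment <deployment-name> --limits=memory=512Mi -n <namespace>
    kubectl rollout status deployment <deployment-name> -n <namespace>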
- Tune memory requests
  - Command / Action: Align memory requests with actual workload usage
    kubectl set resources deployment <deployment-name> --requests=memory=<value> -n <namespace>
  - Expected result: Better pod placement and reduced memory contention
  - Additional info: Requests affect scheduling; limits affect evictions
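  - Optional check: after changing requests, the applied value can be confirmed and compared against live usage; a minimal sketch (the label selector is a placeholder for the deployment's pod labels):
    kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.template.spec.containers[*].resources.requests.memory}'
    kubectl top pod -n <namespace> -l <label-selector>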
- Scale workloads or cluster
  - Command / Action: Add replicas or nodes to distribute memory load
    kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>
  - Expected result: Memory pressure per pod is reduced
  - Additional info: Horizontal scaling mitigates memory contention
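  - Optional check: after scaling, confirm the new replicas land on different nodes rather than on the already-pressured one; a minimal sketch (the label selector is a placeholder for the deployment's pod labels):
    kubectl get pods -n <namespace> -l <label-selector> -o wide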
- Review autoscaler configuration
  - Command / Action: Check HPA and Cluster Autoscaler settings
    kubectl get hpa -A
  - Expected result: Autoscaling reacts appropriately to increased memory load
  - Additional info: Delayed scaling can worsen memory quota overcommitment
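  - Optional check: HPA events show why scaling is or is not happening, and Cluster Autoscaler logs (if it runs in-cluster) explain delayed node scale-ups; a minimal sketch, where the label selector is a common convention rather than a guaranteed one:
    kubectl describe hpa <hpa-name> -n <namespace>
    kubectl logs -n kube-system -l app=cluster-autoscaler --tail=50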
- Check for memory leaks
  - Command / Action: Monitor memory usage trends for sustained growth
    kubectl logs <pod-name> -n <namespace> --tail=100
  - Expected result: Memory usage remains stable over time
  - Additional info: Sustained memory growth suggests a potential memory leak in the application
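  - Optional check: sampling per-container memory over time makes sustained growth visible; a minimal sketch, assuming metrics-server and the watch utility are available:
    watch -n 60 kubectl top pod <pod-name> -n <namespace> --containers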
Additional resources
- Kubernetes memory quotas and limits
- Kubernetes resource management
- Debugging OOMKilled pods
- Related alert: KubeMemoryOvercommit