Alert Runbooks

KubeMemoryOvercommit

Description

This alert fires when total memory requests across pods exceed the allocatable memory capacity of one or more Kubernetes nodes.
Memory overcommitment increases the risk of pod evictions, OOMKilled containers, application crashes, and cluster instability, especially during traffic spikes or when workloads leak memory.
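
To get a quick picture of the condition, you can compare each node's allocatable memory with the memory already requested on it, for example (a rough sketch; the grep pattern assumes the default kubectl describe output, which can vary by Kubernetes version):

  kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory
  kubectl describe nodes | grep -A 6 "Allocated resources"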


Possible Causes:

  • Memory requests set well above actual workload usage
  • A few memory-heavy pods concentrated on the same nodes
  • Memory leaks causing sustained memory growth
  • Insufficient node capacity for the scheduled workloads

Severity estimation

Typically high; the actual impact depends on the affected workloads.

Severity increases with:

  • The number of overcommitted nodes
  • Observed pod evictions or OOMKilled containers
  • Production or latency-sensitive workloads running on the affected nodes

Troubleshooting steps

  1. Identify affected nodes

    • Command / Action:
      • Check node allocatable memory and usage
      • kubectl describe node <node-name>

    • Expected result:
      • Requested memory is below allocatable memory
    • Additional info:
      • Focus on nodes reporting high memory pressure or evictions
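    • Example:
      • A quick sketch for spotting overcommitted nodes; <node-name> is a placeholder, kubectl top requires the metrics-server, and the grep pattern assumes default kubectl describe output
      • kubectl top nodes
      • kubectl describe node <node-name> | grep -A 6 "Allocated resources"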

  2. Review memory requests per namespace

    • Command / Action:
      • List memory requests per pod together with its namespace
      • kubectl get pods -A -o custom-columns=NS:.metadata.namespace,MEMORY:.spec.containers[*].resources.requests.memory

    • Expected result:
      • Requests align with actual workload needs
    • Additional info:
      • Overestimated requests increase overcommitment risk
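    • Example:
      • A sketch for listing each pod's memory requests next to its namespace so they can be eyeballed per namespace (sorting only groups the lines; it is not a true per-namespace sum)
      • kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.containers[*].resources.requests.memory}{"\n"}{end}' | sort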

  3. Compare requests vs actual usage

    • Command / Action:
      • Inspect real memory usage
      • kubectl top pod -A

    • Expected result:
      • Memory usage roughly matches requests
    • Additional info:
      • Large gaps indicate inefficient request sizing
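    • Example:
      • A sketch for comparing a single pod's actual usage with its requests; <pod-name> and <namespace> are placeholders, and kubectl top requires the metrics-server
      • kubectl top pod <pod-name> -n <namespace> --containers
      • kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.requests.memory}'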

  4. Check memory limits and OOMKill events

    • Command / Action:
      • Review memory limits and check for OOMKilled containers
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Memory limits are reasonable and no containers have been OOMKilled
    • Additional info:
      • OOMKilled containers indicate memory exhaustion
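    • Example:
      • A sketch for listing containers whose last termination reason was OOMKilled across all namespaces (field paths follow the standard pod status schema)
      • kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | grep OOMKilled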

  5. Check for memory leaks

    • Command / Action:
      • Review application logs and monitor memory usage trends over time
      • kubectl logs <pod-name> -n <namespace>

    • Expected result:
      • Memory usage remains stable or grows slowly
    • Additional info:
      • Sustained memory growth suggests potential memory leak
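    • Example:
      • A sketch for watching a suspect pod's memory over time; assumes the watch utility and the metrics-server are available, and <pod-name> and <namespace> are placeholders
      • watch -n 60 kubectl top pod <pod-name> -n <namespace> --containers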

  6. Identify memory-heavy pods

    • Command / Action:
      • Detect pods with high memory consumption
      • kubectl top pod -A --sort-by=memory

    • Expected result:
      • Memory usage is evenly distributed
    • Additional info:
      • A few memory-hungry pods can starve others
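    • Example:
      • A sketch that limits the sorted output to the top consumers (the head count is arbitrary)
      • kubectl top pod -A --sort-by=memory | head -n 20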

  7. Reduce memory requests where possible

    • Command / Action:
      • Tune memory requests to realistic values
      • kubectl set resources deployment <deployment-name> --requests=memory=<value> -n <namespace>

    • Expected result:
      • Lower total requested memory on nodes
    • Additional info:
      • Always validate changes under load
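    • Example:
      • A sketch with a hypothetical deployment name, namespace, and values; size requests from the usage observed in step 3 and keep limits above requests
      • kubectl set resources deployment my-app -n my-namespace --requests=memory=256Mi --limits=memory=512Mi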

  8. Scale the cluster

    • Command / Action:
      • Review current node capacity, then add nodes or enable autoscaling
      • kubectl get nodes

    • Expected result:
      • Memory pressure is reduced across nodes
    • Additional info:
      • Ensure Cluster Autoscaler is properly configured
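    • Example:
      • A sketch for reviewing current capacity and checking whether a Cluster Autoscaler is deployed; the autoscaler's deployment name and namespace depend on how it was installed (managed clouds may run it outside the cluster)
      • kubectl get nodes -o wide
      • kubectl get deployments -n kube-system | grep -i autoscaler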

  9. Reschedule workloads

    • Command / Action:
      • Evict or rebalance pods to less loaded nodes
      • kubectl drain <node-name> --ignore-daemonsets

    • Expected result:
      • Pods redistribute to nodes with more available memory
    • Additional info:
      • Use carefully in production environments
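    • Example:
      • A sketch of a typical cordon/drain/uncordon cycle; <node-name> is a placeholder, and --delete-emptydir-data may be needed if pods use emptyDir volumes (the flag name varies by kubectl version)
      • kubectl cordon <node-name>
      • kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
      • kubectl uncordon <node-name>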

Additional resources