Alert Runbooks

KubeMemoryQuotaOvercommit

Description

This alert fires when the total memory limits configured for Kubernetes workloads exceed the allocatable memory on one or more nodes, resulting in memory quota overcommitment.
Memory quota overcommitment can result in pod evictions, OOMKilled containers, increased latency, and potential data loss, especially under peak load conditions.
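
Example (optional quick check): the "Allocated resources" section of kubectl describe node reports memory requests and limits as a percentage of each node's allocatable memory; a limits percentage above 100% indicates that node is overcommitted.

  # Show requested and limited memory versus allocatable for every node
  kubectl describe nodes | grep -A 8 "Allocated resources"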


Possible Causes:

  • Memory limits set well above actual workload needs, so the summed limits exceed node allocatable capacity
  • Memory requests that do not reflect real usage, causing the scheduler to pack nodes too tightly
  • “Noisy neighbor” pods consuming a disproportionate share of memory
  • Application memory leaks causing sustained memory growth
  • Insufficient cluster capacity or delayed autoscaling under increased load

Severity estimation

High severity by default; the exact impact depends on workload criticality.

Severity increases with:

  • The degree to which total memory limits exceed node allocatable capacity
  • The number of nodes reporting MemoryPressure
  • Observed pod evictions or OOMKilled containers
  • The criticality of the affected workloads (production, stateful, or latency-sensitive services)

Troubleshooting steps

  1. Confirm memory pressure and evictions

    • Command / Action:
      • Check for pod evictions and memory pressure on nodes
      • kubectl describe node <node-name>

    • Expected result:
      • No MemoryPressure condition or minimal evictions
    • Additional info:
      • Sustained high evictions confirm memory quota overcommitment
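
    Example (optional deeper check): these commands read the node condition directly and list recent evictions; <node-name> is a placeholder.
      # Report the node's MemoryPressure condition (True indicates active memory pressure)
      kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}'
      # List recent eviction events across all namespaces, most recent last
      kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp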

  2. Check memory limits on affected pods

    • Command / Action:
      • Review memory limits and requests
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Memory limits match realistic workload requirements
    • Additional info:
      • Overly high limits contribute directly to quota overcommitment
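
    Example (optional deeper check): a sketch that prints requests and limits for every container in the namespace, assuming jq is installed; <namespace> is a placeholder.
      # One line per container: <pod>/<container>: requests=... limits=...
      kubectl get pods -n <namespace> -o json \
        | jq -r '.items[] | .metadata.name as $p | .spec.containers[]
            | "\($p)/\(.name): requests=\(.resources.requests.memory // "unset") limits=\(.resources.limits.memory // "unset")"'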

  3. Compare total memory limits vs node allocatable

    • Command / Action:
      • Inspect node memory capacity
      • kubectl describe node <node-name>

    • Expected result:
      • Total memory limits do not exceed node allocatable capacity significantly
    • Additional info:
      • Significant overcommitment increases eviction risk
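
    Example (optional deeper check): compare the node's allocatable memory against the limits of the pods scheduled on it; <node-name> is a placeholder and jq is assumed to be installed.
      # Allocatable memory on the node
      kubectl get node <node-name> -o jsonpath='{.status.allocatable.memory}{"\n"}'
      # Memory limit of every container scheduled on that node ("none" means no limit set)
      kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json \
        | jq -r '.items[].spec.containers[] | .resources.limits.memory // "none"'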

  4. Identify heavily loaded pods

    • Command / Action:
      • Correlate high memory usage with pods
      • kubectl top pod -n <namespace> --sort-by=memory

    • Expected result:
      • Memory usage evenly distributed or minimal
    • Additional info:
      • “Noisy neighbor” pods may dominate memory usage
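
    Example (optional deeper check): a cluster-wide view of the heaviest pods; requires metrics-server to be running.
      # Show the 15 pods with the highest memory usage across all namespaces
      kubectl top pod -A --sort-by=memory | head -n 16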

  5. Check for OOMKilled containers

    • Command / Action:
      • Identify pods that were terminated due to OOM
      • kubectl get pods -A -o json | grep -i oomkill

    • Expected result:
      • No or very few OOMKilled containers
    • Additional info:
      • OOMKilled containers confirm memory exhaustion
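
    Example (optional deeper check): a more precise filter than grepping raw JSON, assuming jq is installed.
      # List namespace/pod for every pod with a container last terminated as OOMKilled
      kubectl get pods -A -o json \
        | jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled"))
            | "\(.metadata.namespace)/\(.metadata.name)"'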

  6. Adjust memory limits

    • Command / Action:
      • Reduce excessive memory limits for pods or deployments
      • kubectl set resources deployment <deployment-name> --limits=memory=<value> -n <namespace>

    • Expected result:
      • Eviction rate decreases
    • Additional info:
      • Validate changes under load
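
    Example (optional): the memory values below are illustrative placeholders; size them to the usage observed in the previous steps, then confirm the new settings took effect.
      # Set illustrative request/limit values on the deployment
      kubectl set resources deployment <deployment-name> -n <namespace> --requests=memory=256Mi --limits=memory=512Mi
      # Verify the rendered container resources
      kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.template.spec.containers[*].resources}{"\n"}'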

  7. Tune memory requests

    • Command / Action:
      • Align memory requests with actual workload usage
      • kubectl set resources deployment <deployment-name> --requests=memory=<value> -n <namespace>

    • Expected result:
      • Better pod placement and reduced memory contention
    • Additional info:
      • Requests affect scheduling; limits affect evictions
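
    Example (optional deeper check): compare live usage against declared requests to spot oversized or undersized requests; requires metrics-server.
      # Current memory usage per pod
      kubectl top pod -n <namespace>
      # Declared memory requests per pod
      kubectl get pods -n <namespace> -o custom-columns='NAME:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory'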

  8. Scale workloads or cluster

    • Command / Action:
      • Add replicas or nodes to distribute memory load
      • kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>

    • Expected result:
      • Memory pressure per pod is reduced
    • Additional info:
      • Horizontal scaling mitigates memory contention
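
    Example (optional): scale out, wait for the rollout, and confirm per-node memory headroom; <n> is a placeholder replica count.
      kubectl scale deployment <deployment-name> -n <namespace> --replicas=<n>
      kubectl rollout status deployment/<deployment-name> -n <namespace>
      # Check node-level memory headroom after pods have rescheduled (requires metrics-server)
      kubectl top node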

  9. Review autoscaler configuration

    • Command / Action:
      • Check HPA and Cluster Autoscaler settings
      • kubectl get hpa -A

    • Expected result:
      • Autoscaling reacts appropriately to increased memory load
    • Additional info:
      • Delayed scaling can worsen memory quota overcommitment
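
    Example (optional deeper check): inspect a specific HPA and, if the Cluster Autoscaler runs in this cluster, its recent logs; the deployment name and namespace below are assumptions that vary by installation.
      # Current metrics, targets, and scaling events for one HPA
      kubectl describe hpa <hpa-name> -n <namespace>
      # Recent Cluster Autoscaler activity (adjust namespace/name to your install)
      kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50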

  10. Check for memory leaks

    • Command / Action:
      • Monitor memory usage trends for sustained growth
      • kubectl logs <pod-name> -n <namespace> --tail=100

    • Expected result:
      • Memory usage remains stable over time
    • Additional info:
      • Sustained memory growth suggests potential memory leak in application
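
    Example (optional deeper check): sample container memory periodically to spot monotonic growth; requires metrics-server, press Ctrl-C to stop.
      while true; do
        date
        kubectl top pod <pod-name> -n <namespace> --containers
        sleep 60
      done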

Additional resources