Alert Runbooks

KubeMemoryQuotaOvercommit

Description

This alert fires when the total memory limits configured for Kubernetes workloads exceed the allocatable memory on one or more nodes, resulting in memory quota overcommitment.
Memory quota overcommitment can result in pod evictions, OOMKilled containers, increased latency, and potential data loss, especially under peak load conditions.
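
Example (optional quick check): the "Allocated resources" section of kubectl describe node reports memory requests and limits as a percentage of each node's allocatable memory; a limits percentage above 100% indicates that node is overcommitted.

  # Show requested and limited memory versus allocatable for every node
  kubectl describe nodes | grep -A 8 "Allocated resources"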


Possible Causes:

  • Memory limits set well above actual workload needs, so the summed limits exceed node allocatable capacity
  • Memory requests that do not reflect real usage, causing the scheduler to pack nodes too tightly
  • “Noisy neighbor” pods consuming a disproportionate share of memory
  • Application memory leaks causing sustained memory growth
  • Insufficient cluster capacity or delayed autoscaling under increased load

Severity estimation

High severity by default; the exact impact depends on workload criticality.

Severity increases with:

  • The degree to which total memory limits exceed node allocatable capacity
  • The number of nodes reporting MemoryPressure
  • Observed pod evictions or OOMKilled containers
  • The criticality of the affected workloads (production, stateful, or latency-sensitive services)

Troubleshooting steps

  1. Confirm memory pressure and evictions

    • Command / Action:
      • Check for pod evictions and memory pressure on nodes
      • kubectl describe node <node-name>

    • Expected result:
      • No MemoryPressure condition or minimal evictions
    • Additional info:
      • Sustained high evictions confirm memory quota overcommitment
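
    Example (optional deeper check): these commands read the node condition directly and list recent evictions; <node-name> is a placeholder.
      # Report the node's MemoryPressure condition (True indicates active memory pressure)
      kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}'
      # List recent eviction events across all namespaces, most recent last
      kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp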

  2. Check memory limits on affected pods

    • Command / Action:
      • Review memory limits and requests
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Memory limits match realistic workload requirements
    • Additional info:
      • Overly high limits contribute directly to quota overcommitment
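
    Example (optional deeper check): a sketch that prints requests and limits for every container in the namespace, assuming jq is installed; <namespace> is a placeholder.
      # One line per container: <pod>/<container>: requests=... limits=...
      kubectl get pods -n <namespace> -o json \
        | jq -r '.items[] | .metadata.name as $p | .spec.containers[]
            | "\($p)/\(.name): requests=\(.resources.requests.memory // "unset") limits=\(.resources.limits.memory // "unset")"'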

  3. Compare total memory limits vs node allocatable

    • Command / Action:
      • Inspect node memory capacity
      • kubectl describe node <node-name>

    • Expected result:
      • Total memory limits do not exceed node allocatable capacity significantly
    • Additional info:
      • Significant overcommitment increases eviction risk
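
    Example (optional deeper check): compare the node's allocatable memory against the limits of the pods scheduled on it; <node-name> is a placeholder and jq is assumed to be installed.
      # Allocatable memory on the node
      kubectl get node <node-name> -o jsonpath='{.status.allocatable.memory}{"\n"}'
      # Memory limit of every container scheduled on that node ("none" means no limit set)
      kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json \
        | jq -r '.items[].spec.containers[] | .resources.limits.memory // "none"'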

  4. Identify heavily loaded pods

    • Command / Action:
      • Correlate high memory usage with pods
      • kubectl top pod -n <namespace> --sort-by=memory

    • Expected result:
      • Memory usage evenly distributed or minimal
    • Additional info:
      • “Noisy neighbor” pods may dominate memory usage
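
    Example (optional deeper check): a cluster-wide view of the heaviest pods; requires metrics-server to be running.
      # Show the 15 pods with the highest memory usage across all namespaces
      kubectl top pod -A --sort-by=memory | head -n 16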

  5. Check for OOMKilled containers

    • Command / Action:
      • Identify pods that were terminated due to OOM
      • kubectl get pods -A -o json | grep -i oomkill

    • Expected result:
      • No or very few OOMKilled containers
    • Additional info:
      • OOMKilled containers confirm memory exhaustion
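
    Example (optional deeper check): a more precise filter than grepping raw JSON, assuming jq is installed.
      # List namespace/pod for every pod with a container last terminated as OOMKilled
      kubectl get pods -A -o json \
        | jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled"))
            | "\(.metadata.namespace)/\(.metadata.name)"'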

  6. Adjust memory limits

    • Command / Action:
      • Reduce excessive memory limits for pods or deployments
      • kubectl set resources deployment <deployment-name> --limits=memory=<value> -n <namespace>

    • Expected result:
      • Eviction rate decreases
    • Additional info:
      • Validate changes under load
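
    Example (optional): the memory values below are illustrative placeholders; size them to the usage observed in the previous steps, then confirm the new settings took effect.
      # Set illustrative request/limit values on the deployment
      kubectl set resources deployment <deployment-name> -n <namespace> --requests=memory=256Mi --limits=memory=512Mi
      # Verify the rendered container resources
      kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.template.spec.containers[*].resources}{"\n"}'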

  7. Tune memory requests

    • Command / Action:
      • Align memory requests with actual workload usage
      • kubectl set resources deployment <deployment-name> --requests=memory=<value> -n <namespace>

    • Expected result:
      • Better pod placement and reduced memory contention
    • Additional info:
      • Requests affect scheduling; limits affect evictions
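
    Example (optional deeper check): compare live usage against declared requests to spot oversized or undersized requests; requires metrics-server.
      # Current memory usage per pod
      kubectl top pod -n <namespace>
      # Declared memory requests per pod
      kubectl get pods -n <namespace> -o custom-columns='NAME:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory'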

  8. Scale workloads or cluster

    • Command / Action:
      • Add replicas or nodes to distribute memory load
      • kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>

    • Expected result:
      • Memory pressure per pod is reduced
    • Additional info:
      • Horizontal scaling mitigates memory contention
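
    Example (optional): scale out, wait for the rollout, and confirm per-node memory headroom; <n> is a placeholder replica count.
      kubectl scale deployment <deployment-name> -n <namespace> --replicas=<n>
      kubectl rollout status deployment/<deployment-name> -n <namespace>
      # Check node-level memory headroom after pods have rescheduled (requires metrics-server)
      kubectl top node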

  9. Review autoscaler configuration

    • Command / Action:
      • Check HPA and Cluster Autoscaler settings
      • kubectl get hpa -A

    • Expected result:
      • Autoscaling reacts appropriately to increased memory load
    • Additional info:
      • Delayed scaling can worsen memory quota overcommitment
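
    Example (optional deeper check): inspect a specific HPA and, if the Cluster Autoscaler runs in this cluster, its recent logs; the deployment name and namespace below are assumptions that vary by installation.
      # Current metrics, targets, and scaling events for one HPA
      kubectl describe hpa <hpa-name> -n <namespace>
      # Recent Cluster Autoscaler activity (adjust namespace/name to your install)
      kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50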

  10. Check for memory leaks

    • Command / Action:
      • Monitor memory usage trends for sustained growth
      • kubectl logs <pod-name> -n <namespace> --tail=100

    • Expected result:
      • Memory usage remains stable over time
    • Additional info:
      • Sustained memory growth suggests potential memory leak in application
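
    Example (optional deeper check): sample container memory periodically to spot monotonic growth; requires metrics-server, press Ctrl-C to stop.
      while true; do
        date
        kubectl top pod <pod-name> -n <namespace> --containers
        sleep 60
      done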

Additional resources