Alert Runbooks

KubeMemoryOvercommit

Description

This alert fires when total memory requests across pods exceed the allocatable memory capacity of one or more Kubernetes nodes.
Memory overcommitment increases the risk of pod evictions, OOMKilled containers, application crashes, and cluster instability, especially during traffic spikes or when workloads leak memory.
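
To get a quick picture of the condition, you can compare each node's allocatable memory with the memory already requested on it, for example (a rough sketch; the grep pattern assumes the default kubectl describe output, which can vary by Kubernetes version):

  kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory
  kubectl describe nodes | grep -A 6 "Allocated resources"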


Possible Causes:

  • Memory requests set well above actual workload usage
  • A few memory-heavy pods concentrated on the same nodes
  • Memory leaks causing sustained memory growth
  • Insufficient node capacity for the scheduled workloads

Severity estimation

Typically high; the actual impact depends on the affected workloads.

Severity increases with:

  • The number of overcommitted nodes
  • Observed pod evictions or OOMKilled containers
  • Production or latency-sensitive workloads running on the affected nodes

Troubleshooting steps

  1. Identify affected nodes

    • Command / Action:
      • Check node allocatable memory and usage
      • kubectl describe node <node-name>

    • Expected result:
      • Requested memory is below allocatable memory
    • Additional info:
      • Focus on nodes reporting high memory pressure or evictions
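    • Example:
      • A quick sketch for spotting overcommitted nodes; <node-name> is a placeholder, kubectl top requires the metrics-server, and the grep pattern assumes default kubectl describe output
      • kubectl top nodes
      • kubectl describe node <node-name> | grep -A 6 "Allocated resources"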

  2. Review memory requests per namespace

    • Command / Action:
      • List memory requests per pod together with its namespace
      • kubectl get pods -A -o custom-columns=NS:.metadata.namespace,MEMORY:.spec.containers[*].resources.requests.memory

    • Expected result:
      • Requests align with actual workload needs
    • Additional info:
      • Overestimated requests increase overcommitment risk
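    • Example:
      • A sketch for listing each pod's memory requests next to its namespace so they can be eyeballed per namespace (sorting only groups the lines; it is not a true per-namespace sum)
      • kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.containers[*].resources.requests.memory}{"\n"}{end}' | sort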

  3. Compare requests vs actual usage

    • Command / Action:
      • Inspect real memory usage
      • kubectl top pod -A

    • Expected result:
      • Memory usage roughly matches requests
    • Additional info:
      • Large gaps indicate inefficient request sizing
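    • Example:
      • A sketch for comparing a single pod's actual usage with its requests; <pod-name> and <namespace> are placeholders, and kubectl top requires the metrics-server
      • kubectl top pod <pod-name> -n <namespace> --containers
      • kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.requests.memory}'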

  4. Check memory limits and OOMKill events

    • Command / Action:
      • Review memory limits and check for OOMKilled containers
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • Memory limits are reasonable and no containers have been OOMKilled
    • Additional info:
      • OOMKilled containers indicate memory exhaustion
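    • Example:
      • A sketch for listing containers whose last termination reason was OOMKilled across all namespaces (field paths follow the standard pod status schema)
      • kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | grep OOMKilled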

  5. Check for memory leaks

    • Command / Action:
      • Review application logs and monitor memory usage trends over time
      • kubectl logs <pod-name> -n <namespace>

    • Expected result:
      • Memory usage remains stable or grows slowly
    • Additional info:
      • Sustained memory growth suggests potential memory leak
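    • Example:
      • A sketch for watching a suspect pod's memory over time; assumes the watch utility and the metrics-server are available, and <pod-name> and <namespace> are placeholders
      • watch -n 60 kubectl top pod <pod-name> -n <namespace> --containers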

  6. Identify memory-heavy pods

    • Command / Action:
      • Detect pods with high memory consumption
      • kubectl top pod -A --sort-by=memory

    • Expected result:
      • Memory usage is evenly distributed
    • Additional info:
      • A few memory-hungry pods can starve others
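    • Example:
      • A sketch that limits the sorted output to the top consumers (the head count is arbitrary)
      • kubectl top pod -A --sort-by=memory | head -n 20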

  7. Reduce memory requests where possible

    • Command / Action:
      • Tune memory requests to realistic values
      • kubectl set resources deployment <deployment-name> --requests=memory=<value> -n <namespace>

    • Expected result:
      • Lower total requested memory on nodes
    • Additional info:
      • Always validate changes under load
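    • Example:
      • A sketch with a hypothetical deployment name, namespace, and values; size requests from the usage observed in step 3 and keep limits above requests
      • kubectl set resources deployment my-app -n my-namespace --requests=memory=256Mi --limits=memory=512Mi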

  8. Scale the cluster

    • Command / Action:
      • Review current node capacity, then add nodes or enable autoscaling
      • kubectl get nodes

    • Expected result:
      • Memory pressure is reduced across nodes
    • Additional info:
      • Ensure Cluster Autoscaler is properly configured
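    • Example:
      • A sketch for reviewing current capacity and checking whether a Cluster Autoscaler is deployed; the autoscaler's deployment name and namespace depend on how it was installed (managed clouds may run it outside the cluster)
      • kubectl get nodes -o wide
      • kubectl get deployments -n kube-system | grep -i autoscaler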

  9. Reschedule workloads

    • Command / Action:
      • Evict or rebalance pods to less loaded nodes
      • kubectl drain <node-name> --ignore-daemonsets

    • Expected result:
      • Pods redistribute to nodes with more available memory
    • Additional info:
      • Use carefully in production environments
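    • Example:
      • A sketch of a typical cordon/drain/uncordon cycle; <node-name> is a placeholder, and --delete-emptydir-data may be needed if pods use emptyDir volumes (the flag name varies by kubectl version)
      • kubectl cordon <node-name>
      • kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
      • kubectl uncordon <node-name>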

Additional resources