Alert Runbooks

CPUThrottlingHigh

Description

This alert fires when a container is heavily CPU throttled: the kernel repeatedly pauses it because it has used up its configured CPU limit for the current scheduling period.
High CPU throttling can cause increased latency, slow processing, timeouts, and degraded application performance, even when average CPU usage looks low.
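
To make the mechanism concrete, the sketch below converts a CPU limit into the CFS quota that is enforced per scheduling period. It assumes the default CFS period of 100 ms, which a cluster may override, and the function names are illustrative rather than part of any tool.

```shell
# Convert a CPU limit in millicores to a CFS quota in microseconds.
# Assumes the default CFS period of 100000 us (100 ms) unless overridden.
cpu_quota_us() {
  limit_millicores="$1"
  period_us="${2:-100000}"
  echo $(( limit_millicores * period_us / 1000 ))
}

# Whole milliseconds of wall-clock time until N fully busy threads
# have used up the quota for one period (integer division).
ms_until_throttled() {
  quota_us=$(cpu_quota_us "$1")
  threads="$2"
  echo $(( quota_us / threads / 1000 ))
}

cpu_quota_us 500          # 50000 (us of CPU time per 100 ms period)
ms_until_throttled 500 4  # 12
```

With a 500m limit, four busy threads spend the quota roughly 12 ms into each 100 ms period and are paused for the remainder, while average usage still reads as only half a core.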


Possible Causes:


Severity estimation

Medium to High severity, depending on impact.

Severity increases with:


Troubleshooting steps

  1. Confirm CPU throttling

    • Command / Action:
      • Check throttling metrics (Prometheus)
      • container_cpu_cfs_throttled_seconds_total

    • Expected result:
      • Throttling rate is low or near zero
    • Additional info:
      • Sustained throttling confirms CPU limit pressure
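
The check above can be sketched as two PromQL expressions built from shell helpers. The first uses the metric named in this step; the second uses the cAdvisor period counters to express throttling as a fraction of periods. The helper names, the 5m default window, and the `$PROM_URL` variable are assumptions for illustration.

```shell
# PromQL: seconds per second each container in the pod spent throttled.
# Arguments: namespace, pod, optional range window (default 5m).
throttled_seconds_query() {
  printf 'sum by (container) (rate(container_cpu_cfs_throttled_seconds_total{namespace="%s",pod="%s"}[%s]))\n' \
    "$1" "$2" "${3:-5m}"
}

# PromQL: fraction of CFS periods in which each container was throttled.
throttled_ratio_query() {
  sel=$(printf '{namespace="%s",pod="%s"}' "$1" "$2")
  win="${3:-5m}"
  printf 'sum by (container) (increase(container_cpu_cfs_throttled_periods_total%s[%s])) / sum by (container) (increase(container_cpu_cfs_periods_total%s[%s]))\n' \
    "$sel" "$win" "$sel" "$win"
}

# Example, assuming Prometheus is reachable at the hypothetical $PROM_URL:
# curl -sG "$PROM_URL/api/v1/query" \
#   --data-urlencode "query=$(throttled_ratio_query my-namespace my-pod)"
```

A ratio that stays well above zero over several windows points to sustained limit pressure rather than a one-off burst.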

  2. Check CPU usage vs limits

    • Command / Action:
      • Compare actual usage to limits
      • kubectl top pod <pod-name> -n <namespace>

    • Expected result:
      • CPU usage comfortably below the limit
    • Additional info:
      • Throttling can occur even if average usage seems low
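
Usage from `kubectl top` is shown in millicores (for example `250m`), while limits may be written as `500m`, `1`, or `0.5`. A minimal sketch of the comparison, with illustrative helper names, normalizes both values before dividing:

```shell
# Normalize a Kubernetes CPU quantity ("250m", "1", "0.5") to millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Print usage as an integer percentage of the limit.
usage_pct() {
  used=$(to_millicores "$1")
  limit=$(to_millicores "$2")
  echo $(( used * 100 / limit ))
}

usage_pct 250m 500m   # 50
usage_pct 900m 1      # 90
```

The figure from `kubectl top` is an average over a sampling window, so a modest percentage here does not rule out short bursts that hit the limit.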

  3. Inspect resource configuration

    • Command / Action:
      • Review requests and limits
      • kubectl describe pod <pod-name> -n <namespace>

    • Expected result:
      • CPU limits align with workload needs
    • Additional info:
      • Tight limits increase throttling risk
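
When the full `describe` output is more than needed, a JSONPath query can print just the CPU request and limit per container. This is a sketch; the wrapper name `show_cpu_resources` is made up for illustration.

```shell
# List each container's name, CPU request, and CPU limit (tab-separated).
# Arguments: pod name, namespace.
show_cpu_resources() {
  pod="$1"; ns="$2"
  kubectl get pod "$pod" -n "$ns" -o \
    jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\t"}{.resources.limits.cpu}{"\n"}{end}'
}

# Usage: show_cpu_resources <pod-name> <namespace>
```

An empty third column means the container has no CPU limit set, in which case limit-based throttling should not apply to it.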

  4. Check node CPU pressure

    • Command / Action:
      • Inspect node resource usage
      • kubectl describe node <node-name>

    • Expected result:
      • Node has available CPU capacity
    • Additional info:
      • Node CPU contention adds scheduling delay on top of limit-based throttling
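
The node description is long, and the part relevant here is the "Allocated resources" section. A small filter, sketched below with an illustrative function name, prints that section up to the "Events:" header that normally follows it.

```shell
# Print the "Allocated resources" section of a node description,
# from its header through the "Events:" header line.
node_allocated() {
  kubectl describe node "$1" | sed -n '/^Allocated resources:/,/^Events:/p'
}

# Usage: node_allocated <node-name>
```

CPU limits summing to more than 100% in that section indicate the node is overcommitted on limits.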

  5. Review application behavior

    • Command / Action:
      • Identify CPU-intensive operations
      • Review application metrics and profiling

    • Expected result:
      • CPU usage patterns match expectations
    • Additional info:
      • Busy loops or inefficient algorithms cause bursts

  6. Increase CPU limits (if appropriate)

    • Command / Action:
      • Adjust CPU limits to reduce throttling
      • kubectl set resources deployment <deployment-name> --limits=cpu=<value> -n <namespace>

    • Expected result:
      • Throttling rate decreases
    • Additional info:
      • Ensure node capacity can support higher limits
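
One cautious way to make this change is to preview it first and then watch the rollout, since changing limits restarts the pods. The two wrappers below are a sketch with illustrative names; the arguments are deployment, CPU value, and namespace.

```shell
# Preview the patched deployment as YAML; no change is persisted.
preview_cpu_limit() {
  kubectl set resources deployment "$1" --limits=cpu="$2" -n "$3" --dry-run=client -o yaml
}

# Apply the new limit, then wait for the rollout to finish.
apply_cpu_limit() {
  kubectl set resources deployment "$1" --limits=cpu="$2" -n "$3" &&
    kubectl rollout status "deployment/$1" -n "$3"
}

# Usage: preview_cpu_limit <deployment-name> <value> <namespace>
#        apply_cpu_limit   <deployment-name> <value> <namespace>
```

Note that this sets the limit on every container in the deployment unless a specific container is selected with `-c`.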

  7. Adjust CPU requests

    • Command / Action:
      • Increase CPU requests to improve scheduling
      • kubectl set resources deployment <deployment-name> --requests=cpu=<value> -n <namespace>

    • Expected result:
      • Pod is scheduled on nodes with sufficient CPU
    • Additional info:
      • Requests affect placement; limits affect throttling

  8. Scale the workload

    • Command / Action:
      • Distribute load across more replicas
      • kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>

    • Expected result:
      • CPU load and throttling per pod decrease
    • Additional info:
      • Horizontal scaling often reduces throttling
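
For a rough idea of how many replicas to ask for, the sketch below scales the current count in proportion to how far per-pod CPU use is above a chosen target. It assumes load spreads evenly across replicas, and the function name and percentages are illustrative.

```shell
# Replicas needed so per-pod CPU use falls from current_pct to target_pct
# of the limit (rounded up). Arguments: current replicas, current %, target %.
desired_replicas() {
  current="$1"; current_pct="$2"; target_pct="$3"
  echo $(( (current * current_pct + target_pct - 1) / target_pct ))
}

desired_replicas 3 90 60   # 5
```

Three pods running at 90% of their limit would need about five replicas to bring each pod down to roughly 60%.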

Additional resources