KubeCPUQuotaOvercommit
Description
This alert is less critical than KubeCPUOvercommit because it is calculated from pod CPU limits rather than pod requests.
This alert fires when the total CPU limits enforced by Kubernetes exceed the allocatable CPU on one or more nodes, resulting in CFS quota overcommitment.
CPU quota overcommitment can result in CPU throttling, increased latency, degraded performance, and potential instability for workloads, especially under peak load conditions.
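The exact trigger condition depends on the alert rule shipped with your monitoring stack, but a query in this spirit, assuming default kube-state-metrics metric names, compares the CPU limits summed per node against that node's allocatable CPU:

  # ratio of committed CPU limits to allocatable CPU, per node
  sum by (node) (kube_pod_container_resource_limits{resource="cpu"})
    / sum by (node) (kube_node_status_allocatable{resource="cpu"})
    > 1

A ratio above 1 means the node's CFS quotas are overcommitted. Production rules usually also filter out completed pods and require the condition to hold for a sustained period before firing.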
Possible Causes:
- CPU limits are set too high across multiple workloads
- Excessive number of CPU-limited pods scheduled on the same node
- Node allocatable CPU reduced due to system or kubelet reservations
- Misconfigured resource limits and requests
- Cluster scaling lagging behind workload growth
- Batch workloads or sudden replica increases
- Autoscaler not reacting quickly enough
Severity estimation
Typically Medium to High, though it can range from Low to Critical depending on workload criticality:
- Low: Occasional throttling with minimal user impact
- Medium: Sustained throttling causing latency or degraded throughput
- High: Critical services affected, requests failing or delayed
- Critical: Multiple services degraded, cluster instability observed
Severity increases with:
- Level of overcommitment
- Number of affected workloads
- Duration of sustained throttling
- Importance of affected workloads
Troubleshooting steps
- Confirm CPU quota throttling
  - Command / Action: Inspect container CPU throttling metrics in Prometheus
    container_cpu_cfs_throttled_seconds_total
  - Expected result: Low or near-zero throttling
  - Additional info: Sustained high throttling confirms CPU quota overcommitment; see the sample ratio query below
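  - Example: the ratio of throttled CFS periods to total periods is usually more telling than raw throttled seconds. A sketch using standard cAdvisor counters, where the 0.25 threshold is only an illustrative starting point:
    # fraction of CFS periods in which each pod was throttled over 5m
    sum by (namespace, pod) (increase(container_cpu_cfs_throttled_periods_total[5m]))
      / sum by (namespace, pod) (increase(container_cpu_cfs_periods_total[5m]))
      > 0.25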
- Check CPU limits on affected pods
  - Command / Action: Review CPU limits and requests
    kubectl describe pod <pod-name> -n <namespace>
  - Expected result: CPU limits match realistic workload requirements
  - Additional info: Overly high limits contribute directly to quota overcommitment
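  - Example: to survey requests and limits across a whole namespace at a glance (the column names are arbitrary labels, not kubectl keywords):
    kubectl get pods -n <namespace> -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,CPU_LIM:.spec.containers[*].resources.limits.cpu'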
- Compare total CPU limits vs node allocatable
  - Command / Action: Inspect node CPU capacity
    kubectl describe node <node-name>
  - Expected result: Total CPU limits do not exceed node allocatable capacity significantly
  - Additional info: Significant overcommitment increases throttling risk
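  - Example: the Allocated resources section of the node description already sums requests and limits as a percentage of allocatable; a cpu Limits figure well above 100% confirms overcommitment:
    kubectl describe node <node-name> | grep -A 8 'Allocated resources'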
- Identify heavily throttled pods
  - Command / Action: Correlate throttling metrics with pods
    kubectl top pod -n <namespace>
  - Expected result: Throttling evenly distributed or minimal
  - Additional info: “Noisy neighbor” pods may dominate CPU usage
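  - Example: a PromQL query to surface the most throttled pods cluster-wide (the 5m window and top-10 cut-off are arbitrary choices):
    topk(10, sum by (namespace, pod) (rate(container_cpu_cfs_throttled_seconds_total[5m])))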
- Adjust CPU limits
  - Command / Action: Reduce excessive CPU limits for pods or deployments
    kubectl set resources deployment <deployment-name> --limits=cpu=<value> -n <namespace>
  - Expected result: Throttling rate decreases
  - Additional info: Validate changes under load
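  - Example: a hypothetical invocation with concrete values; the deployment name, namespace, and 500m figure are placeholders to be derived from observed usage:
    kubectl set resources deployment web-frontend --limits=cpu=500m -n production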
- Tune CPU requests
  - Command / Action: Align CPU requests with actual workload usage
    kubectl set resources deployment <deployment-name> --requests=cpu=<value> -n <namespace>
  - Expected result: Better pod placement and reduced contention
  - Additional info: Requests affect scheduling; limits affect throttling
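  - Example: comparing observed usage to configured requests helps size them; a rough sketch assuming cAdvisor and kube-state-metrics metrics, where a ratio far below 1 suggests requests are inflated:
    # actual CPU usage divided by requested CPU, per pod
    sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
      / sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})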
- Scale workloads or cluster
  - Command / Action: Add replicas or nodes to distribute CPU load
    kubectl scale deployment <deployment-name> --replicas=<n> -n <namespace>
  - Expected result: CPU pressure per pod is reduced
  - Additional info: Horizontal scaling mitigates quota contention
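  - Example: beyond a one-off scale, an HPA can add replicas automatically as CPU climbs; the thresholds below are illustrative:
    kubectl autoscale deployment <deployment-name> --cpu-percent=70 --min=2 --max=10 -n <namespace>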
- Review autoscaler configuration
  - Command / Action: Check HPA and Cluster Autoscaler settings
    kubectl get hpa -A
  - Expected result: Autoscaling reacts appropriately to increased CPU load
  - Additional info: Delayed scaling can worsen CPU quota overcommitment
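  - Example: inspect a specific HPA's recent scaling events, and, if the Cluster Autoscaler is installed, its logs (the label selector depends on your installation; app=cluster-autoscaler is a common default):
    kubectl describe hpa <hpa-name> -n <namespace>
    kubectl -n kube-system logs -l app=cluster-autoscaler --tail=100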
Additional resources
- Kubernetes CPU quotas and CFS
- Kubernetes resource management
- Prometheus container CPU metrics
- Related alert: CPUThrottlingHigh
- Related alert: KubeCPUOvercommit