KubeCPUOvercommit

Description

This alert fires when total CPU requests or limits across pods exceed the allocatable CPU capacity of one or more Kubernetes nodes.
CPU overcommitment increases the risk of CPU contention, throttling, degraded performance, and unstable workloads, especially during traffic spikes.

Possible causes

CPU requests set too high relative to node capacity
Excessive number of CPU-intensive workloads scheduled on the same node
Cluster scaling lagging behind workload growth
Misconfigured resource requests and limits
Nodes with reduced allocatable CPU due to system reservations
Sudden increase in replicas or batch workloads
Inadequate cluster autoscaler configuration

Severity estimation

Severity ranges from Low to Critical, depending on workload impact:

Low if workloads tolerate contention and throttling is minimal
Medium if latency increases or background jobs slow down
High if user-facing services are impacted
Critical if essential workloads are starved of CPU or repeatedly throttled

Severity increases with:

Degree of overcommitment
Duration of the condition
Criticality of affected workloads

Troubleshooting steps

Identify affected nodes
- Command / Action:
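  - Check each node's allocatable CPU and the CPU requests and limits already allocated to its pods (the "Allocated resources" section)
  - kubectl describe node <node-name>
- Expected result:
  - Requested CPU is below allocatable CPU
- Additional info:
  - Focus on nodes with high pod density or a high CPU requests/limits percentage; a limits percentage above 100% means limits are overcommitted on that node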
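
To rank nodes across the whole cluster instead of describing them one by one, the check above can be sketched in Python. This is a minimal sketch, assuming the JSON produced by `kubectl get nodes -o json` and `kubectl get pods -A -o json`; the helper names (`millicores`, `kubectl_json`, `cpu_totals_per_node`, `cpu_ratio_per_node`) are hypothetical, and init containers and pod overhead are ignored for simplicity.

```python
import json
import subprocess
from collections import defaultdict


def millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "2" or "0.5" to millicores.

    Only the millicore suffix and plain decimals are handled in this sketch.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return round(float(quantity[:-1]))
    return round(float(quantity) * 1000)


def kubectl_json(*args: str) -> dict:
    """Run a kubectl command with -o json and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def cpu_totals_per_node(pods: dict, kind: str = "requests") -> dict:
    """Sum container CPU requests (or limits) of scheduled, non-terminated pods per node."""
    totals = defaultdict(int)
    for pod in pods["items"]:
        node = pod["spec"].get("nodeName")
        # Skip unscheduled pods and pods that have already finished.
        if not node or pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue
        for container in pod["spec"]["containers"]:
            cpu = container.get("resources", {}).get(kind, {}).get("cpu")
            if cpu:
                totals[node] += millicores(cpu)
    return dict(totals)


def cpu_ratio_per_node(nodes: dict, pods: dict, kind: str = "requests") -> dict:
    """Return requested (or limited) CPU divided by allocatable CPU for each node."""
    totals = cpu_totals_per_node(pods, kind)
    ratios = {}
    for node in nodes["items"]:
        name = node["metadata"]["name"]
        allocatable = millicores(node["status"]["allocatable"]["cpu"])
        ratios[name] = totals.get(name, 0) / allocatable
    return ratios
```

For example, `cpu_ratio_per_node(kubectl_json("get", "nodes"), kubectl_json("get", "pods", "-A"), "limits")` returns one ratio per node; values above 1.0 mark nodes whose limits exceed allocatable CPU.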

Review CPU requests per namespace
- Command / Action:
  - List CPU requests per pod across all namespaces (quote the column spec so the shell does not expand [*])
  - kubectl get pods -A -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,CPU:.spec.containers[*].resources.requests.cpu'
- Expected result:
  - Requests align with actual workload needs
- Additional info:
  - Overestimated requests increase overcommitment risk
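
The command above lists requests per pod rather than per namespace. A minimal Python sketch of the aggregation, assuming the custom-columns output format shown (namespace in the first column, comma-separated CPU requests in the last, `<none>` when no request is set); the function names are hypothetical.

```python
from collections import defaultdict


def millicores(quantity: str) -> int:
    """Convert "500m", "2" or "0.5" style CPU quantities to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return round(float(quantity[:-1]))
    return round(float(quantity) * 1000)


def requests_by_namespace(table: str) -> dict:
    """Sum the CPU column of custom-columns output per namespace.

    Assumes the namespace is the first column and the CPU requests are the
    last column, with comma-separated values for multi-container pods.
    """
    totals = defaultdict(int)
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        namespace, cpu = fields[0], fields[-1]
        totals[namespace] += 0  # keep namespaces whose pods set no requests
        if cpu == "<none>":
            continue
        for value in cpu.split(","):
            totals[namespace] += millicores(value)
    return dict(totals)
```

Sorting the result by value puts the namespaces that reserve the most CPU first.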

Compare requests vs actual usage
- Command / Action:
  - Inspect actual CPU usage (requires metrics-server)
  - kubectl top pod -A
- Expected result:
  - CPU usage roughly matches requests
- Additional info:
  - Large gaps indicate inefficient request sizing
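
The request-versus-usage comparison can be sketched in Python as follows. This assumes the default `kubectl top pod -A` columns (NAMESPACE, NAME, CPU(cores), MEMORY(bytes)) and a requests mapping keyed by `(namespace, pod)` in millicores, for example built from the per-pod listing in the previous step. The 0.5 threshold and the function names are illustrative, and a single `kubectl top` snapshot may not represent peak usage.

```python
def millicores(quantity: str) -> int:
    """Convert "500m", "2" or "0.5" style CPU quantities to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return round(float(quantity[:-1]))
    return round(float(quantity) * 1000)


def parse_top_all(output: str) -> dict:
    """Parse `kubectl top pod -A` output into {(namespace, pod): millicores}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        namespace, pod, cpu = line.split()[:3]
        usage[(namespace, pod)] = millicores(cpu)
    return usage


def overprovisioned(requests: dict, usage: dict, max_ratio: float = 0.5) -> list:
    """Return (pod, unused millicores) for pods using less than max_ratio of their request.

    Pods without a request or without a usage sample are skipped.
    The result is sorted by the unused amount, largest first.
    """
    flagged = []
    for key, requested in requests.items():
        if requested == 0 or key not in usage:
            continue
        used = usage[key]
        if used / requested < max_ratio:
            flagged.append((key, requested - used))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```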

Check CPU limits and throttling
- Command / Action:
  - Review CPU limits for affected pods
  - kubectl describe pod <pod-name> -n <namespace>
- Expected result:
  - CPU limits are reasonable and consistent
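- Additional info:
  - Tight limits amplify the impact of overcommitment
  - kubectl describe does not report throttling; check the container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total metrics (see CPUThrottlingHigh)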
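
Throttling can also be read straight from the container's cgroup. A minimal Python sketch, assuming cgroup v2 and two samples of `cpu.stat` taken a known interval apart (for example with `kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu.stat`, which requires `cat` in the image). It computes the fraction of CFS periods that were throttled between the samples, the same ratio the cAdvisor metrics above express; the function names are hypothetical.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse cgroup v2 cpu.stat ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttled_fraction(before: dict, after: dict) -> float:
    """Fraction of CFS periods throttled between two cpu.stat samples.

    The counters are cumulative since the cgroup was created, so the
    difference between two samples describes only the interval in between.
    """
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    if periods <= 0:
        # No CPU quota enforced (no limit) or no periods elapsed between samples.
        return 0.0
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    return throttled / periods
```

Passing an all-zero `before` sample gives the throttled fraction over the container's lifetime instead of a recent interval.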

Identify noisy neighbors
- Command / Action:
  - Detect pods with high CPU consumption
  - kubectl top pod -n <namespace> --sort-by=cpu
- Expected result:
  - CPU usage is evenly distributed
- Additional info:
  - A few CPU-hungry pods can starve others
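
To quantify how uneven the distribution is, a minimal Python sketch can compute each pod's share of the namespace's total usage. It assumes the namespace-scoped `kubectl top pod` columns (NAME, CPU(cores), MEMORY(bytes)); the 25% share threshold and the function names are illustrative, not a standard.

```python
def millicores(quantity: str) -> int:
    """Convert "500m", "2" or "0.5" style CPU quantities to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return round(float(quantity[:-1]))
    return round(float(quantity) * 1000)


def parse_top_namespace(output: str) -> dict:
    """Parse `kubectl top pod -n <namespace>` output into {pod: millicores}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu = line.split()[:2]
        usage[name] = millicores(cpu)
    return usage


def noisy_neighbors(usage: dict, min_share: float = 0.25) -> list:
    """Return (pod, share of total CPU usage) for pods at or above min_share, largest first."""
    total = sum(usage.values())
    if total == 0:
        return []
    shares = [(pod, used / total) for pod, used in usage.items()]
    return sorted((s for s in shares if s[1] >= min_share), key=lambda s: s[1], reverse=True)
```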

Reduce CPU requests where possible
- Command / Action:
  - Tune CPU requests to realistic values
  - kubectl set resources deployment <deployment-name> -n <namespace> --requests=cpu=<value>
- Expected result:
  - Lower total requested CPU on nodes
- Additional info:
  - Changing requests triggers a rolling update of the Deployment; always validate changes under load
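
One way to choose `<value>` is a high percentile of observed usage plus headroom. The Python sketch below is a hypothetical heuristic, not a Kubernetes recommendation: it assumes you already have CPU usage samples for the workload in millicores (collected from repeated `kubectl top` runs or a metrics backend), and the defaults of p90 and 20% headroom are illustrative. It uses integer arithmetic so the result is exact.

```python
def suggest_cpu_request(samples_m: list, percentile: int = 90, headroom_pct: int = 20) -> str:
    """Suggest a CPU request such as "240m" from usage samples in millicores.

    Uses the nearest-rank percentile of the samples, then adds headroom_pct
    percent and rounds up to a whole millicore.
    """
    if not samples_m:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_m)
    # Nearest-rank percentile: rank = ceil(percentile * n / 100), at least 1.
    rank = max(1, -(-percentile * len(ordered) // 100))
    base = ordered[rank - 1]
    # Ceiling of base * (100 + headroom_pct) / 100 without floating point.
    return f"{-(-base * (100 + headroom_pct) // 100)}m"
```

The returned string can be passed directly as `--requests=cpu=<value>`; shorter sampling windows or bursty workloads may call for a higher percentile.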

Scale the cluster
- Command / Action:
  - Add nodes or enable autoscaling, then confirm that new nodes have joined and are Ready
  - kubectl get nodes
- Expected result:
  - CPU pressure is reduced across nodes
- Additional info:
  - Ensure Cluster Autoscaler is properly configured
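
A rough sizing estimate helps decide how many nodes to add. This Python sketch assumes homogeneous nodes, counts CPU requests only, and keeps a configurable number of spare nodes so the remaining nodes can still hold all requests if one fails; the function name and the example allocatable value of 3920m are hypothetical. Because it ignores bin-packing fragmentation, memory, and scheduling constraints, treat the result as a lower bound.

```python
def nodes_needed(total_requests_m: int, allocatable_per_node_m: int,
                 spare_nodes: int = 1, target_utilization_pct: int = 100) -> int:
    """Minimum node count to fit total CPU requests plus spare nodes.

    target_utilization_pct caps how much of each node's allocatable CPU
    may be requested (for example 80 to leave headroom for bursts).
    """
    usable = allocatable_per_node_m * target_utilization_pct // 100
    # Ceiling division: nodes needed to hold all requests.
    base = -(-total_requests_m // usable)
    return base + spare_nodes
```

For example, 14500m of total requests on nodes with 3920m allocatable needs at least `nodes_needed(14500, 3920)` nodes, or `nodes_needed(14500, 3920, target_utilization_pct=80)` if requests should stay below 80% of each node.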

Reschedule workloads
- Command / Action:
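  - Cordon and drain the node so its pods are evicted and rescheduled elsewhere, then uncordon it once it should accept pods again
  - kubectl drain <node-name> --ignore-daemonsets
  - kubectl uncordon <node-name>
- Expected result:
  - Pods redistribute to less loaded nodes
- Additional info:
  - Draining evicts every non-DaemonSet pod on the node and honors PodDisruptionBudgets; in production, drain one node at a time and confirm the remaining nodes have enough free CPU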
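
Before draining, it is worth checking whether the other nodes can absorb the drained node's CPU requests. A minimal Python sketch, assuming you have the CPU requests (millicores) of the pods that drain will evict, excluding DaemonSet pods, and the free allocatable CPU of each remaining node; it places pods greedily, largest first, onto the node with the most free CPU. The function name is hypothetical, and affinity, taints, memory, and other scheduler constraints are ignored, so `None` means this greedy pass found no fit rather than proof that none exists.

```python
from typing import Dict, Optional


def plan_drain(pod_requests_m: Dict[str, int], free_m: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Greedily place the drained node's pods onto the remaining nodes.

    Returns {pod: node}, or None if some pod's request did not fit.
    """
    free = dict(free_m)  # copy so the caller's dict is not modified
    placement = {}
    for pod, request in sorted(pod_requests_m.items(), key=lambda item: item[1], reverse=True):
        if not free:
            return None
        target = max(free, key=free.get)  # node with the most free CPU
        if free[target] < request:
            return None  # if the emptiest node cannot fit it, no node can
        free[target] -= request
        placement[pod] = target
    return placement
```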

Additional resources

Kubernetes resource management
CPU requests and limits
Related alert: CPUThrottlingHigh