KubeCPUOvercommit
Description
This alert fires when total CPU requests or limits across pods exceed the allocatable CPU capacity of one or more Kubernetes nodes.
CPU overcommitment increases the risk of CPU contention, throttling, degraded performance, and unstable workloads, especially during traffic spikes.
Possible Causes:
- CPU requests set too high relative to node capacity
- Excessive number of CPU-intensive workloads scheduled on the same node
- Cluster scaling lagging behind workload growth
- Misconfigured resource requests and limits
- Nodes with reduced allocatable CPU due to system reservations
- Sudden increase in replicas or batch workloads
- Inadequate cluster autoscaler configuration
Severity estimation
Typically Medium to High, depending on workload impact:
- Low if workloads tolerate contention and throttling is minimal
- Medium if latency increases or background jobs slow down
- High if user-facing services are impacted
- Critical if business-critical workloads are starved of CPU or repeatedly throttled
Severity increases with:
- Degree of overcommitment
- Duration of the condition
- Criticality of affected workloads
Troubleshooting steps
- Identify affected nodes
  - Command / Action:
    - Check each node's allocatable CPU and the CPU already requested by its pods
    - kubectl describe node <node-name>
  - Expected result:
    - Requested CPU is below allocatable CPU
  - Additional info:
    - Focus on nodes reporting high pod density or CPU pressure
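To make the comparison between requested and allocatable CPU concrete, the sketch below converts Kubernetes CPU quantities to millicores and reports requested CPU as a percentage of allocatable CPU. The helper names `cpu_to_millicores` and `requested_pct` are hypothetical (they are not part of kubectl), and the sample values are illustrative only. Both inputs can be read from the Allocatable and Allocated resources sections of the `kubectl describe node` output.

```shell
#!/bin/sh
# Hypothetical helpers (not part of kubectl): normalize CPU quantities and
# compare requested CPU against a node's allocatable CPU.

# Convert a CPU quantity such as "500m", "2", or "1.5" to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

# Print requested CPU as a whole-number percentage of allocatable CPU
# (rounded down). Values near 100 mean little unrequested CPU is left.
requested_pct() {
  req=$(cpu_to_millicores "$1")
  alloc=$(cpu_to_millicores "$2")
  echo $(( req * 100 / alloc ))
}

# Illustrative values: 3500m requested on a node with 4 allocatable cores.
requested_pct 3500m 4   # prints 87
```

The same arithmetic applies to CPU limits, which, unlike requests, can legitimately add up to more than 100% of a node's allocatable CPU.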
- Review CPU requests per namespace
  - Command / Action:
    - List the CPU requests of every pod together with its namespace
    - kubectl get pods -A -o custom-columns='NS:.metadata.namespace,CPU:.spec.containers[*].resources.requests.cpu'
  - Expected result:
    - Requests align with actual workload needs
  - Additional info:
    - Overestimated requests increase overcommitment risk
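The listing command prints one row per pod, so totals per namespace still have to be added up. The sketch below is one way to do that with awk. The function name `sum_cpu_by_namespace` is hypothetical, and it assumes the input has two columns (namespace, then a comma-separated list of per-container CPU requests, with `<none>` where no request is set), which is how the custom-columns output is expected to look when run with `--no-headers`.

```shell
#!/bin/sh
# Hypothetical helper: sum CPU requests per namespace, in millicores.
# Input rows on stdin: "<namespace> <cpu>[,<cpu>...]" (assumed format).
sum_cpu_by_namespace() {
  awk '
    # Convert one CPU quantity ("250m", "1", "0.5", "<none>") to millicores.
    function to_milli(q) {
      if (q == "" || q == "<none>") return 0
      if (q ~ /m$/) { sub(/m$/, "", q); return q + 0 }
      return q * 1000
    }
    {
      n = split($2, parts, ",")
      for (i = 1; i <= n; i++) total[$1] += to_milli(parts[i])
    }
    END { for (ns in total) printf "%s %.0fm\n", ns, total[ns] }
  ' | sort
}

# Illustrative input (made-up pods); prints:
#   default 300m
#   kube-system 1250m
printf 'default 100m,200m\nkube-system 1\ndefault <none>\nkube-system 250m\n' \
  | sum_cpu_by_namespace

# Against a live cluster, the input would come from:
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,CPU:.spec.containers[*].resources.requests.cpu' \
#     | sum_cpu_by_namespace
```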
- Compare requests vs actual usage
  - Command / Action:
    - Inspect real CPU usage (requires the Metrics API, typically provided by metrics-server)
    - kubectl top pod -A
  - Expected result:
    - CPU usage roughly matches requests
  - Additional info:
    - Large gaps between usage and requests indicate inefficient request sizing
- Check CPU limits and throttling
  - Command / Action:
    - Review CPU limits for affected pods
    - kubectl describe pod <pod-name> -n <namespace>
  - Expected result:
    - CPU limits are reasonable and consistent
  - Additional info:
    - Tight limits amplify the impact of overcommitment
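`kubectl describe pod` shows the configured limits but not whether throttling is actually happening. One way to check is to read the container's cgroup `cpu.stat` counters and compute the share of scheduling periods that were throttled. The sketch below assumes the `nr_periods` and `nr_throttled` fields that the Linux CFS bandwidth controller exposes. The function name `throttled_pct` is hypothetical, the sample numbers are made up, and the file path in the usage comment assumes a cgroup v2 node and a container image that ships `cat`.

```shell
#!/bin/sh
# Hypothetical helper: percentage of CFS periods in which the container was
# throttled, computed from cgroup cpu.stat text on stdin.
throttled_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", throttled * 100 / periods
      else print "0.0"
    }
  '
}

# Illustrative cpu.stat content (values are made up).
printf 'usage_usec 9000000\nnr_periods 2000\nnr_throttled 500\nthrottled_usec 1200000\n' \
  | throttled_pct   # prints 25.0

# Against a live pod (path is an assumption; cgroup v1 nodes use a different one):
#   kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu.stat | throttled_pct
```

The counters are cumulative since the container started, so a high value may reflect past rather than current throttling; comparing two readings taken a few minutes apart gives a better picture.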
- Identify noisy neighbors
  - Command / Action:
    - Detect pods with high CPU consumption
    - kubectl top pod -n <namespace>
  - Expected result:
    - CPU usage is evenly distributed
  - Additional info:
    - A few CPU-hungry pods can starve others
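In a busy namespace the `kubectl top pod` output can be long, so it helps to filter it down to the heaviest consumers. The sketch below keeps only pods at or above a CPU threshold. The function name `pods_above_cpu` and the 500m threshold are hypothetical, and the input is assumed to have the NAME, CPU, MEMORY column layout that `kubectl top pod -n <namespace> --no-headers` prints.

```shell
#!/bin/sh
# Hypothetical helper: print "<millicores> <pod>" for every pod whose CPU
# usage is at or above the threshold (in millicores), highest first.
# Input rows on stdin: "<pod-name> <cpu> <memory>" (assumed format).
pods_above_cpu() {
  awk -v limit="$1" '
    {
      cpu = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); cpu += 0 } else { cpu *= 1000 }
      if (cpu >= limit) printf "%d %s\n", cpu, $1
    }
  ' | sort -rn
}

# Illustrative input (made-up pods); prints:
#   1200 batch-q1
#   850 api-7d9f
printf 'api-7d9f 850m 210Mi\nworker-x2 40m 96Mi\nbatch-q1 1200m 512Mi\n' \
  | pods_above_cpu 500

# Against a live cluster:
#   kubectl top pod -n <namespace> --no-headers | pods_above_cpu 500
```

For a quick look without filtering, `kubectl top pod -n <namespace> --sort-by=cpu` orders the same listing by CPU usage.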
- Reduce CPU requests where possible
  - Command / Action:
    - Tune CPU requests to realistic values
    - kubectl set resources deployment <deployment-name> --requests=cpu=<value> -n <namespace>
  - Expected result:
    - Lower total requested CPU on nodes
  - Additional info:
    - Always validate changes under load
- Scale the cluster
  - Command / Action:
    - Add nodes or enable autoscaling, then confirm the new nodes are Ready
    - kubectl get nodes
  - Expected result:
    - CPU pressure is reduced across nodes
  - Additional info:
    - Ensure Cluster Autoscaler is properly configured
- Reschedule workloads
  - Command / Action:
    - Evict or rebalance pods
    - kubectl drain <node-name> --ignore-daemonsets
  - Expected result:
    - Pods redistribute to less loaded nodes
  - Additional info:
    - Use with care in production: draining cordons the node and evicts its pods
Additional resources
- Kubernetes resource management
- CPU requests and limits
- Related alert: CPUThrottlingHigh