KubeSchedulerAbsent
Description
Prometheus target discovery has not found the Kubernetes Scheduler in the past 15 minutes.
The Kube Scheduler is a critical control plane component responsible for assigning newly created pods to nodes based on resource requirements, constraints, affinity/anti-affinity rules, and other scheduling policies. When absent from monitoring, no new metrics are being collected from this component.
Possible Causes:
- Kube Scheduler process has crashed or is not running
- Scheduler pod/container has been terminated or evicted
- Metrics endpoint is unreachable or misconfigured
- Network connectivity issues between Prometheus and Scheduler
- Service discovery configuration error in Prometheus
- Scheduler running without metrics endpoint enabled
- Firewall or network policy blocking metric scraping
- Certificate or authentication issues preventing metric collection
- Scheduler running on a node that is down or unreachable
- Leader election issues (standby instances still serve /metrics, but only the leader reports scheduling activity)
Severity estimation
Critical severity - The Scheduler is a core control plane component.
While existing pods continue to run normally, without a functioning Scheduler:
- New pods will remain in Pending state indefinitely
- Pods cannot be assigned to nodes
- Deployments, StatefulSets, and DaemonSets cannot scale up
- Failed pods won’t be rescheduled to other nodes
- Any workload requiring pod creation or scheduling will be blocked
Immediate action is required if the Scheduler is actually down (not just a monitoring issue).
Troubleshooting steps
- Verify Prometheus target status
  - Command / Action:
    - Check whether the Scheduler appears in Prometheus targets
    - Open the Prometheus UI at /targets
    - Look for kube-scheduler targets
  - Expected result:
    - Target is present with status “UP”
  - Additional info:
    - If the target is missing from discovery, this is a service discovery issue
    - If the target shows as “DOWN”, the endpoint is unreachable
    - Standby schedulers still serve /metrics; only the leader reports scheduling activity
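The same check can be made without the UI by asking the Prometheus HTTP API for the scheduler's `up` series. The sketch below is a minimal illustration, not a definitive implementation: the `PROM_URL` default and the `job="kube-scheduler"` label are assumptions (the label matches common kube-prometheus setups but may differ in your cluster), and the function names are hypothetical.

```shell
# Assumption: Prometheus is reachable at $PROM_URL (hypothetical default)
# and the scheduler scrape job carries the label job="kube-scheduler".
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant query for the scheduler's `up` series.
query_scheduler_up() {
  curl -s --get "${PROM_URL}/api/v1/query" \
    --data-urlencode 'query=up{job="kube-scheduler"}'
}

# Read a query response on stdin and classify it:
#   absent -> no series at all (discovery found no scheduler target)
#   up     -> at least one target reports value 1
#   down   -> series exist but none reports value 1
classify_up_response() {
  local body
  body="$(cat)"
  if printf '%s' "$body" | grep -qF '"result":[]'; then
    echo absent
  elif printf '%s' "$body" | grep -Eq '"value":\[[0-9.]+,"1"]'; then
    echo up
  else
    echo down
  fi
}

# Usage against a live Prometheus:
#   query_scheduler_up | classify_up_response
```

A result of `absent` points at service discovery, while `down` points at the endpoint itself.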
- Check Scheduler pod status (for kubeadm clusters)
  - Command / Action:
    - Verify the Scheduler is running as a static pod
    - kubectl get pods -n kube-system | grep scheduler
    - kubectl get pods -n kube-system -l component=kube-scheduler
  - Expected result:
    - Scheduler pod should be in Running state
    - kube-scheduler-<node> 1/1 Running
  - Additional info:
    - For kubeadm clusters, the Scheduler runs as a static pod on control plane nodes
    - In HA setups, multiple scheduler pods exist but only one is active (leader)
    - Pod should be in Running state with 1/1 ready containers
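In HA clusters it can help to filter the pod listing down to the instances that need attention. A minimal sketch, assuming the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...); the function name is hypothetical.

```shell
# Print scheduler pods that are not Running or not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin.
unhealthy_scheduler_pods() {
  awk '{
    split($2, ready, "/")   # READY column looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -n kube-system -l component=kube-scheduler --no-headers \
#     | unhealthy_scheduler_pods
```

Empty output means every listed scheduler pod is Running with all containers ready.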
- Check Scheduler process (for systemd-managed clusters)
  - Command / Action:
    - Verify the Scheduler service is active
    - systemctl status kube-scheduler
    - ps aux | grep kube-scheduler
  - Expected result:
    - Service should be active and running
    - Process should be present
  - Additional info:
    - For non-kubeadm clusters, the Scheduler may run as a systemd service
    - Check service status on control plane nodes
- Inspect Scheduler logs
  - Command / Action:
    - Review logs for errors or crash information
    - For static pod: kubectl logs -n kube-system kube-scheduler-<node>
    - For systemd: journalctl -u kube-scheduler -n 100
  - Expected result:
    - No critical errors or crash logs
    - Scheduler should be processing pod scheduling requests
  - Additional info:
    - Look for authentication errors, API server connectivity issues, or crashes
    - Check for leader election messages (only the leader actively schedules)
    - Look for “Successfully acquired lease” messages
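When the logs are long, a filter for the signals listed above narrows the search. This is a rough sketch: the pattern list is an assumption based on commonly seen leader election and authentication messages, not an exhaustive catalogue, and the function name is hypothetical.

```shell
# Keep only log lines that mention leader election, auth failures,
# API server connectivity problems, or panics (case-insensitive).
scheduler_log_signals() {
  grep -Ei 'acquired lease|failed to (acquire|renew) lease|leaderelection lost|unauthorized|forbidden|connection refused|panic'
}

# Usage:
#   kubectl logs -n kube-system kube-scheduler-<node> | scheduler_log_signals
#   journalctl -u kube-scheduler -n 100 | scheduler_log_signals
```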
- Check for pending pods (indicates scheduler issues)
  - Command / Action:
    - List pods stuck in Pending state
    - kubectl get pods -A --field-selector=status.phase=Pending
    - kubectl describe pod <pending-pod> -n <namespace>
  - Expected result:
    - If the scheduler is working, pending pods should have scheduling events
    - No pods should be stuck in Pending due to scheduler failure
  - Additional info:
    - Pods in Pending without any scheduler events indicate the scheduler is not working
    - Events should show “Successfully assigned” or scheduling failure reasons
    - Pending due to insufficient resources is normal, not a scheduler issue
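The distinction between "pending with scheduler events" and "pending with no events at all" can be checked from the command line. A minimal sketch, assuming the default `--no-headers` column layouts of `kubectl get pods -A` and `kubectl get events`; both function names are hypothetical.

```shell
# Count Pending pods per namespace.
# Reads `kubectl get pods -A --field-selector=status.phase=Pending --no-headers`.
pending_by_namespace() {
  awk '{ count[$1]++ } END { for (ns in count) print ns, count[ns] }' | sort
}

# Succeed if the event listing on stdin contains a scheduler decision
# (event reason Scheduled or FailedScheduling).
has_scheduling_event() {
  grep -Eq '(^|[[:space:]])(Scheduled|FailedScheduling)([[:space:]]|$)'
}

# Usage:
#   kubectl get pods -A --field-selector=status.phase=Pending --no-headers \
#     | pending_by_namespace
#   kubectl get events -n <namespace> \
#     --field-selector involvedObject.name=<pending-pod> --no-headers \
#     | has_scheduling_event || echo "no scheduler events for this pod"
```

A pending pod with a FailedScheduling event was at least seen by the scheduler; one with no scheduling event at all suggests the scheduler is not running.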
- Verify metrics endpoint accessibility
  - Command / Action:
    - Test whether the metrics endpoint is reachable
    - kubectl get pods -n kube-system -l component=kube-scheduler -o wide
    - curl -k https://<scheduler-ip>:10259/metrics
  - Expected result:
    - Metrics endpoint returns Prometheus-formatted metrics
    - HTTP 200 response with metric data
  - Additional info:
    - Default metrics port is 10259 (secure); the insecure port 10251 is deprecated and no longer served by recent Kubernetes releases
    - The secure port usually requires a bearer token or client certificate; an unauthenticated request typically returns 401 or 403
    - Check the --bind-address and --secure-port flags
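Because the secure port normally rejects anonymous requests, a plain `curl -k` may fail even when the endpoint is healthy. The sketch below assumes a bearer token in `$TOKEN` for a ServiceAccount that is allowed to read `/metrics` (for example one obtained with `kubectl create token <service-account> -n <namespace>` on recent kubectl versions); the function names are hypothetical.

```shell
# Fetch scheduler metrics with a bearer token.
# $1 = scheduler IP, $2 = port (defaults to 10259).
# Assumption: $TOKEN is set by the caller.
fetch_scheduler_metrics() {
  curl -sk -H "Authorization: Bearer ${TOKEN}" "https://$1:${2:-10259}/metrics"
}

# Count sample lines whose metric name starts with "scheduler_".
# Prints 0 for an error body such as "Unauthorized".
count_scheduler_series() {
  grep -c '^scheduler_'
}

# Usage:
#   fetch_scheduler_metrics <scheduler-ip> | count_scheduler_series
```

A count of 0 means the response did not contain scheduler metrics, which usually indicates an authentication or authorization failure rather than a dead process.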
- Check Scheduler configuration
  - Command / Action:
    - Review the Scheduler startup configuration
    - For static pod: kubectl get pod -n kube-system kube-scheduler-<node> -o yaml
    - Check manifest: cat /etc/kubernetes/manifests/kube-scheduler.yaml
  - Expected result:
    - Metrics endpoint is enabled and properly configured
    - --bind-address should not be 127.0.0.1 unless Prometheus scrapes from the same host (kubeadm sets 127.0.0.1 by default)
  - Additional info:
    - Ensure --authentication-kubeconfig and --authorization-kubeconfig are properly set
    - Verify the network mode allows external metric scraping
    - Check that --leader-elect is set to true for HA setups
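The flags above can be pulled straight out of the static pod manifest instead of reading the whole file. A minimal sketch, assuming the kubeadm-style layout in which each flag appears as a `- --flag=value` list entry; the function name is hypothetical.

```shell
# Print the value of a command-line flag from a static pod manifest.
# $1 = manifest path, $2 = flag name without the leading dashes.
# Prints nothing if the flag is not set in the manifest.
manifest_flag() {
  sed -n "s/^[[:space:]]*-[[:space:]]*--$2=\(.*\)$/\1/p" "$1"
}

# Usage:
#   manifest_flag /etc/kubernetes/manifests/kube-scheduler.yaml bind-address
#   manifest_flag /etc/kubernetes/manifests/kube-scheduler.yaml leader-elect
```

An empty result means the flag is absent and the scheduler is using its built-in default for that setting.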
- Verify leader election status (for HA clusters)
  - Command / Action:
    - Check which scheduler is the leader
    - kubectl get lease -n kube-system kube-scheduler -o yaml
    - kubectl get endpoints -n kube-system kube-scheduler -o yaml
  - Expected result:
    - One scheduler should hold the leader lease
    - Leader’s holderIdentity should be visible
  - Additional info:
    - Only the leader actively schedules pods
    - Non-leader schedulers are in standby mode
    - Leader election uses leases in the kube-system namespace
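To read just the current leader rather than the full lease object, the holder can be extracted from the YAML, or requested directly with `-o jsonpath='{.spec.holderIdentity}'`. The helper below is a small sketch with a hypothetical function name.

```shell
# Print the holderIdentity from a Lease object's YAML read on stdin.
lease_holder() {
  awk '$1 == "holderIdentity:" { print $2; exit }'
}

# Usage:
#   kubectl get lease -n kube-system kube-scheduler -o yaml | lease_holder
# Equivalent one-liner without the helper:
#   kubectl get lease -n kube-system kube-scheduler \
#     -o jsonpath='{.spec.holderIdentity}'
```

An empty result means no instance currently holds the lease.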
- Verify ServiceMonitor configuration (for Prometheus Operator)
  - Command / Action:
    - Check the ServiceMonitor for the Scheduler
    - kubectl get servicemonitor -n kube-system
    - kubectl get servicemonitor -n monitoring -l app.kubernetes.io/name=kube-scheduler -o yaml
  - Expected result:
    - ServiceMonitor exists and targets the correct endpoints
    - Selector matches the Scheduler service/endpoints
  - Additional info:
    - ServiceMonitor configuration tells Prometheus how to discover the Scheduler
    - Check endpoint, port, and authentication settings
- Check service and endpoints
  - Command / Action:
    - Verify the service exists and has endpoints
    - kubectl get svc -n kube-system kube-scheduler
    - kubectl get endpoints -n kube-system kube-scheduler
  - Expected result:
    - Service should exist and have valid endpoints
    - Endpoints should point to running Scheduler instances
  - Additional info:
    - Missing endpoints indicate the service can’t find Scheduler pods
    - May need to create the service manually for static pods
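An empty endpoints list shows up as `<none>` in the ENDPOINTS column, which is easy to miss in a script. A minimal sketch, assuming the default `kubectl get endpoints --no-headers` column order (NAME, ENDPOINTS, AGE); the function name is hypothetical.

```shell
# Succeed only if the endpoints listing on stdin has at least one row
# and no row shows "<none>" in the ENDPOINTS column.
endpoints_ready() {
  awk '$2 == "<none>" || $2 == "" { bad = 1 }
       END { exit ((NR == 0 || bad) ? 1 : 0) }'
}

# Usage:
#   kubectl get endpoints -n kube-system kube-scheduler --no-headers \
#     | endpoints_ready || echo "kube-scheduler service has no endpoints"
```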
- Restart Scheduler (if not running)
  - Command / Action:
    - Restart the Scheduler component
    - For static pod: move /etc/kubernetes/manifests/kube-scheduler.yaml out of the manifests directory, wait for the kubelet to stop the pod, then move it back
    - For systemd: systemctl restart kube-scheduler
  - Expected result:
    - Scheduler starts successfully
    - Pending pods begin to be scheduled
    - Prometheus begins scraping metrics again
  - Additional info:
    - kubectl delete pod -n kube-system kube-scheduler-<node> removes only the mirror pod; the kubelet recreates it without restarting the static pod itself
    - Verify the pod comes back in Running state
    - Monitor for successful metric collection in Prometheus
    - Check that pending pods are now being scheduled
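For a static pod, one way to force a real restart is to take the manifest away from the kubelet briefly and then put it back. The sketch below assumes kubeadm's default manifest path and a kubelet that rescans the directory within roughly 20 seconds; both are assumptions, the function name is hypothetical, and it has to run as root on the control plane node.

```shell
# Temporarily move a static pod manifest out of the kubelet's manifest
# directory so the kubelet stops the pod, then move it back.
# $1 = manifest path, $2 = seconds to wait (defaults to 20).
bounce_static_pod() {
  local manifest="$1" wait_s="${2:-20}" park_dir
  park_dir="$(mktemp -d)" || return 1
  mv "$manifest" "$park_dir/" || return 1
  sleep "$wait_s"
  mv "$park_dir/$(basename "$manifest")" "$manifest" && rmdir "$park_dir"
}

# Usage (on the control plane node):
#   bounce_static_pod /etc/kubernetes/manifests/kube-scheduler.yaml
```

Afterwards, confirm with `kubectl get pods -n kube-system -l component=kube-scheduler` that the pod is back in Running state.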
- Verify network policies and firewall rules
  - Command / Action:
    - Check for network policies blocking metric scraping
    - kubectl get networkpolicies -n kube-system
    - Check firewall rules on control plane nodes
  - Expected result:
    - No policies blocking traffic from Prometheus to the Scheduler
    - Port 10259 (or 10251) should be accessible
  - Additional info:
    - Network policies may prevent Prometheus from reaching the metrics endpoint
    - Verify Prometheus can reach the Scheduler pod/host network
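A plain TCP connection test from the Prometheus host or pod separates network blocks from authentication problems. This is a sketch under stated assumptions: it relies on `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command being available where it runs, and the function name is hypothetical.

```shell
# Succeed if a TCP connection to host:port opens within the timeout.
# $1 = host, $2 = port, $3 = timeout in seconds (defaults to 3).
port_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open <scheduler-ip> 10259 && echo reachable || echo blocked
```

If the port is reachable but scraping still fails, the cause is more likely authentication or TLS than a firewall or network policy.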
Additional resources
- Kubernetes Scheduler documentation
- Kubernetes Control Plane Components
- Kubernetes Scheduler Configuration
- Leader Election in Kubernetes
- Related alert: KubeControllerManagerAbsent
- Related alert: KubeProxyAbsent
- Related alert: TargetDown