KubeSchedulerAbsent

Description

Prometheus target discovery has not found the Kubernetes Scheduler in the past 15 minutes.

The Kube Scheduler is a critical control plane component responsible for assigning newly created pods to nodes based on resource requirements, constraints, affinity/anti-affinity rules, and other scheduling policies. When the Scheduler is absent from monitoring, no metrics are being collected from it, so its health cannot be confirmed from Prometheus alone.

Possible Causes:

Kube Scheduler process has crashed or is not running
Scheduler pod/container has been terminated or evicted
Metrics endpoint is unreachable or misconfigured
Network connectivity issues between Prometheus and Scheduler
Service discovery configuration error in Prometheus
Scheduler running without metrics endpoint enabled
Firewall or network policy blocking metric scraping
Certificate or authentication issues preventing metric collection
Scheduler running on a node that is down or unreachable
Leader election issues (standby instances still serve /metrics, but only the leader records scheduling activity)

Severity estimation

Critical severity - The Scheduler is a core control plane component.

While existing pods continue to run normally, without a functioning Scheduler:

New pods will remain in Pending state indefinitely
Pods cannot be assigned to nodes
Deployments, StatefulSets, and DaemonSets cannot scale up
Failed pods won’t be rescheduled to other nodes
Any workload requiring pod creation or scheduling will be blocked

Immediate action is required if the Scheduler is actually down (not just a monitoring issue).

Troubleshooting steps

Verify Prometheus target status
- Command / Action:
  - Check if Scheduler appears in Prometheus targets
  - Access Prometheus UI at /targets
  - Look for kube-scheduler targets
- Expected result:
  - Target should be present and status should be “UP”
- additional info:
  - If target is missing from discovery, this is a service discovery issue
  - If target shows as “DOWN”, the endpoint is unreachable
  - Standby (non-leader) schedulers still serve /metrics, but scheduling activity metrics are only populated on the leader
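
The same check can be scripted against the Prometheus HTTP API instead of the UI. The sketch below is a minimal example, assuming Prometheus is reachable at the URL you pass in (for example through a port-forward) and that the scrape job is labelled `kube-scheduler`; both are assumptions, so adjust them to your setup. The function name `scheduler_target_state` is hypothetical.

```shell
# Classify the scheduler target by querying the Prometheus instant-query API.
# $1: Prometheus base URL (assumption, e.g. http://localhost:9090)
# The job label "kube-scheduler" is an assumption; match your scrape config.
scheduler_target_state() {
  body=$(curl -sG "$1/api/v1/query" \
    --data-urlencode 'query=up{job="kube-scheduler"}') || {
    echo "prometheus-unreachable"
    return 1
  }
  case "$body" in
    *'"result":[]'*)        echo "missing" ;;  # no series: discovery problem
    *'"value":['*',"1"]'*)  echo "up" ;;       # at least one target is up
    *'"result":['*)         echo "down" ;;     # series exist, none report 1
    *)                      echo "unknown" ;;
  esac
}
```

For example, `scheduler_target_state http://localhost:9090` prints `missing` when service discovery has not found the Scheduler at all, and `down` when targets were discovered but none could be scraped. The string matching relies on the compact JSON that Prometheus returns; a JSON parser would be more robust if one is available.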

Check Scheduler pod status (for kubeadm clusters)
- Command / Action:
  - Verify Scheduler is running as a static pod
  - kubectl get pods -n kube-system | grep scheduler
  - kubectl get pods -n kube-system -l component=kube-scheduler
- Expected result:
  - Scheduler pod should be in Running state
  - kube-scheduler-<node> 1/1 Running
- additional info:
  - For kubeadm clusters, Scheduler runs as a static pod on control plane nodes
  - In HA setups, multiple scheduler pods exist but only one is active (leader)
  - Pod should be in Running state with 1/1 ready containers
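
To avoid reading the pod list by eye, the check above can be wrapped in a small helper. This is a sketch that assumes the kubeadm label `component=kube-scheduler` used in the commands above; the function name `scheduler_pods_not_running` is hypothetical. It only looks at the pod phase, so a pod that is Running but not ready still needs the 1/1 check.

```shell
# Print scheduler pods whose phase is not Running.
# Exits non-zero if any such pod exists or if no scheduler pods are found.
scheduler_pods_not_running() {
  kubectl get pods -n kube-system -l component=kube-scheduler --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase |
  awk '
    $2 != "Running" { print $1 " (" $2 ")"; bad = 1 }
    END {
      if (NR == 0) print "no scheduler pods found"
      exit (bad || NR == 0)
    }'
}
```

Running `scheduler_pods_not_running && echo "all scheduler pods are Running"` gives a quick pass/fail result on a control plane node or any machine with cluster access.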

Check Scheduler process (for systemd-managed clusters)
- Command / Action:
  - Verify Scheduler service is active
  - systemctl status kube-scheduler
  - ps aux | grep kube-scheduler
- Expected result:
  - Service should be active and running
  - Process should be present
- additional info:
  - For non-kubeadm clusters, Scheduler may run as a systemd service
  - Check service status on control plane nodes

Inspect Scheduler logs
- Command / Action:
  - Review logs for errors or crash information
  - For static pod: kubectl logs -n kube-system kube-scheduler-<node>
  - For systemd: journalctl -u kube-scheduler -n 100
- Expected result:
  - No critical errors or crash logs
  - Scheduler should be processing pod scheduling requests
- additional info:
  - Look for authentication errors, API server connectivity issues, or crashes
  - Check for leader election messages (only leader actively schedules)
  - Look for “Successfully acquired lease” messages

Check for pending pods (indicates scheduler issues)
- Command / Action:
  - List pods stuck in Pending state
  - kubectl get pods -A --field-selector=status.phase=Pending
  - kubectl describe pod <pending-pod> -n <namespace>
- Expected result:
  - If scheduler is working, pending pods should have scheduling events
  - No pods should be stuck in Pending due to scheduler failure
- additional info:
  - Pods in Pending without any scheduler events indicate scheduler is not working
  - Events should show “Successfully assigned” or scheduling failure reasons
  - Pending due to insufficient resources is normal, not a scheduler issue
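
Describing each pending pod by hand is slow on a busy cluster. The sketch below lists Pending pods that have no events at all, which is the pattern that points at a non-working Scheduler. The function name `pending_pods_without_events` is hypothetical, and it assumes events have not yet expired (the API server keeps them for a limited time), so treat the output as a hint rather than proof.

```shell
# List Pending pods that have no events recorded against them.
# Output: one "namespace/name" per line.
pending_pods_without_events() {
  kubectl get pods -A --field-selector=status.phase=Pending --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
  while read -r ns name; do
    [ -n "$name" ] || continue
    # "No resources found" goes to stderr, so empty stdout means no events.
    events=$(kubectl get events -n "$ns" --no-headers \
      --field-selector "involvedObject.kind=Pod,involvedObject.name=$name" \
      2>/dev/null </dev/null)
    [ -n "$events" ] || echo "$ns/$name"
  done
}
```

An empty result means every Pending pod has at least one event, in which case `kubectl describe pod` on a few of them should show either “Successfully assigned” or a scheduling failure reason.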

Verify metrics endpoint accessibility
- Command / Action:
  - Test if metrics endpoint is reachable
  - kubectl get pods -n kube-system -l component=kube-scheduler -o wide
  - curl -k https://<scheduler-ip>:10259/metrics
- Expected result:
  - Metrics endpoint should return Prometheus-formatted metrics
  - HTTP 200 response with metric data
- additional info:
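  - Default metrics port is 10259 (secure); the insecure port 10251 is deprecated and no longer available in newer Kubernetes releases
  - The secure port usually requires a client certificate or bearer token; an unauthenticated request may return 401 or 403 even when the endpoint is healthy
  - Check --bind-address and --secure-port flags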
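
Because the secure port normally requires authentication, it helps to distinguish "unreachable" from "reachable but rejected". The sketch below wraps the curl call above and interprets the HTTP status code. The function name `check_scheduler_metrics` is hypothetical, and it assumes you already hold a bearer token that is authorized to read /metrics.

```shell
# Probe the scheduler metrics endpoint and classify the result.
# $1: scheduler IP or hostname
# $2: bearer token (assumption: authorized for the /metrics non-resource URL)
check_scheduler_metrics() {
  code=$(curl -sk -o /dev/null -m 5 -w '%{http_code}' \
    -H "Authorization: Bearer $2" "https://$1:10259/metrics")
  case "$code" in
    200)      echo "ok" ;;
    401|403)  echo "auth-rejected ($code)" ;;  # endpoint is up, credentials are not accepted
    000)      echo "unreachable" ;;            # no HTTP response: network, bind address, or process down
    *)        echo "unexpected ($code)" ;;
  esac
}
```

On kubectl 1.24 or later, a short-lived token can be obtained with `kubectl create token <service-account> -n <namespace>`, using the service account Prometheus scrapes with. An `auth-rejected` result means the Scheduler is running and the problem lies in credentials or RBAC, while `unreachable` points at networking, the bind address, or the process itself.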

Check Scheduler configuration
- Command / Action:
  - Review Scheduler startup configuration
  - For static pod: kubectl get pod -n kube-system kube-scheduler-<node> -o yaml
  - Check manifest: cat /etc/kubernetes/manifests/kube-scheduler.yaml
- Expected result:
  - Metrics endpoint should be enabled and properly configured
  - --bind-address should be an address Prometheus can reach; 127.0.0.1 (the kubeadm default) only accepts connections from the control plane node itself, even though the pod uses host networking
- additional info:
  - Ensure --authentication-kubeconfig and --authorization-kubeconfig are properly set
  - Verify network mode allows external metric scraping
  - Check --leader-elect flag is set to true for HA setups

Verify leader election status (for HA clusters)
- Command / Action:
  - Check which scheduler is the leader
  - kubectl get lease -n kube-system kube-scheduler -o yaml
  - kubectl get endpoints -n kube-system kube-scheduler -o yaml
- Expected result:
  - One scheduler should hold the leader lease
  - Leader’s holderIdentity should be visible
- additional info:
  - Only the leader actively schedules pods
  - Non-leader schedulers are in standby mode
  - Leader election uses leases in kube-system namespace
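
For a quick answer without reading the full YAML, the holder can be extracted with a JSONPath query. This is a minimal sketch; the function name `scheduler_leader` is hypothetical, and it assumes the lease is named `kube-scheduler` in `kube-system`, as in the commands above.

```shell
# Print the current scheduler leader from the leader-election lease.
# Returns non-zero if the lease is missing or has no holder.
scheduler_leader() {
  holder=$(kubectl get lease -n kube-system kube-scheduler \
    -o jsonpath='{.spec.holderIdentity}' 2>/dev/null)
  if [ -n "$holder" ]; then
    echo "leader: $holder"
  else
    echo "no leader recorded"
    return 1
  fi
}
```

If a holder is shown, also compare `.spec.renewTime` in the lease with the current time: a holder whose renew time is stale may indicate a leader that has stopped without releasing the lease.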

Verify ServiceMonitor configuration (for Prometheus Operator)
- Command / Action:
  - Check ServiceMonitor for Scheduler
  - kubectl get servicemonitor -n kube-system
  - kubectl get servicemonitor -n monitoring -l app.kubernetes.io/name=kube-scheduler -o yaml
- Expected result:
  - ServiceMonitor exists and targets correct endpoints
  - Selector matches Scheduler service/endpoints
- additional info:
  - ServiceMonitor configuration tells Prometheus how to discover Scheduler
  - Check endpoint, port, and authentication settings

Check service and endpoints
- Command / Action:
  - Verify service exists and has endpoints
  - kubectl get svc -n kube-system kube-scheduler
  - kubectl get endpoints -n kube-system kube-scheduler
- Expected result:
  - Service should exist and have valid endpoints
  - Endpoints should point to running Scheduler instances
- additional info:
  - Missing endpoints indicate the service can’t find Scheduler pods
  - May need to create service manually for static pods

Restart Scheduler (if not running)
- Command / Action:
  - Restart Scheduler component
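  - For static pod: move /etc/kubernetes/manifests/kube-scheduler.yaml out of the manifests directory, wait for the container to stop, then move it back
  - For systemd: systemctl restart kube-scheduler
- Expected result:
  - Scheduler starts successfully
  - Pending pods begin to be scheduled
  - Prometheus begins scraping metrics again
- additional info:
  - kubectl delete pod -n kube-system kube-scheduler-<node> only removes the mirror pod object; the kubelet recreates it and may leave the static pod's container running, so it is not a reliable restart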
  - Verify pod comes back in Running state
  - Monitor for successful metric collection in Prometheus
  - Check that pending pods are now being scheduled

Verify network policies and firewall rules
- Command / Action:
  - Check for network policies blocking metric scraping
  - kubectl get networkpolicies -n kube-system
  - Check firewall rules on control plane nodes
- Expected result:
  - No policies blocking traffic from Prometheus to Scheduler
  - Port 10259 (or 10251) should be accessible
- additional info:
  - Network policies may prevent Prometheus from reaching the metrics endpoint
  - Verify Prometheus can reach the Scheduler pod/host network

Additional resources

Kubernetes Scheduler documentation
Kubernetes Control Plane Components
Kubernetes Scheduler Configuration
Leader Election in Kubernetes
Related alert: KubeControllerManagerAbsent
Related alert: KubeProxyAbsent
Related alert: TargetDown