KubeControllerManagerAbsent

Description

Prometheus target discovery has not found the Kubernetes Controller Manager in the past 15 minutes.

The Kube Controller Manager is a critical control plane component that runs core controller loops responsible for managing the state of the cluster. It handles replication, namespace lifecycle, service accounts, node management, and many other essential cluster functions. When absent, no new metrics are being collected from this component.

Possible Causes:

- Kube Controller Manager process has crashed or is not running
- Controller Manager pod/container has been terminated or evicted
- Metrics endpoint is unreachable or misconfigured
- Network connectivity issues between Prometheus and Controller Manager
- Service discovery configuration error in Prometheus
- Controller Manager running without metrics endpoint enabled
- Firewall or network policy blocking metric scraping
- Certificate or authentication issues preventing metric collection
- Controller Manager running on a node that is down or unreachable

Severity estimation

Critical severity - The Controller Manager is a core control plane component.

While existing workloads continue to run, the cluster cannot properly manage state changes without the Controller Manager:

- New pods may not be created when replica counts change
- Node failures won’t trigger pod rescheduling
- Garbage collection won’t occur
- Service account tokens won’t be created
- ReplicaSets, Deployments, DaemonSets won’t be reconciled

Immediate action is required if the Controller Manager is actually down (not just a monitoring issue).

Troubleshooting steps

Verify Prometheus target status
- Command / Action:
  - Check if Controller Manager appears in Prometheus targets
  - Access Prometheus UI at /targets
  - Look for kube-controller-manager targets
- Expected result:
  - Target should be present and status should be “UP”
- Additional info:
  - If target is missing from discovery, this is a service discovery issue
  - If target shows as “DOWN”, the endpoint is unreachable
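
The same check can be scripted against the Prometheus HTTP API. The sketch below is a minimal example, assuming Prometheus is reachable at http://localhost:9090 (for instance through a port-forward), that the scrape job is named kube-controller-manager, and that curl and jq are installed. The function names and the PROM_URL and JOB variables are hypothetical.

```shell
# Hypothetical helper: classify the target from two counts.
#   $1 = number of active targets for the job
#   $2 = number of those targets whose health is "up"
classify_target_state() {
  total="$1"
  up="$2"
  if [ "$total" -eq 0 ]; then
    echo "missing"   # not discovered at all: service discovery issue
  elif [ "$up" -eq 0 ]; then
    echo "down"      # discovered, but no scrape succeeds
  else
    echo "up"
  fi
}

# Fetch both counts from the Prometheus targets API and classify them.
# PROM_URL and JOB are assumptions; adjust them to your installation.
controller_manager_target_state() {
  prom_url="${PROM_URL:-http://localhost:9090}"
  job="${JOB:-kube-controller-manager}"
  targets=$(curl -fsS "$prom_url/api/v1/targets?state=active") || return 1
  total=$(printf '%s' "$targets" | jq --arg job "$job" \
    '[.data.activeTargets[] | select(.labels.job == $job)] | length')
  up=$(printf '%s' "$targets" | jq --arg job "$job" \
    '[.data.activeTargets[] | select(.labels.job == $job and .health == "up")] | length')
  classify_target_state "$total" "$up"
}
```

Calling controller_manager_target_state prints missing, down, or up, which maps onto the cases described in this step.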

Check Controller Manager pod status (for kubeadm clusters)
- Command / Action:
  - Verify Controller Manager is running as a static pod
  - kubectl get pods -n kube-system | grep controller-manager
  - kubectl get pods -n kube-system -l component=kube-controller-manager
- Expected result:
  - Controller Manager pod should be in Running state
  - kube-controller-manager-<node> 1/1 Running
- Additional info:
  - For kubeadm clusters, Controller Manager runs as a static pod on control plane nodes
  - Pod should be in Running state with 1/1 ready containers
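
On clusters with several control plane nodes it can help to filter the pod listing automatically. The following is a small sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical.

```shell
# Reads `kubectl get pods` output on stdin and prints every pod that is
# not Running or not fully ready (for example 0/1).
# Exit status is 0 only when at least one pod is listed and all are healthy.
unhealthy_pods() {
  awk '
    NR == 1 && $1 == "NAME" { next }   # skip the header row
    {
      seen++
      split($2, ready, "/")            # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3
        bad++
      }
    }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}
```

Usage: kubectl get pods -n kube-system -l component=kube-controller-manager | unhealthy_pods. An empty listing also yields a non-zero exit status, since no Controller Manager pod was found.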

Check Controller Manager process (for systemd-managed clusters)
- Command / Action:
  - Verify Controller Manager service is active
  - systemctl status kube-controller-manager
  - ps aux | grep kube-controller-manager
- Expected result:
  - Service should be active and running
  - Process should be present
- Additional info:
  - For non-kubeadm clusters, Controller Manager may run as a systemd service
  - Check service status on control plane nodes

Inspect Controller Manager logs
- Command / Action:
  - Review logs for errors or crash information
  - For static pod: kubectl logs -n kube-system kube-controller-manager-<node>
  - For systemd: journalctl -u kube-controller-manager -n 100
- Expected result:
  - No critical errors or crash logs
  - Controller loops should be running normally
- Additional info:
  - Look for authentication errors, API server connectivity issues, or crashes
  - Check for leader election messages (Controller Manager uses leader election)
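
Long logs are easier to review after a first keyword pass. The filter below is only a sketch: the keyword list is an assumption chosen to match the error classes named above (authentication, API server connectivity, crashes, leader election) and is not an exhaustive or official list of Controller Manager messages. The function name is hypothetical.

```shell
# Keep only log lines that contain one of the assumed keywords.
# Matching is case-insensitive; "|| true" keeps the exit status at 0
# when nothing matches, so the filter is safe under `set -e`.
suspicious_log_lines() {
  grep -Ei 'panic|fatal|unauthorized|forbidden|connection refused|x509|leaderelection|lease' || true
}
```

Usage: kubectl logs -n kube-system kube-controller-manager-<node> | suspicious_log_lines. Lines mentioning leases are not necessarily errors; they are kept so that leader election activity can be read in context.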

Verify metrics endpoint accessibility
- Command / Action:
  - Test if metrics endpoint is reachable
  - kubectl get pods -n kube-system -l component=kube-controller-manager -o wide
  - curl -k https://<controller-manager-ip>:10257/metrics
- Expected result:
  - Metrics endpoint should return Prometheus-formatted metrics
  - HTTP 200 response with metric data
- Additional info:
  - Default metrics port is 10257 (secure) or 10252 (insecure, deprecated)
  - An unauthenticated request typically returns 401 or 403; a client certificate or bearer token may be required
  - Check --bind-address and --secure-port flags
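
Because the secure port usually requires authentication, a plain curl can fail even when the component is healthy. The sketch below sends a bearer token and translates the HTTP status into a likely cause. It assumes curl is installed and that the caller supplies a token whose identity is allowed to read /metrics; the function names and the wording of the explanations are hypothetical.

```shell
# Map an HTTP status code from the metrics endpoint to a likely cause.
explain_metrics_status() {
  case "$1" in
    200) echo "ok: endpoint serves metrics" ;;
    401) echo "unauthenticated: token or client certificate missing or invalid" ;;
    403) echo "forbidden: identity lacks permission to GET /metrics" ;;
    000) echo "unreachable: connection failed (bind address, firewall, or process down)" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Probe https://<host>:10257/metrics with a bearer token.
#   $1 = Controller Manager IP or hostname, $2 = bearer token
# curl reports 000 as the status code when no connection could be made.
probe_metrics() {
  host="$1"
  token="$2"
  code=$(curl -ks -o /dev/null -w '%{http_code}' --max-time 5 \
    -H "Authorization: Bearer $token" "https://$host:10257/metrics" || true)
  explain_metrics_status "$code"
}
```

Usage: probe_metrics <controller-manager-ip> "$TOKEN". A 403 with a valid token points at RBAC for the scraping identity rather than at the Controller Manager itself.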

Check Controller Manager configuration
- Command / Action:
  - Review Controller Manager startup configuration
  - For static pod: kubectl get pod -n kube-system kube-controller-manager-<node> -o yaml
  - Check manifest: cat /etc/kubernetes/manifests/kube-controller-manager.yaml
- Expected result:
  - Metrics endpoint should be enabled and properly configured
  - --bind-address should not be 127.0.0.1 when Prometheus scrapes from another pod or host (kubeadm sets 127.0.0.1 by default)
- Additional info:
  - Ensure --authorization-kubeconfig and --authentication-kubeconfig are properly set
  - Verify network mode allows external metric scraping
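
The relevant flags can be pulled out of the manifest without reading the whole file. This is a minimal sketch, assuming the flags appear one per line in the usual "- --flag=value" form of a static pod manifest; the function names are hypothetical.

```shell
# Print the value of --<flag>=<value> from a manifest read on stdin.
#   $1 = flag name without the leading dashes, e.g. bind-address
flag_value() {
  sed -n "s/^[[:space:]]*-[[:space:]]*--$1=\(.*\)\$/\1/p" | head -n 1
}

# Summarise whether the bind address allows scraping from other hosts.
check_bind_address() {
  addr=$(flag_value bind-address)
  case "$addr" in
    127.0.0.1|localhost) echo "loopback-only: $addr" ;;
    "")                  echo "not set" ;;
    *)                   echo "bound to $addr" ;;
  esac
}
```

Usage: check_bind_address < /etc/kubernetes/manifests/kube-controller-manager.yaml. A loopback-only result means the endpoint cannot be scraped from outside the node's own network namespace.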

Verify ServiceMonitor configuration (for Prometheus Operator)
- Command / Action:
  - Check ServiceMonitor for Controller Manager
  - kubectl get servicemonitor -n kube-system
  - kubectl get servicemonitor -n monitoring -l app.kubernetes.io/name=kube-controller-manager -o yaml
- Expected result:
  - ServiceMonitor exists and targets correct endpoints
  - Selector matches Controller Manager service/endpoints
- Additional info:
  - ServiceMonitor configuration tells Prometheus how to discover Controller Manager
  - Check endpoint, port, and authentication settings

Check service and endpoints
- Command / Action:
  - Verify service exists and has endpoints
  - kubectl get svc -n kube-system kube-controller-manager
  - kubectl get endpoints -n kube-system kube-controller-manager
- Expected result:
  - Service should exist and have valid endpoints
  - Endpoints should point to running Controller Manager instances
- Additional info:
  - Missing endpoints indicate that the service selector matches no running Controller Manager pods
  - May need to create service manually for static pods
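
When the service has to be created by hand, a headless Service selecting the static pods is one common shape. The sketch below only prints a manifest. The selector reuses the component=kube-controller-manager label from the pod check above, while the port name https-metrics and the app.kubernetes.io/name label are assumptions that must match whatever your ServiceMonitor expects. The function name is hypothetical.

```shell
# Print a headless Service manifest for the Controller Manager metrics port.
# Port name and labels are assumptions; align them with your ServiceMonitor.
controller_manager_service_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  clusterIP: None
  selector:
    component: kube-controller-manager
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
EOF
}
```

Usage: controller_manager_service_manifest | kubectl apply --dry-run=client -f - to validate the manifest first, then repeat without --dry-run=client to create it.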

Restart Controller Manager (if not running)
- Command / Action:
  - Restart Controller Manager component
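  - For static pod: move /etc/kubernetes/manifests/kube-controller-manager.yaml out of the manifests directory, wait for the container to stop, then move it back
  - For systemd: systemctl restart kube-controller-manager
- Expected result:
  - Controller Manager starts successfully
  - Prometheus begins scraping metrics again
- Additional info:
  - Deleting the mirror pod with kubectl delete pod does not restart a static pod; the kubelet only recreates the mirror object
  - The kubelet restarts the static pod when its manifest is removed and re-added
  - Verify pod comes back in Running state
  - Monitor for successful metric collection in Prometheus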

Verify network policies and firewall rules
- Command / Action:
  - Check for network policies blocking metric scraping
  - kubectl get networkpolicies -n kube-system
  - Check firewall rules on control plane nodes
- Expected result:
  - No policies blocking traffic from Prometheus to Controller Manager
  - Port 10257 (or 10252) should be accessible
- Additional info:
  - Network policies may prevent Prometheus from reaching the metrics endpoint
  - Verify Prometheus can reach the Controller Manager pod/host network

Additional resources

Kubernetes Controller Manager documentation
Kubernetes Control Plane Components
Kubernetes Monitoring Architecture
Related alert: KubeSchedulerAbsent
Related alert: KubeProxyAbsent
Related alert: TargetDown