Alert Runbooks

KubeControllerManagerAbsent

Description

Prometheus target discovery has not found the Kubernetes Controller Manager in the past 15 minutes.

The Kube Controller Manager is a critical control plane component that runs core controller loops responsible for managing the state of the cluster. It handles replication, namespace lifecycle, service accounts, node management, and many other essential cluster functions. When absent, no new metrics are being collected from this component.


Possible Causes:


Severity estimation

Critical severity - The Controller Manager is a core control plane component.

Existing workloads continue to run, but without the Controller Manager the cluster cannot reconcile state changes such as replication, namespace lifecycle, service account, and node management.

Immediate action is required if the Controller Manager is actually down (not just a monitoring issue).


Troubleshooting steps

  1. Verify Prometheus target status

    • Command / Action:
      • Check if Controller Manager appears in Prometheus targets
      • Access Prometheus UI at /targets
      • Look for kube-controller-manager targets
    • Expected result:
      • Target should be present and status should be “UP”
    • Additional info:
      • If target is missing from discovery, this is a service discovery issue
      • If target shows as “DOWN”, the endpoint is unreachable
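
The same check can be scripted against the Prometheus HTTP API instead of the UI. The sketch below is illustrative only. It assumes Prometheus is reachable at PROM_URL, that jq is installed, and that the scrape job is named kube-controller-manager. The job name and the helper names list_target_health and diagnose_targets are assumptions, not part of this runbook.

```shell
# Assumptions: PROM_URL points at Prometheus and JOB matches your scrape job name.
PROM_URL="${PROM_URL:-http://localhost:9090}"
JOB="${JOB:-kube-controller-manager}"

# Print one "job health" line per active target (requires curl and jq).
list_target_health() {
  curl -s "${PROM_URL}/api/v1/targets?state=active" |
    jq -r '.data.activeTargets[] | "\(.labels.job) \(.health)"'
}

# Read "job health" lines on stdin and classify the state of the job named in $1.
diagnose_targets() {
  local job="$1" up=0 total=0 name health
  while read -r name health; do
    if [ "$name" = "$job" ]; then
      total=$((total + 1))
      if [ "$health" = "up" ]; then up=$((up + 1)); fi
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "missing: no targets in discovery (service discovery issue)"
  elif [ "$up" -eq 0 ]; then
    echo "down: targets discovered but none are up (endpoint unreachable)"
  else
    echo "ok: ${up}/${total} targets up"
  fi
}
```

Typical use would be: list_target_health | diagnose_targets "$JOB". Keeping the classification separate from the fetch makes it easy to try the logic on saved output.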

  2. Check Controller Manager pod status (for kubeadm clusters)

    • Command / Action:
      • Verify Controller Manager is running as a static pod
      • kubectl get pods -n kube-system | grep controller-manager

      • kubectl get pods -n kube-system -l component=kube-controller-manager

    • Expected result:
      • Controller Manager pod should be in Running state
      • kube-controller-manager-<node> 1/1 Running

    • Additional info:
      • For kubeadm clusters, Controller Manager runs as a static pod on control plane nodes
      • Pod should be in Running state with 1/1 ready containers
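
When several control plane nodes are involved, the pod listing can be filtered down to the unhealthy entries. This is a minimal sketch: the helper name report_unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...).

```shell
# Read `kubectl get pods --no-headers` output on stdin and report pods that
# are not Running or whose READY column is not n/n.
report_unhealthy_pods() {
  awk '
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2])
        print $1 ": " $3 " (" $2 " ready)"
    }
    END { if (NR == 0) print "no controller manager pods found" }
  '
}

# Example invocation (needs cluster access):
#   kubectl get pods -n kube-system -l component=kube-controller-manager --no-headers |
#     report_unhealthy_pods
```

No output means every listed pod is Running with all containers ready.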

  3. Check Controller Manager process (for systemd-managed clusters)

    • Command / Action:
      • Verify Controller Manager service is active
      • systemctl status kube-controller-manager

      • ps aux | grep '[k]ube-controller-manager'

    • Expected result:
      • Service should be active and running
      • Process should be present
    • Additional info:
      • For non-kubeadm clusters, Controller Manager may run as a systemd service
      • Check service status on control plane nodes

  4. Inspect Controller Manager logs

    • Command / Action:
      • Review logs for errors or crash information
      • For static pod: kubectl logs -n kube-system kube-controller-manager-<node>
      • For systemd: journalctl -u kube-controller-manager -n 100
    • Expected result:
      • No critical errors or crash logs
      • Controller loops should be running normally
    • Additional info:
      • Look for authentication errors, API server connectivity issues, or crashes
      • Check for leader election messages (Controller Manager uses leader election)
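
To narrow a long log down to the failure modes listed in this step, a simple filter may help. This is a rough sketch: filter_suspicious_lines is a hypothetical helper, and the keyword list is an assumption meant as a starting point rather than a complete catalogue of Controller Manager error messages.

```shell
# Keep only log lines that hint at authentication problems, API server
# connectivity issues, leader election activity, or crashes.
filter_suspicious_lines() {
  grep -E -i 'unauthorized|forbidden|x509|connection refused|leaderelection|lease|panic|fatal'
}

# Example invocations (NODE is a placeholder for the control plane node name):
#   kubectl logs -n kube-system "kube-controller-manager-${NODE}" --tail=200 | filter_suspicious_lines
#   journalctl -u kube-controller-manager -n 200 --no-pager | filter_suspicious_lines
```

Leader election lines are not errors by themselves; on multi-node control planes only the current leader runs the controller loops.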

  5. Verify metrics endpoint accessibility

    • Command / Action:
      • Test if metrics endpoint is reachable
      • kubectl get pods -n kube-system -l component=kube-controller-manager -o wide

      • curl -k https://<controller-manager-ip>:10257/metrics

    • Expected result:
      • Metrics endpoint should return Prometheus-formatted metrics
      • HTTP 200 response with metric data
    • Additional info:
      • Default metrics port is 10257 (secure) or 10252 (insecure, deprecated)
      • May require certificates or tokens for authentication
      • Check --bind-address and --secure-port flags
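
Because the secure port usually requires authentication, a bare curl often returns 401 or 403 rather than metrics. The sketch below reduces the probe to an HTTP status code and explains it. It assumes CM_IP holds the address found with the previous command and TOKEN holds a bearer token for an identity allowed to read /metrics; both variables and both function names are hypothetical.

```shell
# Fetch only the HTTP status code from the secure metrics port.
# curl reports 000 when no HTTP response was received at all.
metrics_http_code() {
  curl -sk -o /dev/null -w '%{http_code}' --max-time 5 \
    -H "Authorization: Bearer ${TOKEN}" \
    "https://${CM_IP}:10257/metrics"
}

# Translate a status code into a likely cause.
explain_http_code() {
  case "$1" in
    200) echo "ok: metrics endpoint is serving" ;;
    401) echo "unauthenticated: token missing or invalid" ;;
    403) echo "forbidden: identity lacks permission to read /metrics" ;;
    000) echo "unreachable: connection failed (check --bind-address, --secure-port, firewall)" ;;
    *)   echo "unexpected HTTP status: $1" ;;
  esac
}
```

A typical call would be: explain_http_code "$(metrics_http_code)". On recent kubectl versions a short-lived token can be obtained with kubectl create token for the service account Prometheus runs as; the account name depends on the installation.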

  6. Check Controller Manager configuration

    • Command / Action:
      • Review Controller Manager startup configuration
      • For static pod: kubectl get pod -n kube-system kube-controller-manager-<node> -o yaml
      • Check manifest: cat /etc/kubernetes/manifests/kube-controller-manager.yaml
    • Expected result:
      • Metrics endpoint should be enabled and properly configured
      • --bind-address should not be 127.0.0.1 unless Prometheus scrapes from the same host (kubeadm sets 127.0.0.1 by default)
    • Additional info:
      • Ensure --authorization-kubeconfig and --authentication-kubeconfig are properly set
      • Verify network mode allows external metric scraping
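
The bind address check can be done directly against the static pod manifest on a control plane node. This is a minimal sketch with hypothetical helper names. It assumes the flag is written as --bind-address=VALUE in the manifest and falls back to 0.0.0.0, the component's documented default, when the flag is absent.

```shell
# Print the --bind-address value from a manifest, or 0.0.0.0 when the flag is not set.
bind_address_from_manifest() {
  local manifest="${1:-/etc/kubernetes/manifests/kube-controller-manager.yaml}"
  local addr
  addr="$(grep -oE -e '--bind-address=[^[:space:]"]+' "$manifest" | head -n 1 | cut -d= -f2)"
  echo "${addr:-0.0.0.0}"
}

# Succeeds when the address only accepts connections from the local host.
is_loopback_only() {
  case "$1" in
    127.*|localhost|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (run on a control plane node):
#   addr="$(bind_address_from_manifest)"
#   is_loopback_only "$addr" && echo "metrics bound to ${addr}: remote scraping will fail"
```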

  7. Verify ServiceMonitor configuration (for Prometheus Operator)

    • Command / Action:
      • Check ServiceMonitor for Controller Manager
      • kubectl get servicemonitor -n kube-system

      • kubectl get servicemonitor -n monitoring -l app.kubernetes.io/name=kube-controller-manager -o yaml

    • Expected result:
      • ServiceMonitor exists and targets correct endpoints
      • Selector matches Controller Manager service/endpoints
    • Additional info:
      • ServiceMonitor configuration tells Prometheus how to discover Controller Manager
      • Check endpoint, port, and authentication settings

  8. Check service and endpoints

    • Command / Action:
      • Verify service exists and has endpoints
      • kubectl get svc -n kube-system kube-controller-manager

      • kubectl get endpoints -n kube-system kube-controller-manager

    • Expected result:
      • Service should exist and have valid endpoints
      • Endpoints should point to running Controller Manager instances
    • Additional info:
      • Missing endpoints indicate the service can’t find Controller Manager pods
      • May need to create service manually for static pods
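
For scripting, the endpoint check can be reduced to a list of ready addresses. The sketch below assumes the Service is named kube-controller-manager in kube-system, as in the commands for this step; the helper names endpoint_ips and describe_endpoints are hypothetical.

```shell
# Print the ready endpoint IPs behind the Service, space-separated
# (empty when there are none). Needs cluster access.
endpoint_ips() {
  kubectl get endpoints -n kube-system kube-controller-manager \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Summarise a space-separated IP list passed as a single argument.
describe_endpoints() {
  # Intentional word splitting: one positional parameter per IP.
  set -- $1
  if [ "$#" -eq 0 ]; then
    echo "no ready endpoints: the Service does not resolve to any Controller Manager instance"
  else
    echo "$# ready endpoint(s): $*"
  fi
}
```

A typical call would be: describe_endpoints "$(endpoint_ips)". The reported IPs should match the control plane node addresses shown by kubectl get pods -o wide.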

  9. Restart Controller Manager (if not running)

    • Command / Action:
      • Restart Controller Manager component
      • For static pod: kubectl delete pod -n kube-system kube-controller-manager-<node>
      • For systemd: systemctl restart kube-controller-manager
    • Expected result:
      • Controller Manager starts successfully
      • Prometheus begins scraping metrics again
    • Additional info:
      • The kubelet recreates the mirror pod after deletion, but this does not necessarily restart the underlying container; to force a restart, temporarily move the manifest out of /etc/kubernetes/manifests and back
      • Verify pod comes back in Running state
      • Monitor for successful metric collection in Prometheus

  10. Verify network policies and firewall rules

    • Command / Action:
      • Check for network policies blocking metric scraping
      • kubectl get networkpolicies -n kube-system

      • Check firewall rules on control plane nodes
    • Expected result:
      • No policies blocking traffic from Prometheus to Controller Manager
      • Port 10257 (or 10252) should be accessible
    • Additional info:
      • Network policies may prevent Prometheus from reaching the metrics endpoint
      • Verify Prometheus can reach the Controller Manager pod/host network
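
A quick TCP probe can help separate a blocked port from an application-level problem. This is only a sketch: port_open is a hypothetical helper that relies on bash's /dev/tcp pseudo-device and the coreutils timeout command, so it needs to run from a host or debug pod that has both and sits on the same network path as Prometheus.

```shell
# Succeeds when a TCP connection to host:port can be opened within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (CM_IP is a placeholder for the Controller Manager address):
#   if port_open "$CM_IP" 10257; then echo "port reachable"; else echo "port blocked or closed"; fi
```

A successful connection only shows that the port is reachable at the TCP level; authentication and authorization on the metrics endpoint still have to be checked separately.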

Additional resources