Alert Runbooks

KubeAggregatedAPIDown

Description

This alert fires when a Kubernetes aggregated API is unavailable and reports Available=False in its APIService status.
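Aggregated APIs are served by extension API servers, such as the one behind metrics.k8s.io or behind custom APIs. While an aggregated API is down, every feature that depends on it stops working; for metrics.k8s.io, that includes kubectl top and CPU- or memory-based Horizontal Pod Autoscaler scaling.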

Commonly impacted components:


Possible Causes:


Severity estimation

Medium to High severity, depending on API importance.

Severity increases with:


Troubleshooting steps

  1. Identify unavailable aggregated APIs

    • Command / Action:
      • List all APIService objects
      • kubectl get apiservice

    • Expected result:
      • AVAILABLE is True for every APIService
    • Additional info:
      • Any False value indicates an unavailable aggregated API
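To pull only the failing entries out of the listing, the table output can be filtered on its AVAILABLE column. This is a minimal sketch, assuming the default column order of kubectl get apiservice (NAME, SERVICE, AVAILABLE, AGE); the function name unavailable_apiservices is hypothetical.

```shell
# Print the names of APIServices whose AVAILABLE column starts with "False".
# Reads `kubectl get apiservice` table output on stdin.
# Assumes the default columns: NAME SERVICE AVAILABLE AGE.
unavailable_apiservices() {
  awk 'NR > 1 && $3 ~ /^False/ { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get apiservice | unavailable_apiservices
```

Matching on the prefix rather than the exact string keeps entries such as "False (MissingEndpoints)", where kubectl appends the reason in parentheses.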

  2. Describe the failing APIService

    • Command / Action:
      • Inspect conditions and errors
      • kubectl describe apiservice <apiservice-name>

    • Expected result:
      • Condition Available=True
    • Additional info:
      • TLS, service, or permission errors are usually reported here
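When the full describe output is too noisy, the Available condition can be read directly with a JSONPath filter. The sketch below assumes the APIService name is passed as the first argument; the function name apiservice_available_reason is hypothetical.

```shell
# Print "<reason>: <message>" from the Available condition of one APIService.
# $1: APIService name, for example v1beta1.metrics.k8s.io
apiservice_available_reason() {
  kubectl get apiservice "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].reason}{": "}{.status.conditions[?(@.type=="Available")].message}{"\n"}'
}

# Usage (requires kubectl access):
#   apiservice_available_reason v1beta1.metrics.k8s.io
```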

  3. Check backing Service and Endpoints

    • Command / Action:
      • Verify Service and Endpoints exist
      • kubectl get svc -n <namespace>
      • kubectl get endpoints -n <namespace>

    • Expected result:
      • Endpoints list one or more ready pods
    • Additional info:
      • No endpoints means kube-apiserver cannot reach the API
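The namespace and name of the backing Service are recorded in the APIService itself, under spec.service, so the lookup can be chained. This is a sketch under that assumption; the function name show_backing_service is hypothetical.

```shell
# Show the Service and Endpoints behind one APIService.
# $1: APIService name, for example v1beta1.metrics.k8s.io
show_backing_service() {
  # Read "<namespace>/<name>" of the backing Service from the APIService spec.
  ref="$(kubectl get apiservice "$1" \
    -o jsonpath='{.spec.service.namespace}{"/"}{.spec.service.name}')"
  ns="${ref%%/*}"
  svc="${ref#*/}"
  # Local (non-aggregated) APIServices have no spec.service.
  if [ -z "$ns" ] || [ -z "$svc" ]; then
    echo "APIService $1 has no backing Service" >&2
    return 1
  fi
  kubectl get svc -n "$ns" "$svc"
  kubectl get endpoints -n "$ns" "$svc"
}

# Usage (requires kubectl access):
#   show_backing_service v1beta1.metrics.k8s.io
```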

  4. Inspect aggregated API pods

    • Command / Action:
      • Check pod status
      • kubectl get pods -n <namespace>

    • Expected result:
      • Pods are Running and Ready
    • Additional info:
      • CrashLoopBackOff or Pending blocks API availability
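In a busy namespace it can help to list only the pods that fail the "Running and Ready" check. A minimal sketch, assuming the default kubectl get pods columns (NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical.

```shell
# Print pods that are not Running or whose containers are not all Ready.
# Reads `kubectl get pods` table output on stdin.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY looks like "1/1": ready count / total count
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage (requires kubectl access):
#   kubectl get pods -n <namespace> | unhealthy_pods
```

Pods of finished Jobs (STATUS Completed) are also printed by this filter and can be ignored.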

  5. Check pod logs

    • Command / Action:
      • Review API server logs
      • kubectl logs -n <namespace> <pod-name>

    • Expected result:
      • API server starts without fatal errors
    • Additional info:
      • Certificate and RBAC errors are common causes
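Since certificate and RBAC errors are the usual suspects, the logs can be narrowed to lines that mention them. The keyword list below is an illustrative assumption, not an exhaustive set of messages an extension API server may emit, and the function name cert_or_rbac_errors is hypothetical.

```shell
# Keep only log lines that mention certificate or authorization problems.
# Reads log text on stdin; the keyword list is an assumption for illustration.
cert_or_rbac_errors() {
  # "|| true" keeps the exit status zero when nothing matches.
  grep -Ei 'x509|certificate|forbidden|unauthorized' || true
}

# Usage (requires kubectl access); add --previous to read the logs of a
# container that has already crashed:
#   kubectl logs -n <namespace> <pod-name> --tail=200 | cert_or_rbac_errors
```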

  6. Verify TLS configuration

    • Command / Action:
      • Inspect CA bundle configuration
      • kubectl get apiservice <apiservice-name> -o yaml

    • Expected result:
      • caBundle is present and valid
    • Additional info:
      • Invalid or expired certs cause API unavailability
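The caBundle field is stored base64-encoded, so checking its validity means decoding it first. The sketch below extracts the PEM data and leaves inspection to openssl; it assumes base64 -d and openssl are available on the workstation, and the function name apiservice_ca_pem is hypothetical.

```shell
# Print the decoded (PEM) CA bundle of one APIService.
# $1: APIService name, for example v1beta1.metrics.k8s.io
apiservice_ca_pem() {
  kubectl get apiservice "$1" -o jsonpath='{.spec.caBundle}' | base64 -d
}

# Usage (requires kubectl access and openssl); prints the subject and expiry
# date of the first certificate in the bundle:
#   apiservice_ca_pem v1beta1.metrics.k8s.io | openssl x509 -noout -subject -enddate
```

An empty result means no caBundle is set, which is expected only when the APIService uses insecureSkipTLSVerify.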

  7. Check node and network health

    • Command / Action:
      • Verify node readiness
      • kubectl get nodes

    • Expected result:
      • Nodes are Ready
    • Additional info:
      • Network or node issues affect aggregated APIs
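On larger clusters, filtering the node list down to the nodes that are not Ready makes the check quicker. A minimal sketch, assuming the default kubectl get nodes columns (NAME, STATUS, ...); the function name not_ready_nodes is hypothetical.

```shell
# Print nodes whose STATUS does not start with "Ready".
# Reads `kubectl get nodes` table output on stdin.
# "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready here.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1, $2 }'
}

# Usage (requires kubectl access):
#   kubectl get nodes | not_ready_nodes
```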

  8. Restart or redeploy aggregated API

    • Command / Action:
      • Restart API server deployment
      • kubectl rollout restart deployment <deployment-name> -n <namespace>

    • Expected result:
      • APIService becomes Available=True
    • Additional info:
      • Fix the root cause before restarting; otherwise the pods may fail again and end up in a restart loop
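The restart and the follow-up check can be combined so that the command only returns once the APIService reports Available again. This is a sketch, assuming the extension API server runs as a Deployment; the function name restart_and_wait and the 120-second timeouts are hypothetical choices.

```shell
# Restart the Deployment behind an aggregated API and wait for recovery.
# $1: namespace, $2: Deployment name, $3: APIService name
restart_and_wait() {
  kubectl rollout restart "deployment/$2" -n "$1" &&
  kubectl rollout status "deployment/$2" -n "$1" --timeout=120s &&
  # APIService is cluster-scoped, so no namespace is passed here.
  kubectl wait --for=condition=Available "apiservice/$3" --timeout=120s
}

# Usage (requires kubectl access):
#   restart_and_wait kube-system metrics-server v1beta1.metrics.k8s.io
```

Chaining with && stops at the first failing step, so a rollout that never becomes healthy is reported instead of being masked by the final wait.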

Additional resources