Alert Runbooks

KubeVersionMismatch

Description

This alert fires when Kubernetes cluster components report different semantic versions. Typically, the control plane components (API server, controller-manager, scheduler) and the worker nodes are running mismatched versions.

This most commonly occurs during a rolling cluster upgrade and is expected to be transient. However, if the mismatch persists after an upgrade completes, it indicates that some nodes or components were not successfully updated, which can cause API incompatibilities, unexpected behavior, or feature degradation.


Possible Causes:


Severity estimation

Low to Medium severity depending on the duration and degree of mismatch:

The Kubernetes version skew policy allows the kubelet on a node to be up to three minor versions older than the API server (up to two for kubelet versions before 1.25) and never newer. Exceeding this is unsupported.
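To make the skew comparison concrete, the minor-version distance between the control plane and a node can be computed from the version strings that kubectl get nodes reports. The following is a minimal sketch, not part of any existing tool: the helper names minor_of and minor_skew are hypothetical, the older version string in the example is invented for illustration, and the sketch assumes both versions share the same major version.

```shell
#!/bin/sh
# Hypothetical helpers: compute how many minor versions a node's kubelet
# lags behind the control plane, given strings such as v1.33.1+rke2r1.

minor_of() {
  # Strip the leading "v", then take the second dot-separated field.
  echo "${1#v}" | cut -d. -f2
}

minor_skew() {
  # $1 = control plane version, $2 = node (kubelet) version.
  cp_minor=$(minor_of "$1")
  node_minor=$(minor_of "$2")
  echo $((cp_minor - node_minor))
}

# Illustrative values only; the second version is invented for this example.
minor_skew "v1.33.1+rke2r1" "v1.31.4+rke2r1"   # prints 2
```

A negative result would mean the node is newer than the control plane, which the skew policy does not allow.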


Troubleshooting steps

  1. Check current versions across all nodes

    • Command / Action:
      • List all nodes and their Kubernetes versions
      • kubectl get nodes

    • Expected result:
      • All nodes show the same version in the VERSION column
      • NAME                         STATUS   ROLES                       AGE    VERSION
        elk-cp-eu-hc-fsn1-01         Ready    control-plane,etcd,master   476d   v1.33.1+rke2r1
        elk-alloy-eu-hc-fsn1-01      Ready    worker                      476d   v1.33.1+rke2r1
        elk-warm-eu-hz-hel1-07       Ready    worker                      376d   v1.33.1+rke2r1
        
    • additional info:
      • Identify which nodes are on the old version — these are the ones that need to be upgraded
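On clusters with many nodes, scanning the VERSION column by eye is error-prone. A small filter can summarize how many nodes report each version. This is a sketch under assumptions: count_versions is a hypothetical helper name, it expects the output of kubectl get nodes --no-headers on stdin, and it relies on VERSION being the last column of the default output. The third sample row uses an invented older version purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "<version> <node count>" for each distinct
# version found in the last column of `kubectl get nodes --no-headers`.
count_versions() {
  awk '{ print $NF }' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample input; the last row's version is invented to show a mismatch.
count_versions <<'EOF'
elk-cp-eu-hc-fsn1-01         Ready    control-plane,etcd,master   476d   v1.33.1+rke2r1
elk-alloy-eu-hc-fsn1-01      Ready    worker                      476d   v1.33.1+rke2r1
elk-warm-eu-hz-hel1-07       Ready    worker                      376d   v1.32.5+rke2r1
EOF
```

Against a live cluster the same helper would be used as: kubectl get nodes --no-headers | count_versions. More than one output line means a mismatch exists.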

  2. Check if an upgrade is currently in progress

    • Command / Action:
      • Verify whether a cluster upgrade is actively running in Rancher or the cluster management tool
      • Check Rancher UI: Cluster → Overview → Kubernetes Version
    • Expected result:
      • If an upgrade is in progress, the mismatch is expected and will resolve automatically
    • additional info:
      • Do not intervene if an upgrade is actively running; monitor until it completes

  3. Identify nodes that failed to upgrade

    • Command / Action:
      • Compare node versions to find those still on the old version
      • kubectl get nodes -o custom-columns='NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,READY:.status.conditions[?(@.type=="Ready")].status'

    • Expected result:
      • Nodes on the old version are clearly listed and distinguishable from updated ones
    • additional info:
      • Also check if any nodes are in NotReady or SchedulingDisabled state, which may have caused the upgrade to stall
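Once the target version is known, the outdated nodes can be filtered out directly instead of compared by hand. The following is a minimal sketch: outdated_nodes is a hypothetical helper name, it reads kubectl get nodes --no-headers output on stdin, and it assumes the target version string is passed exactly as kubectl prints it (including any suffix such as +rke2r1). The sample rows use an invented older version for one node.

```shell
#!/bin/sh
# Hypothetical helper: print "<node> <version>" for every node whose
# version (last column) differs from the target version given as $1.
outdated_nodes() {
  awk -v target="$1" '$NF != target { print $1, $NF }'
}

# Sample input; the version of the last node is invented for illustration.
outdated_nodes "v1.33.1+rke2r1" <<'EOF'
elk-cp-eu-hc-fsn1-01         Ready    control-plane,etcd,master   476d   v1.33.1+rke2r1
elk-alloy-eu-hc-fsn1-01      Ready    worker                      476d   v1.33.1+rke2r1
elk-warm-eu-hz-hel1-07       Ready    worker                      376d   v1.32.5+rke2r1
EOF
```

Against a live cluster: kubectl get nodes --no-headers | outdated_nodes "<target-version>". Empty output means no node deviates from the target.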

  4. Check for nodes stuck in cordon or drain

    • Command / Action:
      • Look for nodes that are cordoned and may be blocking the rolling upgrade
      • kubectl get nodes | grep SchedulingDisabled

      • kubectl describe node <node-name> | grep -A5 Taints

    • Expected result:
      • No nodes are unexpectedly cordoned after the upgrade completes
    • additional info:
      • If a node is stuck cordoned after the upgrade, uncordon it: kubectl uncordon <node-name>
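When several nodes are cordoned, it can help to extract just their names so they can be reviewed one by one before uncordoning. This is a sketch under assumptions: cordoned_nodes is a hypothetical helper name, and it relies on the STATUS column (second column of kubectl get nodes --no-headers) containing SchedulingDisabled for cordoned nodes. The cordoned state in the sample input is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose STATUS column
# contains SchedulingDisabled (that is, cordoned nodes).
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Sample input; the cordoned state of the last node is invented.
cordoned_nodes <<'EOF'
elk-cp-eu-hc-fsn1-01         Ready                      control-plane,etcd,master   476d   v1.33.1+rke2r1
elk-warm-eu-hz-hel1-07       Ready,SchedulingDisabled   worker                      376d   v1.32.5+rke2r1
EOF
```

Against a live cluster: kubectl get nodes --no-headers | cordoned_nodes. Each listed node should be checked against the upgrade state before running kubectl uncordon on it.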

  5. Trigger upgrade of remaining nodes via Rancher

    • Command / Action:
      • In Rancher, navigate to the cluster and trigger the upgrade for nodes still on the old version
      • Rancher UI: Cluster → Nodes → select outdated node → Edit → update Kubernetes version
    • Expected result:
      • The node is upgraded to match the control plane version; the alert clears
    • additional info:
      • For RKE2 clusters, node upgrades are managed through the system-upgrade-controller or directly via Rancher

  6. Verify all components are on the same version after upgrade

    • Command / Action:
      • Confirm all nodes now report the same version
      • kubectl get nodes

    • Expected result:
      • All nodes show the same version; alert clears within the next evaluation cycle
    • additional info:
      • Also verify control plane components: kubectl get pods -n kube-system -o wide lists the apiserver, scheduler, and controller-manager pods and the nodes they run on (it does not print versions); kubectl version reports the API server version
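The final verification can also be expressed as a pass/fail check, which is convenient in a post-upgrade script. The following is a minimal sketch: all_same_version is a hypothetical helper name, it reads kubectl get nodes --no-headers output on stdin, and it only compares the node versions in the last column, not the control plane pod images.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if every input row
# reports the same version in its last column; fail on empty input.
all_same_version() {
  awk '{ seen[$NF] = 1 }
       END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Sample input: the three nodes from the listing in step 1.
if all_same_version <<'EOF'
elk-cp-eu-hc-fsn1-01         Ready    control-plane,etcd,master   476d   v1.33.1+rke2r1
elk-alloy-eu-hc-fsn1-01      Ready    worker                      476d   v1.33.1+rke2r1
elk-warm-eu-hz-hel1-07       Ready    worker                      376d   v1.33.1+rke2r1
EOF
then
  echo "all nodes report the same version"
else
  echo "version mismatch remains"
fi
```

Against a live cluster: kubectl get nodes --no-headers | all_same_version && echo OK.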

Additional resources