KubeVersionMismatch
Description
This alert fires when Kubernetes cluster components report different semantic versions, typically because control plane components (API server, controller-manager, scheduler) and worker nodes are running mismatched versions.
This most commonly occurs during a rolling cluster upgrade and is expected to be transient. However, if the mismatch persists after an upgrade completes, it indicates that some nodes or components were not successfully updated, which can cause API incompatibilities, unexpected behavior, or feature degradation.
Possible Causes:
- Cluster upgrade in progress (expected, transient state)
- Upgrade partially failed — some nodes did not complete the update
- Node stuck in a cordon/drain loop preventing the upgrade from proceeding
- Manual node addition with a different Kubernetes version
- RKE2/K3s or other distribution upgrade applied to control plane but not all worker nodes
- Node unreachable during upgrade, causing it to be skipped
Severity estimation
Severity ranges from Low to Critical depending on the duration and degree of mismatch:
- Low: Mismatch is recent and the cluster is actively being upgraded; expected transient state
- Medium: Upgrade completed but mismatch persists; some nodes were not updated
- High: Minor version mismatch (for example v1.32 vs v1.33) between control plane and workers that persists; API incompatibilities may affect workloads
- Critical: Version difference exceeds the Kubernetes version skew policy; cluster stability at risk
The Kubernetes version skew policy allows the kubelet on a node to be up to three minor versions older than the API server (two for control planes older than v1.28), and never newer. Control plane components such as kube-controller-manager and kube-scheduler may be at most one minor version older than the API server. Exceeding these limits is unsupported.
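The skew rule can be checked mechanically. The following is a minimal sketch and not part of the original runbook: `minor_of`, `kubelet_skew`, and `skew_ok` are hypothetical helper names, and the default limit of 3 assumes a control plane at v1.28 or later (pass 2 as the third argument for older control planes).

```shell
#!/bin/sh
# minor_of: extract the minor number from a version string such as "v1.33.1+rke2r1".
minor_of() {
  printf '%s\n' "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*/\1/'
}

# kubelet_skew: print how many minor versions the kubelet ($2) is behind
# the API server ($1). A negative result means the kubelet is newer.
kubelet_skew() {
  apiserver_minor=$(minor_of "$1")
  kubelet_minor=$(minor_of "$2")
  echo $((apiserver_minor - kubelet_minor))
}

# skew_ok: succeed when the skew is between 0 and the allowed maximum.
# The maximum defaults to 3 (assumption: control plane is v1.28 or later).
skew_ok() {
  skew=$(kubelet_skew "$1" "$2")
  max=${3:-3}
  [ "$skew" -ge 0 ] && [ "$skew" -le "$max" ]
}
```

For example, `skew_ok v1.33.1+rke2r1 v1.31.4+rke2r1 && echo supported` would report a two-minor-version gap as supported, which maps to High rather than Critical in the scale above.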
Troubleshooting steps
- Check current versions across all nodes
  - Command / Action:
    - List all nodes and their Kubernetes versions
    - kubectl get nodes
  - Expected result:
    - All nodes show the same version in the VERSION column
      NAME                      STATUS   ROLES                       AGE    VERSION
      elk-cp-eu-hc-fsn1-01      Ready    control-plane,etcd,master   476d   v1.33.1+rke2r1
      elk-alloy-eu-hc-fsn1-01   Ready    worker                      476d   v1.33.1+rke2r1
      elk-warm-eu-hz-hel1-07    Ready    worker                      376d   v1.33.1+rke2r1
  - additional info:
    - Identify which nodes are on the old version — these are the ones that need to be upgraded
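On larger clusters, scanning the VERSION column by eye is error-prone. A short sketch that tallies nodes per version may help; `version_counts` is a hypothetical helper name, and the `kubectl` pipeline in the comment is one assumed way to feed it.

```shell
#!/bin/sh
# version_counts: read one kubelet version per line on stdin and print
# each distinct version followed by the number of nodes reporting it.
version_counts() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Possible usage against a live cluster:
# kubectl get nodes \
#   -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#   | version_counts
```

More than one output line means the cluster is running mixed versions; the version with the smaller count is usually the set of nodes that were missed.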
- Check if an upgrade is currently in progress
  - Command / Action:
    - Verify whether a cluster upgrade is actively running in Rancher or the cluster management tool
    - Check Rancher UI: Cluster → Overview → Kubernetes Version
  - Expected result:
    - If an upgrade is in progress, the mismatch is expected and will resolve automatically
  - additional info:
    - Do not intervene if an upgrade is actively running; monitor until it completes
- Identify nodes that failed to upgrade
  - Command / Action:
    - Compare node versions to find those still on the old version
    - kubectl get nodes -o custom-columns='NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,READY:.status.conditions[?(@.type=="Ready")].status'
  - Expected result:
    - Nodes on the old version are clearly listed and distinguishable from updated ones
  - additional info:
    - Also check if any nodes are in NotReady or SchedulingDisabled state, which may have caused the upgrade to stall
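To narrow the listing down to only the nodes that still need attention, the output can be filtered against the target version. This is a sketch under assumptions: `outdated_nodes` is a hypothetical helper, and the target version is whatever the control plane currently reports.

```shell
#!/bin/sh
# outdated_nodes: read "NAME VERSION" pairs on stdin and print the pairs
# whose version differs from the target version given as the first argument.
outdated_nodes() {
  awk -v target="$1" '$2 != target { print $1, $2 }'
}

# Possible usage against a live cluster (target version is an example):
# kubectl get nodes --no-headers \
#   -o custom-columns='NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion' \
#   | outdated_nodes v1.33.1+rke2r1
```

An empty result means every node already matches the target.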
- Check for nodes stuck in cordon or drain
  - Command / Action:
    - Look for nodes that are cordoned and may be blocking the rolling upgrade
    - kubectl get nodes | grep SchedulingDisabled
    - kubectl describe node <node-name> | grep -A5 Taints
  - Expected result:
    - No nodes are unexpectedly cordoned after the upgrade completes
  - additional info:
    - If a node is stuck cordoned after the upgrade, uncordon it: kubectl uncordon <node-name>
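When several nodes are cordoned, extracting just their names makes follow-up commands easier. The sketch below assumes the default column layout of `kubectl get nodes --no-headers` (name first, status second); `cordoned_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# cordoned_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS column contains SchedulingDisabled.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Possible usage against a live cluster:
# kubectl get nodes --no-headers | cordoned_nodes
```

Unlike a plain `grep`, this matches only the STATUS column, so a node whose name happens to contain the string is not reported. Review the list before uncordoning anything, since a node may be cordoned deliberately by an upgrade that is still running.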
- Trigger upgrade of remaining nodes via Rancher
  - Command / Action:
    - In Rancher, navigate to the cluster and trigger the upgrade for nodes still on the old version
    - Rancher UI: Cluster → Nodes → select outdated node → Edit → update Kubernetes version
  - Expected result:
    - The node is upgraded to match the control plane version; the alert clears
  - additional info:
    - For RKE2 clusters, node upgrades are managed through the system-upgrade-controller or directly via Rancher
- Verify all components are on the same version after upgrade
  - Command / Action:
    - Confirm all nodes now report the same version
    - kubectl get nodes
  - Expected result:
    - All nodes show the same version; alert clears within the next evaluation cycle
  - additional info:
    - Also verify control plane components: kubectl get pods -n kube-system -o wide lists the apiserver, scheduler, and controller-manager pods and the nodes they run on; the version of each is visible in its container image tag
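The final verification can also be expressed as an exit code, which is convenient when polling from a script. This is a sketch only: `versions_converged` is a hypothetical helper, and the polling loop in the comment uses an arbitrary 30-second interval.

```shell
#!/bin/sh
# versions_converged: read one version per line on stdin; succeed only when
# at least one version was read and all of them are identical.
versions_converged() {
  [ "$(sort -u | wc -l | tr -d ' ')" -eq 1 ]
}

# Possible polling loop against a live cluster (interval is an assumption):
# until kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#     | versions_converged; do
#   sleep 30
# done
```

The check fails on empty input, so a failed or unauthorized `kubectl` call is not mistaken for a converged cluster.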
Additional resources
- Kubernetes version skew policy
- Kubernetes cluster upgrades
- Related alert: KubeClientErrors