KubePersistentVolumeErrors
Description
This alert fires when one or more PersistentVolumes (PVs) are in a Failed or Pending phase, indicating that the volume cannot be provisioned, bound, or mounted correctly.
Pods depending on the affected PV will be unable to start or will crash, potentially causing data unavailability, service outages, or data loss if the volume backs a stateful workload.
Possible Causes:
- StorageClass provisioner is unavailable or misconfigured
- Underlying storage backend is unreachable (NFS server down, cloud disk API errors, etc.)
- PersistentVolumeClaim (PVC) and PV have incompatible access modes or storage class
- PV was manually deleted or reclaimed while a PVC was still bound to it
- Insufficient capacity in the storage backend to provision the requested volume
- Node where the volume is attached is unreachable or being drained
- CSI driver pod is not running or has errors
- Volume attachment stuck after a node failure or restart
Severity estimation
High severity by default, since stateful workloads relying on the affected PV are likely failing. Adjust according to impact:
- Medium: Non-critical workload affected; service degraded but not fully down
- High: Production stateful service (database, message queue, cache) cannot start or access data
- Critical: Data-bearing volumes in a failed state with risk of data loss or corruption
Severity increases with:
- Number of PVs in error state
- Criticality of the workloads depending on the affected volumes
- Duration of the error state
- Whether the volume holds persistent data vs. ephemeral state
Troubleshooting steps
- Identify the PVs in error state
  - Command / Action:
    - List all PVs and filter for those in the Failed or Pending phase (PVs are cluster-scoped, so no namespace flag is needed)
    - kubectl get pv | grep -E 'Failed|Pending'
  - Expected result:
    - A list of affected PV names, their status, and the associated claim
  - additional info:
    - Note the CLAIM column to identify which namespace and PVC are affected
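The grep filter above also matches PV or claim names that happen to contain "Failed" or "Pending". A minimal sketch of a stricter filter that compares the phase field exactly is shown below; the function names `filter_pv_errors` and `list_pv_errors` are illustrative, not part of kubectl.

```shell
# Keep only rows whose PHASE column is exactly Failed or Pending.
# Input rows (whitespace-separated): NAME PHASE CLAIM_NAMESPACE CLAIM_NAME
filter_pv_errors() {
  awk '$2 == "Failed" || $2 == "Pending" { print $1, $2, $3 "/" $4 }'
}

# Query the cluster and print "pv-name phase namespace/claim" per affected PV.
# kubectl prints <none> for fields that are not set (e.g. an unclaimed PV).
list_pv_errors() {
  kubectl get pv --no-headers \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,NS:.spec.claimRef.namespace,CLAIM:.spec.claimRef.name' |
    filter_pv_errors
}
```

Calling `list_pv_errors` with no arguments prints one line per PV in error state.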
- Describe the affected PV for error details
  - Command / Action:
    - Inspect the PV object for events and status conditions
    - kubectl describe pv <pv-name>
  - Expected result:
    - The Events section shows the specific error message (e.g., provisioning failure, attachment error, backend unreachable)
  - additional info:
    - The Reason and Message fields in events pinpoint the root cause
- Check the associated PVC status
  - Command / Action:
    - Inspect the PVC bound to the failing PV
    - kubectl describe pvc <pvc-name> -n <namespace>
  - Expected result:
    - PVC status is Bound; if Pending or Lost, the binding itself has failed
  - additional info:
    - A PVC in Lost state means the underlying PV was deleted or became unavailable
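If the claim name is not known from the PV listing, the PVC can be located from the PV name. The following is a sketch under the assumption that the claim records the PV in `.spec.volumeName`; the helper names `pvcs_for_pv` and `find_pvcs_for_pv` are hypothetical.

```shell
# Print "namespace/claim phase" for claims that reference the given PV.
# Input rows (whitespace-separated): NAMESPACE NAME PHASE VOLUME
pvcs_for_pv() {
  awk -v pv="$1" '$4 == pv { print $1 "/" $2, $3 }'
}

# Search every namespace for claims pointing at the PV passed as $1.
find_pvcs_for_pv() {
  kubectl get pvc --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,VOLUME:.spec.volumeName' |
    pvcs_for_pv "$1"
}
```

For example, `find_pvcs_for_pv <pv-name>` prints the owning claim and its phase, which can then be passed to `kubectl describe pvc`.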
- Check the CSI driver or provisioner pod
  - Command / Action:
    - Verify the storage provisioner is running and healthy
    - kubectl get pods -n kube-system | grep -E 'csi|provisioner|nfs'
    - kubectl logs <provisioner-pod> -n kube-system --tail=50
  - Expected result:
    - Provisioner pod is Running with no error logs
  - additional info:
    - A crashed or restarting provisioner pod will prevent new PV provisioning and may affect existing volumes
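A pod can report Running while some of its containers are not ready or keep restarting. The sketch below narrows the listing to storage-related pods that look unhealthy. It assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE) and that the driver runs in `kube-system`, as in the step above; the function names are illustrative.

```shell
# Print storage-related pods that are not Running, not fully ready,
# or have restarted at least once.
# Input rows: NAME READY STATUS RESTARTS AGE (default kubectl column order)
unhealthy_storage_pods() {
  awk '$1 ~ /csi|provisioner|nfs/ {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2] || $4 + 0 > 0)
      print $1, $3, "ready=" $2, "restarts=" $4
  }'
}

# Namespace defaults to kube-system; pass another one as $1 if needed.
check_storage_pods() {
  kubectl get pods -n "${1:-kube-system}" --no-headers | unhealthy_storage_pods
}
```

Empty output from `check_storage_pods` suggests the provisioner pods are healthy at the pod level; logs should still be checked for errors.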
- Check events in the affected namespace
  - Command / Action:
    - Look for volume-related warning events in the namespace
    - kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep -iE 'volume|pvc|pv|mount|attach'
  - Expected result:
    - Events reveal whether the issue is at the provisioning, binding, or mounting stage
  - additional info:
    - Mount errors often indicate a node-level issue; provisioning errors point to the storage backend or StorageClass
- Check pods that depend on the affected PVC
  - Command / Action:
    - Find pods stuck in Pending or ContainerCreating due to volume errors
    - kubectl get pods -n <namespace> | grep -vE 'Running|Completed'
    - kubectl describe pod <pod-name> -n <namespace>
  - Expected result:
    - Pod events show "Unable to attach or mount volumes" or similar messages linking back to the PV error
  - additional info:
    - If the pod is stuck on a specific node, the issue may be a node-level volume attachment problem
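Filtering by pod status shows every unhealthy pod in the namespace, not only those that use the affected claim. A minimal sketch for listing the pods that actually reference a given PVC follows; it relies on kubectl's JSONPath output, and the helper names `pods_using_pvc` and `find_pods_using_pvc` are hypothetical.

```shell
# Print pods whose volume list contains the claim name given as $1.
# Input rows: POD_NAME <TAB> space-separated claim names (may be empty)
pods_using_pvc() {
  awk -F '\t' -v claim="$1" '{
    n = split($2, claims, " ")
    for (i = 1; i <= n; i++)
      if (claims[i] == claim) { print $1; break }
  }'
}

# Args: namespace, claim name. Volumes that are not PVC-backed yield an
# empty claim name and are ignored by the filter.
find_pods_using_pvc() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{" "}{end}{"\n"}{end}' |
    pods_using_pvc "$2"
}
```

Running `find_pods_using_pvc <namespace> <pvc-name>` gives the pod names to pass to `kubectl describe pod`.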
- Verify the storage backend is reachable
  - Command / Action:
    - Confirm the underlying storage system (NFS, EBS, GCE PD, Azure Disk, Ceph, etc.) is accessible
    - For cloud volumes, check the cloud provider console or CLI for disk/volume status
    - For NFS: verify the NFS server is reachable from cluster nodes
  - Expected result:
    - The storage backend is online and responding normally
  - additional info:
    - Backend outages require resolution at the infrastructure level before Kubernetes can recover the PV
- Force-detach a stuck volume if the node is gone
  - Command / Action:
    - If a volume is stuck attached to an unreachable node, force-detach via the cloud provider CLI or Kubernetes
    - kubectl get volumeattachment
    - kubectl delete volumeattachment <attachment-name>
  - Expected result:
    - The volume detaches from the old node and can be reattached to the new one
  - additional info:
    - Only do this if the original node is confirmed unreachable or terminated; force-detaching from a live node risks data corruption
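Before deleting anything, it helps to see exactly which attachments belong to the unreachable node. The sketch below lists them using fields from the VolumeAttachment object (`.spec.nodeName`, `.spec.source.persistentVolumeName`, `.status.attached`); the function names are illustrative, and the node name is supplied by the operator.

```shell
# Print "attachment-name pv-name attached=<bool>" for one node.
# Input rows (whitespace-separated): NAME PV NODE ATTACHED
attachments_on_node() {
  awk -v node="$1" '$3 == node { print $1, $2, "attached=" $4 }'
}

# List VolumeAttachments that reference the node passed as $1.
list_attachments_on_node() {
  kubectl get volumeattachment --no-headers \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached' |
    attachments_on_node "$1"
}
```

Confirm the node state first (for example with `kubectl get node <node-name>`), then review the output of `list_attachments_on_node <node-name>` before deleting any attachment.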
- Recreate the PV or PVC if in a permanently failed state
  - Command / Action:
    - If the PV cannot recover, back up data (if accessible), delete the PV/PVC, and recreate
    - kubectl delete pvc <pvc-name> -n <namespace>
    - kubectl delete pv <pv-name>
  - Expected result:
    - A new PV is provisioned and bound to a new PVC; workloads resume
  - additional info:
    - Before deleting, ensure the PV's ReclaimPolicy is set to Retain if you need to preserve the underlying data; with the Delete policy, removing the PVC of a dynamically provisioned PV also removes the backing volume
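The reclaim policy can be checked and switched before the deletion. The following is a sketch using `kubectl patch` on the PV's `persistentVolumeReclaimPolicy` field; the wrapper name `ensure_retain_policy` is hypothetical.

```shell
# Switch a PV to the Retain reclaim policy unless it already uses it.
# Arg: PV name.
ensure_retain_policy() {
  pv="$1"
  current=$(kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}') || return 1
  if [ "$current" = "Retain" ]; then
    echo "$pv already uses Retain"
  else
    kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
  fi
}
```

Run `ensure_retain_policy <pv-name>` before `kubectl delete pvc`, and verify the result with `kubectl get pv <pv-name>`.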
- Confirm recovery and workload restart
  - Command / Action:
    - Verify the PV is back to Bound state and dependent pods are running
    - kubectl get pv
    - kubectl get pods -n <namespace>
  - Expected result:
    - PV phase is Bound; all dependent pods are in Running state
  - additional info:
    - If pods do not restart automatically, delete them to trigger a fresh scheduling cycle:
    - kubectl delete pod <pod-name> -n <namespace>
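Recovery is not always immediate, so a short polling loop can save repeated manual checks. This is a minimal sketch, assuming a 5-second poll interval and a default timeout of 120 seconds; both values and the function name `wait_for_pv_bound` are arbitrary choices for illustration.

```shell
# Poll a PV until its phase is Bound or the timeout (seconds) expires.
# Args: PV name, optional timeout in seconds (default 120).
wait_for_pv_bound() {
  pv="$1"
  timeout="${2:-120}"
  elapsed=0
  while :; do
    phase=$(kubectl get pv "$pv" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Bound" ]; then
      echo "$pv is Bound"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$pv still in phase '${phase:-unknown}' after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
}
```

For example, `wait_for_pv_bound <pv-name> 300 && kubectl get pods -n <namespace>` lists the pods only once the PV has bound.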
Additional resources
- Kubernetes Persistent Volumes
- Troubleshooting PVCs
- CSI Driver troubleshooting
- Kubernetes Storage Classes
- Related alert: KubePersistentVolumeFillingUp