KubePersistentVolumeFillingUp
Description
This alert fires when a PersistentVolume (PV) is running low on available disk space, typically above 85% utilization. If the volume fills up completely, the workload using it will likely crash or become read-only, potentially causing data loss, application errors, or service outages.
Possible Causes:
- Continuous log or data accumulation without a retention or cleanup policy
- Database growth exceeding the initial volume size estimate
- Application bug causing unbounded file or data growth
- Accumulated temporary files, crash dumps, or core files
- Misconfigured backup or snapshot retention writing data back to the volume
- Insufficient initial volume size for the actual workload requirements
- Missing or misconfigured log rotation
Severity estimation
Medium to Critical severity, depending on fill rate and remaining space:
- Medium: Volume at 85–90%; workload is still operational but action is required
- High: Volume at 90–95%; imminent risk of the volume filling completely
- Critical: Volume above 95% or filling rapidly; workload failure is imminent
Severity increases with:
- How quickly the volume is filling (check the rate of growth)
- Criticality of the workload depending on the volume (database, message queue, etc.)
- Whether the volume is already causing write errors
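The usage bands above can be sketched as a small helper for scripts or ad-hoc checks. This is only an illustration: the function name `pv_severity` is hypothetical, and the handling of the exact boundaries (90% and 95% both map to High) is an assumption, since the bands above overlap at their edges.

```shell
# Map a usage percentage (integer 0-100) to the severity bands listed above.
# Assumption: 90 and 95 are treated as High; only values above 95 are Critical.
pv_severity() {
  used="$1"
  if [ "$used" -gt 95 ]; then
    echo "Critical"
  elif [ "$used" -ge 90 ]; then
    echo "High"
  elif [ "$used" -ge 85 ]; then
    echo "Medium"
  else
    echo "OK"
  fi
}
```

For example, `pv_severity 92` prints `High`. The helper looks only at utilization; fill rate and workload criticality still need to be weighed by hand.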
Troubleshooting steps
- Identify the affected PVC and namespace
  - Command / Action:
    - Check the alert labels for the PVC name and namespace, then confirm the PVC's status, capacity, and bound PV
    - kubectl get pvc -n <namespace>
    - kubectl describe pvc <pvc-name> -n <namespace>
  - Expected result:
    - The affected PVC is identified and its bound PV is confirmed
  - Additional info:
    - Note the StorageClass; it determines whether the volume can be expanded online
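The later steps need the name of a pod that mounts the PVC. One way to find it, sketched here under assumptions, is to list every pod with its claim names and filter for the PVC. The helper name `pods_using_pvc` is hypothetical; it expects one line per pod in the form `pod-name<TAB>claim1 claim2 ...` on stdin.

```shell
# Print the names of pods whose volumes reference the given PVC.
# Input (stdin): one line per pod, "<pod-name><TAB><claim> <claim> ...".
# Suggested producer (placeholders must be filled in):
#   kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{" "}{end}{"\n"}{end}'
pods_using_pvc() {
  awk -F'\t' -v pvc="$1" '{
    n = split($2, claims, " ")
    for (i = 1; i <= n; i++) {
      if (claims[i] == pvc) { print $1; break }
    }
  }'
}
```

Piping the `kubectl get pods` command from the comment into `pods_using_pvc <pvc-name>` should list the pods to use in the following steps.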
- Check actual disk usage inside the pod
  - Command / Action:
    - Exec into the pod using the volume and check disk usage
    - kubectl exec -it <pod-name> -n <namespace> -- df -h
    - kubectl exec -it <pod-name> -n <namespace> -- sh -c 'du -sh /<mount-path>/*'
  - Expected result:
    - The mount path shows high utilization; the du output identifies which directories are largest
  - Additional info:
    - Replace /<mount-path> with the actual volume mount path from the pod spec
    - The du command is wrapped in sh -c so that the glob expands inside the container rather than in your local shell
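To pull a single number out of the `df` output for scripting, a small filter along these lines may help. It is a sketch: the helper name `mount_use_percent` is hypothetical, it assumes POSIX-format output (`df -P`), and it does not handle mount paths that contain spaces.

```shell
# Read `df -P` output on stdin and print the use percentage (without "%")
# for the given mount point. Assumes the POSIX column layout:
#   Filesystem 1024-blocks Used Available Capacity Mounted-on
mount_use_percent() {
  awk -v m="$1" 'NR > 1 && $6 == m { sub(/%/, "", $5); print $5 }'
}
```

A possible invocation is `kubectl exec <pod-name> -n <namespace> -- df -P | mount_use_percent /<mount-path>`, with the placeholders filled in.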
- Identify what is consuming the most space
  - Command / Action:
    - Find the largest files and directories on the volume
    - kubectl exec <pod-name> -n <namespace> -- sh -c 'du -sh /<mount-path>/*' | sort -rh | head -20
  - Expected result:
    - The top space consumers are identified (logs, data files, temp files, dumps, etc.)
  - Additional info:
    - Log files and database write-ahead logs are common culprits; identify the pattern before deleting anything
    - The glob is expanded inside the container by sh -c; sorting and truncation run locally, so the image does not need sort or head
- Clean up unnecessary files to recover space immediately
  - Command / Action:
    - Remove stale logs, temporary files, or completed dump files that are safe to delete
    - kubectl exec -it <pod-name> -n <namespace> -- find /<mount-path> -name '*.log' -mtime +7 -delete
  - Expected result:
    - Disk usage drops below the alert threshold; immediate pressure is relieved
  - Additional info:
    - Only delete files you are certain are safe to remove; coordinate with the application team if unsure
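Before running a `find ... -delete`, it is usually safer to preview what would be removed by running the same expression with `-print`. The sketch below wraps that dry run in a hypothetical helper, `list_stale_logs`; the `*.log` pattern and the 7-day default mirror the cleanup command above, and `-type f` is added so that only regular files are listed.

```shell
# Dry run: list regular files matching *.log that were last modified more than
# N days ago (default 7). Nothing is deleted.
list_stale_logs() {
  dir="$1"
  days="${2:-7}"
  find "$dir" -type f -name '*.log' -mtime +"$days" -print
}
```

Inside a pod, the equivalent dry run would be `kubectl exec <pod-name> -n <namespace> -- find /<mount-path> -type f -name '*.log' -mtime +7 -print`. Switch `-print` to `-delete` only once the listing has been reviewed.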
- Check and configure log rotation or data retention
  - Command / Action:
    - Review the application's log rotation and data retention settings to prevent recurrence
    - Check application config for log rotation (e.g., logrotate, application-level retention settings)
  - Expected result:
    - Retention policies are configured to prevent unbounded growth
  - Additional info:
    - For databases, review WAL retention, vacuum settings (PostgreSQL), or purge policies
- Expand the PersistentVolume if the StorageClass supports it
  - Command / Action:
    - Edit the PVC to request more storage (requires allowVolumeExpansion: true in the StorageClass)
    - kubectl get storageclass <storageclass-name> -o yaml | grep allowVolumeExpansion
    - kubectl edit pvc <pvc-name> -n <namespace>
  - Expected result:
    - The PVC storage request is increased; the underlying volume expands (may require pod restart)
  - Additional info:
    - After editing, monitor kubectl describe pvc <pvc-name> -n <namespace> for the resize condition
    - Some storage backends require the pod to be restarted for the filesystem resize to take effect inside the container
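As a non-interactive alternative to `kubectl edit`, the storage request can be changed with `kubectl patch`. The sketch below only builds the patch body. The helper name `pvc_resize_patch` and the example size are hypothetical, and the new size must be larger than the current request because PVCs cannot be shrunk.

```shell
# Print a JSON patch body that sets the PVC storage request to the given size
# (for example 20Gi). The size value is supplied by the caller.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$1"
}
```

One possible use is `kubectl patch pvc <pvc-name> -n <namespace> -p "$(pvc_resize_patch 20Gi)"`, followed by `kubectl describe pvc` to watch the resize condition.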
- Monitor the fill rate to predict when the volume will be full
  - Command / Action:
    - Query Prometheus for the volume fill rate
    - predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim="<pvc-name>"}[6h], 4 * 3600)
  - Expected result:
    - The predicted value is positive (volume won't fill in the next 4 hours)
  - Additional info:
    - A negative result means the volume is predicted to fill within 4 hours; treat as urgent
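The arithmetic behind the prediction can be illustrated with a simplified linear extrapolation. Prometheus fits the slope with a regression over the range window; the sketch below instead takes the slope as a given input, so it is a back-of-the-envelope check rather than a reimplementation. Both helper names are hypothetical and all values are integers.

```shell
# Extrapolate available bytes: current + slope (bytes/sec, negative when
# filling) * horizon (seconds). A negative result means "full before then".
predict_available() {
  echo $(( $1 + $2 * $3 ))
}

# Whole hours until full, given available bytes and a positive fill rate in
# bytes/sec. Prints "never" when the volume is not filling.
hours_until_full() {
  if [ "$2" -le 0 ]; then
    echo "never"
  else
    echo $(( $1 / $2 / 3600 ))
  fi
}
```

For instance, with 1000000 bytes free and a slope of -100 bytes/sec, `predict_available 1000000 -100 14400` yields -440000, which corresponds to the urgent case in the step above.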
- Confirm recovery and monitor usage trend
  - Command / Action:
    - Verify disk usage is back below the threshold and stable
    - kubectl exec -it <pod-name> -n <namespace> -- df -h
  - Expected result:
    - Volume usage is below 85% and the fill rate has stabilized
  - Additional info:
    - Set up a recurring check or dashboard panel to track volume usage over time and catch growth early
Additional resources
- Kubernetes Persistent Volumes
- Expanding Persistent Volumes
- Kubernetes Storage Classes
- Related alert: KubePersistentVolumeErrors