KubePersistentVolumeFillingUp

Description

This alert fires when a PersistentVolume (PV) is running low on available disk space, typically above 85% utilization. If the volume fills up completely, the workload using it will likely crash or become read-only, potentially causing data loss, application errors, or service outages.

Possible Causes:

Continuous log or data accumulation without a retention or cleanup policy
Database growth exceeding the initial volume size estimate
Application bug causing unbounded file or data growth
Accumulated temporary files, crash dumps, or core files
Misconfigured backup or snapshot retention writing data back to the volume
Insufficient initial volume size for the actual workload requirements
Missing or misconfigured log rotation

Severity estimation

Medium to Critical severity, depending on fill rate and remaining space:

Medium: Volume at 85–90%; workload is still operational but action is required
High: Volume at 90–95%; imminent risk of the volume filling completely
Critical: Volume above 95% or filling rapidly; workload failure is imminent

Severity increases with:

How quickly the volume is filling (check the rate of growth)
Criticality of the workload depending on the volume (database, message queue, etc.)
Whether the volume is already causing write errors
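The percentage bands above can be written as a small shell function for use in a script or scheduled check. This is a sketch. `pv_severity` is a hypothetical name. The bands above share their edge values, so the boundary handling here is an assumption: 90% counts as High, 95% counts as High, and anything over 95% counts as Critical. The function also ignores fill rate, which the list above says should raise severity further.

```shell
# Hypothetical helper: map an integer usage percentage to the severity bands above.
# Assumed boundary handling: 85-89 -> medium, 90-95 -> high, above 95 -> critical.
pv_severity() {
  pct=$1
  if [ "$pct" -gt 95 ]; then
    echo critical
  elif [ "$pct" -ge 90 ]; then
    echo high
  elif [ "$pct" -ge 85 ]; then
    echo medium
  else
    echo below-threshold
  fi
}

pv_severity 92   # prints "high"
```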

Troubleshooting steps

Identify the affected PVC and namespace
- Command / Action:
  - Take the PVC name and namespace from the alert labels (persistentvolumeclaim, namespace), then confirm the PVC's status, capacity, StorageClass, and bound PV
  - kubectl get pvc -n <namespace>
  - kubectl describe pvc <pvc-name> -n <namespace>
- Expected result:
  - The affected PVC is identified and its bound PV is confirmed
- additional info:
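  - Note the StorageClass: its allowVolumeExpansion setting determines whether the PVC can be expanded at all, and the storage (CSI) driver determines whether the resize can complete while the pod is running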

Check actual disk usage inside the pod
- Command / Action:
  - Exec into the pod using the volume and check disk usage
  - kubectl exec -it <pod-name> -n <namespace> -- df -h
  - kubectl exec -it <pod-name> -n <namespace> -- sh -c 'du -sh /<mount-path>/*'
- Expected result:
  - The mount path shows high utilization; the du output identifies which directories are largest
- additional info:
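  - Replace /<mount-path> with the actual volume mount path from the pod spec
  - The sh -c wrapper makes the * glob expand inside the container rather than on your workstation; it requires a shell in the container image, and * does not match hidden (dot) files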
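To feed a usage number into a script (for example, to compare against the 85% threshold), the POSIX `df -P` format is easier to parse than `df -h` because each filesystem stays on a single line. The following is a minimal sketch. It assumes the mount path contains no spaces. `mount_usage_pct` is a hypothetical helper that runs on your workstation and reads the `df` output piped from `kubectl exec`.

```shell
# Hypothetical helper: print the Use% (as a bare integer) for MOUNT,
# reading POSIX-format `df -P` output on stdin. Column 5 is the capacity
# percentage and column 6 the mount point; the header row is skipped.
# Assumes MOUNT contains no whitespace, since awk splits fields on it.
mount_usage_pct() {
  awk -v m="$1" 'NR > 1 && $6 == m { sub(/%$/, "", $5); print $5 }'
}

# Example with captured output (in practice, pipe it from kubectl exec):
printf '%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted on' \
  '/dev/sdb 10255636 9012345 1243291 88% /data' |
  mount_usage_pct /data   # prints 88
```

Usage against a pod: `kubectl exec <pod-name> -n <namespace> -- df -P /<mount-path> | mount_usage_pct /<mount-path>`. Leave out `-t` here. With a TTY, the output lines end in carriage returns, which would stop the mount path from matching.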

Identify what is consuming the most space
- Command / Action:
  - Find the largest files and directories on the volume
  - kubectl exec -it <pod-name> -n <namespace> -- sh -c 'du -sh /<mount-path>/* | sort -rh | head -20'
- Expected result:
  - The top space consumers are identified (logs, data files, temp files, dumps, etc.)
- additional info:
  - Log files and database write-ahead logs are common culprits; identify the pattern before deleting anything
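The `du` pipeline above ranks top-level directories. When one directory dominates, it can help to rank individual files anywhere below the mount instead. This sketch uses only POSIX `find`, `du`, `sort`, and `head` options, so it should also work in BusyBox-based images, though that is an assumption worth checking in your container. `largest_files` is a hypothetical name, and sizes are in KiB as reported by `du -k`.

```shell
# Hypothetical helper: list the N largest regular files under DIR (default 20),
# largest first, as "<size in KiB><TAB><path>".
# -xdev keeps find on the volume's own filesystem; -exec ... {} + batches
# many paths into each du invocation.
largest_files() {
  dir=$1
  n=${2:-20}
  find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$n"
}
```

To run the same pipeline inside the pod without defining the function there: `kubectl exec -it <pod-name> -n <namespace> -- sh -c 'find /<mount-path> -xdev -type f -exec du -k {} + | sort -rn | head -n 20'`.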

Clean up unnecessary files to recover space immediately
- Command / Action:
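  - Preview stale logs, temporary files, or completed dump files that are safe to delete, then remove them
  - kubectl exec -it <pod-name> -n <namespace> -- find /<mount-path> -type f -name "*.log" -mtime +7 -print
  - kubectl exec -it <pod-name> -n <namespace> -- find /<mount-path> -type f -name "*.log" -mtime +7 -delete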
- Expected result:
  - Disk usage drops below the alert threshold; immediate pressure is relieved
- additional info:
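  - Only delete files you are certain are safe to remove; coordinate with the application team if unsure
  - Deleting a file that a process still holds open does not free its space until the process closes it; truncate an active log instead of deleting it, or restart the process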
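The clean-up command can be wrapped so that it previews by default and deletes only when asked. This makes the "certain they are safe" check above harder to skip. It is a sketch. `prune_old_logs` and its `--delete` switch are hypothetical. `find -delete` is a widely supported extension rather than POSIX, and the original command already relies on it.

```shell
# Hypothetical helper: find *.log files under DIR not modified for more than DAYS days.
# The default is a dry run that only prints the matches; pass --delete as the
# third argument to print and then delete the same set of files.
prune_old_logs() {
  dir=$1
  days=$2
  if [ "${3:-}" = "--delete" ]; then
    find "$dir" -xdev -type f -name '*.log' -mtime +"$days" -print -delete
  else
    find "$dir" -xdev -type f -name '*.log' -mtime +"$days" -print
  fi
}
```

To use it, define the function in a shell inside the pod (for example, one opened with `kubectl exec -it <pod-name> -n <namespace> -- sh`). Run `prune_old_logs /<mount-path> 7` and review the list. Then rerun it with `--delete` as the third argument.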

Check and configure log rotation or data retention
- Command / Action:
  - Review the application’s log rotation and data retention settings to prevent recurrence
  - Check application config for log rotation (e.g., logrotate, application-level retention settings)
- Expected result:
  - Retention policies are configured to prevent unbounded growth
- additional info:
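  - For databases, review WAL retention (in PostgreSQL: max_wal_size, wal_keep_size or wal_keep_segments before version 13, and inactive replication slots that pin old WAL), autovacuum settings, or purge policies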

Expand the PersistentVolume if the StorageClass supports it
- Command / Action:
  - Edit the PVC to request more storage (requires allowVolumeExpansion: true in the StorageClass)
  - kubectl get storageclass <storageclass-name> -o yaml | grep allowVolumeExpansion
  - kubectl edit pvc <pvc-name> -n <namespace>
- Expected result:
  - The PVC storage request is increased; the underlying volume expands (may require pod restart)
- additional info:
  - After editing, watch kubectl describe pvc <pvc-name> -n <namespace> for the Resizing and FileSystemResizePending conditions and related resize events
  - Some storage backends require the pod to be restarted for the filesystem resize to take effect inside the container
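A PVC's storage request can be increased but not decreased. It therefore helps to pick the new size for a target utilization rather than adding an arbitrary amount. The following is a sketch under stated assumptions. `suggest_pvc_size` and `expand_pvc` are hypothetical helpers, and the 70% target in the example is an arbitrary illustration. The patch uses `kubectl patch` as a non-interactive alternative to `kubectl edit`.

```shell
# Hypothetical helper: smallest whole-GiB size at which USED_GIB of data
# occupies at most TARGET_PCT percent of the volume (integer ceiling division).
suggest_pvc_size() {
  used_gib=$1
  target_pct=$2
  echo "$(( (used_gib * 100 + target_pct - 1) / target_pct ))Gi"
}

# Hypothetical helper: raise the PVC's storage request with a JSON merge patch.
# Only succeeds if the StorageClass has allowVolumeExpansion: true.
expand_pvc() {
  pvc=$1
  ns=$2
  size=$3
  kubectl patch pvc "$pvc" -n "$ns" --type merge \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}"
}

suggest_pvc_size 45 70   # prints 65Gi (45 GiB is about 69% of 65 GiB)
```

Example: `expand_pvc <pvc-name> <namespace> "$(suggest_pvc_size 45 70)"`. If the suggested size is smaller than the current request, the volume already meets that target, and the API would reject the smaller value anyway.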

Monitor the fill rate to predict when the volume will be full
- Command / Action:
  - Query Prometheus for the volume fill rate
  - predict_linear(kubelet_volume_stats_available_bytes{namespace="<namespace>", persistentvolumeclaim="<pvc-name>"}[6h], 4 * 3600)
- Expected result:
  - The predicted value is positive (volume won’t fill in the next 4 hours)
- additional info:
  - A negative result means that, if the growth seen over the last 6 hours continues, the volume will fill within 4 hours; treat this as urgent
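Without Prometheus access, the same idea can be roughly approximated by hand, for example from two `df` readings taken some hours apart. `predict_linear` fits a least-squares line through every sample in the range. This hypothetical `hours_until_full` helper draws a line through only two points, so it is noisier. Treat its result as a rough cross-check, not a replacement for the query above.

```shell
# Hypothetical helper: two-point linear estimate of hours until free space reaches zero.
# Args: available bytes at the start of the window, available bytes now,
#       and the window length in whole hours.
# Prints the estimate rounded down, or "not filling" if free space did not shrink.
hours_until_full() {
  avail_before=$1
  avail_now=$2
  window_h=$3
  consumed=$(( avail_before - avail_now ))
  if [ "$consumed" -le 0 ]; then
    echo "not filling"
  else
    # remaining space divided by the consumption rate (consumed / window_h)
    echo $(( avail_now * window_h / consumed ))
  fi
}

# 50 GiB free six hours ago, 40 GiB free now: 10 GiB per 6 h leaves about 24 h.
hours_until_full 53687091200 42949672960 6   # prints 24
```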

Confirm recovery and monitor usage trend
- Command / Action:
  - Verify disk usage is back below the threshold and stable
  - kubectl exec -it <pod-name> -n <namespace> -- df -h
- Expected result:
  - Volume usage is below 85% and the fill rate has stabilized
- additional info:
  - Set up a recurring check or dashboard panel to track volume usage over time and catch growth early

Additional resources

Kubernetes Persistent Volumes
Expanding Persistent Volumes
Kubernetes Storage Classes
Related alert: KubePersistentVolumeErrors