AllBlackboxProbesUnsuccessful
Runbook: AllBlackboxProbesUnsuccessful Alert
Alert Details
- Alert Name: AllBlackboxProbesUnsuccessful
- Expression:
sum without (cluster) (probe_success{nanocosmosGroup=~".+", environment=~".+"}) == 0
Description
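This alert fires when the sum of probe_success across clusters is zero for series that carry non-empty nanocosmosGroup and environment labels. Because probe_success is 1 for a successful probe and 0 for a failed one, a sum of zero means that every blackbox probe in the aggregated set has failed. Note that sum without (cluster) drops only the cluster label, so the result is grouped by nanocosmosGroup, environment, and any other labels present on the series (for example instance, if set).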
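To make the aggregation concrete, the following minimal Python sketch mimics what sum without (cluster) does on a handful of made-up series. The label values (group-a, prod, eu, us, host-1, host-2) are hypothetical, and the function only illustrates the grouping semantics; it is not Prometheus code.

```python
from collections import defaultdict

def sum_without(samples, dropped_label):
    """Sum sample values after removing one label, grouping by all remaining labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Grouping key: every label except the dropped one.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != dropped_label))
        totals[key] += value
    return dict(totals)

# Hypothetical probe_success samples: 1 = probe succeeded, 0 = probe failed.
samples = [
    ({"nanocosmosGroup": "group-a", "environment": "prod", "instance": "host-1", "cluster": "eu"}, 0),
    ({"nanocosmosGroup": "group-a", "environment": "prod", "instance": "host-1", "cluster": "us"}, 0),
    ({"nanocosmosGroup": "group-a", "environment": "prod", "instance": "host-2", "cluster": "eu"}, 0),
    ({"nanocosmosGroup": "group-a", "environment": "prod", "instance": "host-2", "cluster": "us"}, 1),
]

# Groups whose sum is zero are the ones that would satisfy `== 0`.
firing = [dict(key) for key, total in sum_without(samples, "cluster").items() if total == 0]
print(firing)
```

In this made-up data only host-1 fails from both clusters, so only its group sums to zero; host-2 still has one successful probe and would not trigger the alert.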
Possible Causes
- Network issues affecting the reachability of the hosts.
- All hosts in the group are down or powered off.
- Misconfiguration of probes or monitoring tools.
- Power supply issues or hardware failures.
Troubleshooting Steps
1. Check Network Connectivity
Verify the network connections to the affected hosts.
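One possible way to check connectivity from the monitoring host is a plain TCP connection attempt. The sketch below assumes the probed service listens on a TCP port; the function name tcp_reachable, the host names, and the port in the example call are illustrative and not taken from this runbook.

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and unreachable networks.
        return False

# Example call (hypothetical host and port):
# tcp_reachable("host-1.example.com", 443)
```

If every affected host returns False from the monitoring host but True from another network, the problem is more likely on the network path than on the hosts themselves.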
2. Verify Host Status
Ensure that the hosts are running and reachable.
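To tell a host that is down from a host that is up but not serving, it can help to look at how a connection attempt fails. The following is a minimal sketch under the assumption that the probed service uses TCP; classify_host and its return values are hypothetical names chosen for illustration.

```python
import socket

def classify_host(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # the service accepts connections
    except ConnectionRefusedError:
        return "refused"       # something answered with a reset, typically the host itself
    except (socket.timeout, TimeoutError):
        return "timeout"       # no answer: host down, or traffic dropped on the way
    except OSError:
        return "error"         # e.g. DNS resolution failure or no route to host

# Example call (hypothetical host and port):
# classify_host("host-1.example.com", 443)
```

A result of "refused" usually points at the service on the host, while "timeout" for every host in the group points at the hosts being down or at a network or firewall problem.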
3. Check Probe Configuration
Ensure that the probes are correctly configured and running.
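One way to test a probe independently of Prometheus is to call the blackbox exporter's /probe endpoint directly and read probe_success from the response. The sketch below assumes the exporter is reachable over HTTP; the exporter address, target, and module name in the example call are placeholders and must be replaced with the values from the actual configuration.

```python
import urllib.parse
import urllib.request

def probe_url(exporter, target, module):
    """Build the URL of the blackbox exporter's /probe endpoint for one target."""
    query = urllib.parse.urlencode({"target": target, "module": module})
    return f"{exporter}/probe?{query}"

def parse_probe_success(metrics_text):
    """Return the probe_success value from the exporter's text output, or None if absent."""
    for line in metrics_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1])
    return None

def run_probe(exporter, target, module, timeout=15.0):
    """Trigger a probe through the exporter and return its probe_success value."""
    with urllib.request.urlopen(probe_url(exporter, target, module), timeout=timeout) as resp:
        return parse_probe_success(resp.read().decode("utf-8"))

# Example call (hypothetical exporter address, target, and module name):
# run_probe("http://localhost:9115", "https://host-1.example.com", "http_2xx")
```

If run_probe raises a connection error, the exporter itself is likely down or unreachable; if it returns 0.0, the exporter is running and the probe genuinely fails for that target.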
4. Review Logs
Check the logs of the affected hosts and probes for errors.
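When the logs are large, a simple keyword filter can narrow them down. The following sketch is an assumption about what is worth searching for: the keyword list and the log path in the example call are illustrative and should be adapted to the services and log format in use.

```python
import re

# Keywords that commonly mark a problem; adjust to the log format in use.
ERROR_PATTERN = re.compile(r"\b(error|failed|failure|timeout|refused)\b", re.IGNORECASE)

def find_error_lines(lines):
    """Return (line_number, text) pairs for lines matching ERROR_PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]

# Example call (hypothetical log path):
# with open("/var/log/syslog", encoding="utf-8", errors="replace") as handle:
#     for number, text in find_error_lines(handle):
#         print(f"{number}: {text}")
```

Errors that start at the same time on all hosts in the group suggest a shared cause such as a network or power issue rather than independent host failures.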
Additional Steps
If the issue persists, consider:
- Restarting the affected services or hosts.
- Checking the hardware for failures.
- Contacting the network or system administrator.
POSSIBLE ADDITIONS
- Grafana link
- How to get hostname_or_ip
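As a starting point for the hostname item, the affected targets could be read from Prometheus itself through its HTTP API. This is a sketch under assumptions: the Prometheus base URL is a placeholder, and whether the instance label actually holds the probed hostname or IP depends on the relabeling in the scrape configuration.

```python
import json
import urllib.parse
import urllib.request

# Per-series version of the alert expression: lists every failing probe.
QUERY = 'probe_success{nanocosmosGroup=~".+", environment=~".+"} == 0'

def failing_instances(response_body):
    """Extract the sorted, de-duplicated instance labels from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    results = payload["data"]["result"]
    return sorted({series["metric"].get("instance", "<no instance label>") for series in results})

def fetch_failing_instances(prometheus_url, timeout=10.0):
    """Run QUERY against the Prometheus HTTP API and return the failing instances."""
    url = f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return failing_instances(resp.read().decode("utf-8"))

# Example call (hypothetical Prometheus address):
# fetch_failing_instances("http://prometheus.example.com:9090")
```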