AllBlackboxProbesUnsuccessful
Runbook: AllBlackboxProbesUnsuccessful Alert
Alert Details
- Alert Name: AllBlackboxProbesUnsuccessful
- Expression:
sum without (cluster) (probe_success{nanocosmosGroup=~".+", environment=~".+"}) == 0
Description
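This alert fires when probe_success, summed across the cluster label, equals zero for a series whose nanocosmosGroup and environment labels are both present and non-empty. Because the expression aggregates away only the cluster label, every other label (typically including the probed target) is kept. Each firing series therefore means that every blackbox probe of that target, from every cluster that probes it, has failed. A single successful probe from any cluster keeps the sum above zero and the alert silent.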
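The following Python sketch mimics how the alert expression groups and evaluates probe_success samples, to make the effect of sum without (cluster) concrete. It is an illustration only. The sample label values used with it (target names, groups) are hypothetical, and it models only the label matching and summing, not Prometheus staleness or evaluation timing.

```python
from collections import defaultdict


def sum_without(samples, dropped_label):
    """Sum sample values grouped by every label except `dropped_label`.

    Mimics PromQL's `sum without (<label>)`, which also drops the metric name.
    `samples` is a list of (labels_dict, value) pairs.
    """
    groups = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted(
            (k, v) for k, v in labels.items()
            if k not in (dropped_label, "__name__")
        ))
        groups[key] += value
    return groups


def firing_series(samples):
    """Return the label sets for which the alert expression would fire."""
    # Mirror the matchers nanocosmosGroup=~".+" and environment=~".+":
    # both labels must be present and non-empty.
    matched = [
        (labels, value) for labels, value in samples
        if labels.get("nanocosmosGroup") and labels.get("environment")
    ]
    # `== 0` keeps only the groups whose summed probe_success is zero.
    return [dict(key) for key, total in sum_without(matched, "cluster").items()
            if total == 0]
```

With one target probed from two clusters, the target appears in the result only if both clusters report 0; a target that still succeeds from one cluster is dropped because its sum is 1.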
Possible Causes
- Network issues affecting the reachability of the hosts.
- All hosts in the group are down or powered off.
- Misconfiguration of probes or monitoring tools.
- Power supply issues or hardware failures.
Troubleshooting Steps
1. If the alert is triggered by a bintu service (Bintu API, Token, Dashboard), refer to the dedicated bintu runbook instead.
2. Check Network Connectivity
Verify the network connections to the affected hosts.
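A minimal Python sketch of a TCP reachability check, useful when ICMP ping is filtered. The host and port are whatever the failing probe targets; nothing in the function is specific to this environment.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS resolution errors and timeouts,
        # all of which are OSError subclasses in Python 3.
        return False
```

For example, tcp_reachable("host.example", 443) (a hypothetical host) tests the HTTPS port. Running the same check from more than one network location helps tell a host-side failure from a path-specific one.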
3. Verify Host Status
Ensure that the hosts are running and reachable.
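To see which targets are failing and from which clusters, the sketch below queries the Prometheus HTTP API instant-query endpoint (GET /api/v1/query) and lists the series whose probe_success is 0. PROMETHEUS_URL is a hypothetical placeholder, and the sketch assumes the series carry instance and cluster labels.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical: replace with the Prometheus server that evaluates this alert.
PROMETHEUS_URL = "http://prometheus.example:9090"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch(base: str, promql: str) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(base, promql), timeout=10) as resp:
        return json.load(resp)


def failing_targets(response: dict) -> list:
    """Return (instance, cluster) pairs whose probe_success value is 0."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    failing = []
    for series in response["data"]["result"]:
        # Instant vectors carry [unix_timestamp, "value_as_string"].
        _ts, value = series["value"]
        if float(value) == 0:
            labels = series["metric"]
            failing.append((labels.get("instance"), labels.get("cluster")))
    return failing
```

Usage: failing_targets(fetch(PROMETHEUS_URL, 'probe_success{nanocosmosGroup="<group>", environment="<environment>"}')). If every target in the group is failing, that suggests a shared cause (network path or probe setup) rather than a single host.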
4. Check Probe Configuration
Ensure that the probes are correctly configured and running.
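One way to check a probe in isolation is to ask the blackbox exporter to run it on demand through its /probe endpoint and read probe_success from the result. In the sketch below, the exporter address and the module name are assumptions: 9115 is the exporter's default port and http_2xx is a module from its example configuration, so substitute the address and module your scrape job actually uses.

```python
import urllib.parse
import urllib.request


def probe_url(exporter, target, module, debug=False):
    """Build a blackbox_exporter /probe URL for a single target and module."""
    params = {"target": target, "module": module}
    if debug:
        # debug=true asks the exporter for its probe log and module
        # configuration in addition to the metrics.
        params["debug"] = "true"
    return f"{exporter}/probe?" + urllib.parse.urlencode(params)


def parse_probe_success(metrics_text):
    """Return the probe_success value from exporter output, or None if absent."""
    for line in metrics_text.splitlines():
        parts = line.split()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not parts or parts[0].startswith("#"):
            continue
        # Assumes no spaces inside labels (probe_success is unlabelled here).
        if parts[0].split("{", 1)[0] == "probe_success" and len(parts) >= 2:
            return float(parts[1])
    return None


def run_probe(exporter, target, module, timeout=15.0):
    """Ask the exporter to probe `target` now and return probe_success."""
    url = probe_url(exporter, target, module)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_probe_success(resp.read().decode("utf-8"))
```

If run_probe("http://blackbox.example:9115", "<target>", "http_2xx") returns 0.0, opening the same URL with debug=True in a browser or with curl shows the probe log, which usually names the failing step.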
5. Review Logs
Check the logs of the affected hosts and probes for errors.
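A small Python filter can narrow long log output down to warnings and errors. It assumes logfmt-style lines, where severity appears as level=error or level=ERROR (the form used by Prometheus-ecosystem exporters); other log formats would need a different pattern.

```python
import re
from typing import Iterable, Iterator

# Assumption: logfmt lines with a level=<severity> field, matched case-insensitively.
LEVEL_RE = re.compile(r"\blevel=(error|warn(?:ing)?)\b", re.IGNORECASE)


def problem_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines whose level is error or warn/warning."""
    for line in lines:
        if LEVEL_RE.search(line):
            yield line.rstrip("\n")
```

It accepts any iterable of lines, for example an open log file or the output of journalctl for the exporter's service unit (the unit name depends on how the exporter was installed).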
Additional Steps
If the issue persists, consider:
- Restarting the affected services or hosts.
- Checking the hardware for failures.
- Contacting the network or system administrator.