SslProbeFail
Runbook: SslProbeFail Alert
Alert Details
- Alert Name: SslProbeFail
- Expression:
sum without (cluster) (probe_http_ssl{job=~".+", nanocosmosGroup=~".+", environment=~".+"}) == 0
Description
This alert is triggered when the sum of successful SSL probes (probe_http_ssl) for all jobs within a specific group (nanocosmosGroup) and environment (environment) is equal to zero. This indicates that all SSL probes in this group and environment have failed.
Possible Causes
- SSL certificate issues (expired, misconfigured, or invalid).
- Network issues affecting the reachability of the target.
- Misconfiguration of the probe or monitoring tools.
- DNS resolution issues.
- Firewall or security group rules blocking the probe.
Troubleshooting Steps
1. Check SSL Certificate
Verify the SSL certificate of the target service.
|
|
Expected Output:
|
|
2. Check Network Connectivity
Verify the network connections to the target.
|
|
Expected Output:
|
|
3. Verify Target Service Status
Ensure that the target service is running and reachable.
|
|
Expected Output:
|
|
4. Check Probe Configuration
Ensure that the probe is correctly configured and running.
|
|
Expected Output:
|
|
5. Review Logs
Check the logs of the target service and the probe for any errors or warnings.
|
|
Expected Output:
|
|
|
|
Expected Output:
|
|
6. DNS Resolution Check
Ensure that DNS resolution for the target is working correctly and not causing delays.
|
|
Expected Output:
|
|
Additional Steps
If the issue persists, consider:
- Restarting the affected services or hosts.
- Checking firewall or security group rules.
- Contacting the network or system administrator for further investigation.