HttpProbeFail
Runbook: HttpProbeFail Alert
Alert Details
- Alert Name: HttpProbeFail
- Expression:
probe_http_status_code{job=~".+", nanocosmosGroup=~".+", environment=~".+"} <= 199 or probe_http_status_code{job=~".+", nanocosmosGroup=~".+", environment=~".+"} >= 400
Description
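This alert fires when the HTTP status code recorded by a probe is 199 or lower, or 400 or higher, for any job that carries non-empty nanocosmosGroup and environment labels. That range covers client errors (4xx) and server errors (5xx). It also covers a status code of 0, which is what the probe reports when it received no HTTP response at all (assuming the metric comes from the Prometheus Blackbox Exporter), for example after a connection failure, a DNS failure, or a timeout. Informational 1xx codes match as well, although they rarely appear as a final status.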
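The threshold logic of the expression can be restated as a small shell helper, which is handy when comparing a manually observed status code against the alert rule. This is only an illustrative sketch; the function name `is_probe_failure` is hypothetical and not part of the alerting setup.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given status code
# would match the alert expression (code <= 199 or code >= 400).
is_probe_failure() {
  code="$1"
  [ "$code" -le 199 ] || [ "$code" -ge 400 ]
}

# Example:
# is_probe_failure 503 && echo "would alert"
# is_probe_failure 200 || echo "would not alert"
```

Codes in the 200 to 399 range, including redirects, do not match the rule.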
Possible Causes
- The target service is down or unresponsive.
- Misconfiguration of the probe or target service.
- Network issues affecting the reachability of the target.
- DNS resolution issues.
- Firewall or security group rules blocking the probe.
Troubleshooting Steps
1. Check HTTP Status Code
Verify the HTTP status code returned by the target service.
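One way to do this is with `curl`, printing only the status code. The sketch below assumes `curl` is available on the host you are checking from; the function name `http_status` and the URL in the example call are placeholders.

```shell
# Print only the HTTP status code returned by the given URL.
# curl prints 000 when it received no HTTP response
# (for example connection refused, DNS failure, or timeout).
http_status() {
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}\n' "$1"
}

# Example call; replace the placeholder URL with the probed target:
# http_status "https://target.example.com/health"
```

A healthy target prints a code between 200 and 399. A value of 000, or anything at 400 or above, is consistent with the alert. Run the check from a host in the same network segment as the probe if possible, since results from elsewhere may differ.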
2. Verify Target Service Status
Ensure that the target service is running and reachable.
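How the service is managed is not stated in this runbook, so the following is a platform-neutral sketch that only tests whether the target's TCP port accepts connections. It assumes a netcat (`nc`) variant that supports `-z` and `-w`; the function name, host, port, and unit name are placeholders.

```shell
# Report whether a TCP port on the target accepts connections.
# -z: connect without sending data, -w 5: give up after 5 seconds.
port_reachable() {
  host="$1"
  port="$2"
  if nc -z -w 5 "$host" "$port"; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Example calls (placeholder names):
# port_reachable target.example.com 443
# On the target host itself, assuming a systemd unit:
# systemctl is-active my-service
# Or, assuming the service runs on Kubernetes:
# kubectl get pods -n my-namespace
```

If the port is unreachable from the probe host but the service reports as running locally, the cause is more likely a network or firewall issue than the service itself.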
3. Check Probe Configuration
Ensure that the probe is correctly configured and running.
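If the probe is the Prometheus Blackbox Exporter (an assumption; the runbook does not name the prober), you can trigger a probe by hand and request its debug output. The exporter address, the default port 9115, and the module name `http_2xx` below are assumptions to adjust to your configuration; `probe_debug` is a hypothetical helper name.

```shell
# Run a single probe through the exporter's /probe endpoint with debug output.
# Arguments: exporter base URL, target URL, optional module (default http_2xx).
probe_debug() {
  exporter="$1"
  target="$2"
  module="${3:-http_2xx}"
  # --get turns the url-encoded data fields into query parameters.
  curl --silent --get --max-time 15 \
    --data-urlencode "target=$target" \
    --data-urlencode "module=$module" \
    --data-urlencode "debug=true" \
    "$exporter/probe"
}

# Example call (placeholder addresses):
# probe_debug "http://blackbox-exporter:9115" "https://target.example.com/health"
```

In the response, check the `probe_success` and `probe_http_status_code` values and the debug log lines, which show the steps the prober took and the module configuration it used.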
4. Review Logs
Check the logs of the target service and the probe for any errors or warnings.
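Where the logs live depends on how the service and the probe are deployed, which this runbook does not specify. As a sketch, the filter below narrows any log stream to lines that mention common failure keywords; the `journalctl` examples assume systemd units, and the unit names are placeholders.

```shell
# Keep only log lines that mention errors, warnings, timeouts,
# or refused connections (case-insensitive). Reads from standard input.
filter_problems() {
  grep -iE 'error|warn|timeout|refused'
}

# Example calls (assume systemd; unit names are placeholders):
# journalctl -u my-service --since "30 minutes ago" --no-pager | filter_problems
# journalctl -u blackbox_exporter --since "30 minutes ago" --no-pager | filter_problems
```

Compare the timestamps of any matches with the time the alert started firing.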
5. DNS Resolution Check
Ensure that DNS resolution for the target is working correctly and not causing delays.
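A minimal check with `dig`, assuming it is installed on the host you are testing from, shows both the resolved records and how long the lookup took. The function name `dns_check` and the hostname in the example are placeholders.

```shell
# Resolve a hostname and show the answer records plus the query time.
# +noall +answer +stats limits dig's output to the answer and statistics sections.
dns_check() {
  dig +noall +answer +stats "$1" \
    | grep -E 'IN[[:space:]]+(A|AAAA|CNAME)|Query time'
}

# Example call; replace with the hostname of the probed target:
# dns_check target.example.com
```

No answer records suggests a resolution failure, and a high query time suggests the resolver is slow. Run the check from the probe host, since it may use a different resolver than your workstation.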
Additional Steps
If the issue persists, consider:
- Restarting the affected services or hosts.
- Checking firewall or security group rules.
- Contacting the network or system administrator for further investigation.