HttpProbeFail
Runbook: HttpProbeFail Alert
Alert Details
- Alert Name: HttpProbeFail
- Expression:
probe_http_status_code{job=~".+", nanocosmosGroup=~".+", environment=~".+"} <= 199 or probe_http_status_code{job=~".+", nanocosmosGroup=~".+", environment=~".+"} >= 400
Description
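This alert fires when the HTTP status code returned by a probe is 199 or lower, or 400 or higher, for any job that carries non-empty nanocosmosGroup and environment labels. Codes of 400 and above indicate client errors (4xx) or server errors (5xx). Codes of 199 and below cover informational responses (1xx) and, typically, a value of 0, which probes such as the Prometheus Blackbox Exporter report when no HTTP response was received at all (for example after a connection failure or timeout). Successful (2xx) and redirect (3xx) responses do not trigger the alert.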
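To make the threshold logic concrete, the following minimal sketch restates the alert expression as a predicate. The function name `probe_fails` is hypothetical and only serves as an illustration; the actual evaluation happens in the alerting rule shown above.

```python
def probe_fails(status_code: int) -> bool:
    """Mirror the alert expression: fire on codes <= 199 or >= 400."""
    return status_code <= 199 or status_code >= 400


# Print the verdict for a few representative status codes.
for code in (0, 101, 200, 301, 404, 503):
    print(code, "ALERT" if probe_fails(code) else "ok")
```

Only the 2xx and 3xx ranges pass; everything else, including 0, counts as a failed probe.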
Possible Causes
- The target service is down or unresponsive.
- Misconfiguration of the probe or target service.
- Network issues affecting the reachability of the target.
- DNS resolution issues.
- Firewall or security group rules blocking the probe.
Troubleshooting Steps
1. If the alert is triggered by a bintu service (Bintu API, Token, Dashboard), refer to this Runbook instead.
2. Check HTTP Status Code
Verify the HTTP status code returned by the target service.
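One possible way to run this check by hand is a short script. This is a minimal sketch, not the original command from this runbook: `fetch_status` is a hypothetical helper, and the URL in the comment is a placeholder. It returns 0 when no HTTP response arrives, on the assumption that this matches what the probe reports in that case. Note that `urllib` follows redirects, so a 3xx answer is reported as the status of the final response.

```python
import http.client
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no HTTP response arrived."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses raise HTTPError but still carry a status code.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, malformed response, ...
        return 0


# Example call (placeholder URL, replace with the probed target):
#   print(fetch_status("https://example.com/health"))
```

A result between 200 and 399 means the target answers normally from where the script runs; if the alert still fires, the difference is likely on the probe's side (network path, module settings, or DNS).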
3. Verify Target Service Status
Ensure that the target service is running and reachable.
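Reachability can be checked independently of HTTP with a plain TCP connection attempt. The sketch below is an illustration under assumptions: `tcp_reachable` is a hypothetical helper, and the host and port in the comment are placeholders for the real target.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example call (placeholder host and port):
#   print(tcp_reachable("example.com", 443))
```

If the port is reachable but the HTTP check still fails, the problem is more likely in the service itself than in the network path.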
4. Check Probe Configuration
Ensure that the probe is correctly configured and running.
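This runbook does not name the probe. The sketch below assumes it is the Prometheus Blackbox Exporter, which is the usual source of the probe_http_status_code metric. That exporter serves a `/probe` endpoint taking `target` and `module` query parameters and answers with metrics in the Prometheus text format. The exporter address, the module name `http_2xx`, and both helper names are assumptions for illustration; the parser is deliberately simplified and ignores optional timestamps.

```python
from typing import Dict
from urllib.parse import urlencode


def build_probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Build the URL that asks the exporter to probe target with module."""
    query = urlencode({"target": target, "module": module})
    return f"{exporter.rstrip('/')}/probe?{query}"


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' lines of the text format, skipping comments."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # not a plain sample line
    return metrics
```

Fetching the built URL manually and comparing `probe_success` and `probe_http_status_code` with the alert helps separate a misconfigured module from a genuinely failing target.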
5. Review Logs
Check the logs of the target service and the probe for any errors or warnings.
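Where the logs live depends on the deployment, so the sketch below only covers the filtering step. It assumes plain-text logs with conventional level keywords; `suspicious_lines` is a hypothetical helper and the file path in the comment is a placeholder.

```python
import re
from typing import Iterable, List

# Level keywords to look for; adjust to the log format actually in use.
_PATTERN = re.compile(r"\b(error|warn(ing)?|fatal)\b", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention an error, warning, or fatal level."""
    return [line.rstrip("\n") for line in lines if _PATTERN.search(line)]


# Example usage (placeholder path):
#   with open("/path/to/service.log") as handle:
#       for line in suspicious_lines(handle):
#           print(line)
```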
6. DNS Resolution Check
Ensure that DNS resolution for the target is working correctly and not causing delays.
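Resolution and its latency can be measured with the system resolver. This is a minimal sketch: `resolve` is a hypothetical helper, the hostname in the comment is a placeholder, and the result reflects the resolver of the machine running the script, which may differ from the one the probe uses.

```python
import socket
import time
from typing import List, Tuple


def resolve(hostname: str) -> Tuple[List[str], float]:
    """Return (sorted unique addresses, elapsed seconds) for hostname.

    An empty address list means resolution failed.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return [], time.monotonic() - start
    # The address is the first element of each sockaddr tuple.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, time.monotonic() - start


# Example call (placeholder hostname):
#   addresses, seconds = resolve("example.com")
#   print(addresses, f"{seconds:.3f}s")
```

An empty result or a lookup time in the range of seconds points toward DNS as a contributing cause.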
Additional Steps
If the issue persists, consider:
- Restarting the affected services or hosts.
- Checking firewall or security group rules.
- Contacting the network or system administrator for further investigation.