Runbook: VtransOverload
Alert Details
- Alert Name: VtransOverload
- Expression:
max without (cluster, provider) (vtrans_overloaded{instance!~"vtrans.+", environment=~".+"}) == 0
Description
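This alert fires when the vtrans_overloaded metric indicates that the vtrans service is overloaded. The expression takes the maximum of vtrans_overloaded across the cluster and provider labels, considering only series with a non-empty environment label and an instance label that does not match the pattern vtrans.+, and fires when that maximum equals 0.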
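To see which series currently satisfy the expression, it can be run as an instant query against the Prometheus HTTP API. This is a minimal sketch: the PROM_URL variable and its default of http://localhost:9090 are assumptions, and both function names are hypothetical helpers rather than part of any existing tooling.

```shell
# Prints the alert expression exactly as it appears in this runbook.
vtrans_alert_expr() {
  printf '%s' 'max without (cluster, provider) (vtrans_overloaded{instance!~"vtrans.+", environment=~".+"}) == 0'
}

# Runs the expression as an instant query.
# PROM_URL is an assumed Prometheus base URL; adjust it for your environment.
query_vtrans_alert() {
  # -G turns the URL-encoded data into GET query parameters.
  curl -sS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$(vtrans_alert_expr)"
}
```

Calling query_vtrans_alert returns JSON; an empty result list means no series currently matches the alert condition.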
Possible Causes
- High load on the vtrans service.
- Insufficient resources allocated to the vtrans service.
- Network issues affecting the vtrans service.
- Configuration errors in the vtrans setup.
Troubleshooting Steps
- Check Vtrans Service Status
  - Command:
    systemctl status vtrans-service
  - Expected Output: The status of the vtrans service. Look for “active (running)”.
  - Example:
    $ systemctl status vtrans-service
    ● vtrans-service.service - Vtrans Service
       Loaded: loaded (/etc/systemd/system/vtrans-service.service; enabled; vendor preset: enabled)
       Active: active (running) since Wed 2024-11-13 14:00:00 UTC; 19min ago
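The status check can be scripted so that it yields a pass/fail result instead of requiring someone to read the output. The sketch below assumes the status text contains an "Active:" line in the form shown in the example; is_active_running is a hypothetical helper name.

```shell
# Reads `systemctl status` output on stdin and succeeds only if the
# Active: line reports "active (running)".
is_active_running() {
  grep -q 'Active: active (running)'
}

# Example usage (not executed here):
#   systemctl status vtrans-service | is_active_running && echo "vtrans is running"
```

When only the exit status matters, `systemctl is-active --quiet vtrans-service` gives the same answer without any parsing.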
- Restart Vtrans Service
  - Command:
    sudo systemctl restart vtrans-service
  - Expected Output: The service restarts without errors.
  - Example:
    $ sudo systemctl restart vtrans-service
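A successful restart prints nothing, so it helps to confirm afterwards that the service actually came back. The following is a sketch of a small polling helper; wait_until_active is a hypothetical name, and the retry count and delay in the usage comment are illustrative values, not recommendations from this runbook.

```shell
# Usage: wait_until_active <retries> <delay-seconds> <command...>
# Re-runs <command...> until it succeeds or the retries are exhausted.
wait_until_active() {
  _retries="$1"; _delay="$2"; shift 2
  _i=0
  while [ "$_i" -lt "$_retries" ]; do
    if "$@"; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  return 1
}

# Example usage (not executed here):
#   sudo systemctl restart vtrans-service &&
#     wait_until_active 10 2 systemctl is-active --quiet vtrans-service
```

Passing the check as a command keeps the helper independent of systemd, so the same loop works with any health probe.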
- Check Resource Utilization
  - Command:
    top -b -n 1 | grep vtrans
  - Expected Output: Resource usage statistics for the vtrans service.
  - Example:
    $ top -b -n 1 | grep vtrans
    1234 vtrans 20 0 123456 12345 1234 S 0.0 0.1 0:00.00 vtrans-service
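When several vtrans processes are running, totalling their CPU and memory percentages gives a quicker picture than reading each line. This sketch assumes top's default column order, in which %CPU is the ninth field and %MEM the tenth, as in the example output; sum_vtrans_usage is a hypothetical helper.

```shell
# Reads `top -b -n 1` process lines on stdin and prints "<cpu> <mem>",
# the summed %CPU and %MEM of every line that mentions vtrans.
# Assumes the default top layout (%CPU = field 9, %MEM = field 10).
sum_vtrans_usage() {
  LC_ALL=C awk '/vtrans/ { cpu += $9; mem += $10 }
                END { printf "%.1f %.1f\n", cpu, mem }'
}

# Example usage (not executed here):
#   top -b -n 1 | sum_vtrans_usage
```

If top has been customised on the host, for example through a saved toprc, the field numbers may differ and should be checked against the header line.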
- Check Network Connectivity
  - Command:
    ping -c 4 vtrans-service-hostname
  - Expected Output: Successful ping responses.
  - Example:
    $ ping -c 4 vtrans-service-hostname
    PING vtrans-service-hostname (192.168.1.5) 56(84) bytes of data.
    64 bytes from vtrans-service-hostname: icmp_seq=1 ttl=64 time=0.123 ms
    64 bytes from vtrans-service-hostname: icmp_seq=2 ttl=64 time=0.124 ms
    64 bytes from vtrans-service-hostname: icmp_seq=3 ttl=64 time=0.125 ms
    64 bytes from vtrans-service-hostname: icmp_seq=4 ttl=64 time=0.126 ms
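The ping result can be reduced to a single packet-loss figure for scripted checks. This sketch assumes ping prints a summary line of the form "N packets transmitted, M received, X% packet loss", as the common Linux ping does; that line is not part of the example above, and packet_loss_percent is a hypothetical helper.

```shell
# Reads ping output on stdin and prints the packet-loss percentage
# taken from the summary line (assumed format: "... X% packet loss ...").
packet_loss_percent() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example usage (not executed here):
#   ping -c 4 vtrans-service-hostname | packet_loss_percent
```

A value of 0 means every probe was answered; no output at all means the summary line was not found, for example when the hostname could not be resolved.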
- Verify Vtrans Configuration
  - Command:
    cat /etc/vtrans-service/config.yml
  - Expected Output: Configuration file contents. Ensure all settings are correct.
  - Example:
    $ cat /etc/vtrans-service/config.yml
    job_name: 'vtrans'
    max_load: 80
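To check one setting without reading the whole file, a single key can be extracted. This sketch assumes the flat "key: value" layout shown in the example; it is not a general YAML parser, and config_value is a hypothetical helper.

```shell
# Usage: config_value <key> <file>
# Prints the value of the first top-level "key: value" line that matches.
# Only handles flat files like the example; nested YAML is not supported.
config_value() {
  awk -F': *' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Example usage (not executed here):
#   config_value max_load /etc/vtrans-service/config.yml
```

The printed value can then be compared with what the environment is expected to use.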
Additional Steps
- Check Logs for Errors
  - Command:
    journalctl -u vtrans-service --since "1 hour ago"
  - Expected Output: Recent logs for the vtrans service. Look for any error messages.
  - Example:
    $ journalctl -u vtrans-service --since "1 hour ago"
    -- Logs begin at Wed 2024-11-13 13:00:00 UTC, end at Wed 2024-11-13 14:00:00 UTC. --
    Nov 13 13:45:00 hostname vtrans-service[1234]: Vtrans job processed
    Nov 13 13:50:00 hostname vtrans-service[1234]: Error: Service overloaded
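Counting error lines gives a quick sense of how often the service has reported problems in the window. This sketch assumes errors are identifiable by the word "error" in the message text, as in the example log line; count_errors is a hypothetical helper.

```shell
# Reads log lines on stdin and prints how many contain "error"
# (case-insensitive). Always exits 0, even when the count is 0.
count_errors() {
  grep -ci 'error' || true
}

# Example usage (not executed here):
#   journalctl -u vtrans-service --since "1 hour ago" | count_errors
```

A plain text match can over-count, for example on messages such as "0 errors found", so treat the number as a prompt to read the matching lines rather than as a precise metric.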
- Check Underlying Infrastructure
  - Ensure the server hosting the vtrans service is up and running.
  - Verify there are no ongoing maintenance activities or outages.