Runbook: VtransOverload

Alert Details

  • Alert Name: VtransOverload
  • Expression: max without (cluster, provider) (vtrans_overloaded{instance!~"vtrans.+", environment=~".+"}) == 0

Description

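This alert fires when the vtrans_overloaded metric signals an overload condition for the vtrans service. The expression takes the maximum of vtrans_overloaded across the cluster and provider labels and fires for each remaining label set where that maximum equals 0. Only series with a non-empty environment label are considered, and instances matching the pattern vtrans.+ are excluded.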
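
To see the raw values behind the alert, the expression can be run without the == 0 comparison. The sketch below assumes the Prometheus HTTP API is reachable at a base URL stored in PROM_URL; that variable, its default value, and the function names are illustrative placeholders, not part of the alert setup.

```shell
# PROM_URL is a hypothetical placeholder; point it at your Prometheus server.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print the alert expression without the trailing "== 0" comparison,
# so the query returns the aggregated value for every matching series.
vtrans_query() {
  printf '%s' 'max without (cluster, provider) (vtrans_overloaded{instance!~"vtrans.+", environment=~".+"})'
}

# Send the expression to the instant-query endpoint and print the JSON reply.
query_vtrans_overloaded() {
  curl -sS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$(vtrans_query)"
}
```

Calling query_vtrans_overloaded prints a JSON document with one entry per label set. Entries whose value is 0 are the ones that satisfy the alert condition.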

Possible Causes

  • High load on the vtrans service.
  • Insufficient resources allocated to the vtrans service.
  • Network issues affecting the vtrans service.
  • Configuration errors in the vtrans setup.

Troubleshooting Steps

  1. Check Vtrans Service Status

    • Command: systemctl status vtrans-service
    • Expected Output: The Active line should read “active (running)”. Any other state, such as “failed” or “inactive (dead)”, means the service is not running.
    • Example:
      $ systemctl status vtrans-service
      ● vtrans-service.service - Vtrans Service
         Loaded: loaded (/etc/systemd/system/vtrans-service.service; enabled; vendor preset: enabled)
         Active: active (running) since Wed 2024-11-13 14:00:00 UTC; 19min ago
      
  2. Restart Vtrans Service

    • Command: sudo systemctl restart vtrans-service
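    • Expected Output: The command prints nothing and exits with status 0 on success. Repeat the status check from step 1 to confirm the service is running again.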
    • Example:
      $ sudo systemctl restart vtrans-service
      
  3. Check Resource Utilization

    • Command: top -b -n 1 | grep vtrans
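    • Expected Output: One line per vtrans process. In the default top layout, the ninth and tenth columns are %CPU and %MEM; values close to the host's limits suggest the service is resource-constrained.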
    • Example:
      $ top -b -n 1 | grep vtrans
      1234 vtrans    20   0  123456  12345  1234 S  0.0  0.1   0:00.00 vtrans-service
      
  4. Check Network Connectivity

    • Command: ping -c 4 vtrans-service-hostname
    • Expected Output: Four replies with no packet loss. Replace vtrans-service-hostname with the host that runs the vtrans service.
    • Example:
      $ ping -c 4 vtrans-service-hostname
      PING vtrans-service-hostname (192.168.1.5) 56(84) bytes of data.
      64 bytes from vtrans-service-hostname: icmp_seq=1 ttl=64 time=0.123 ms
      64 bytes from vtrans-service-hostname: icmp_seq=2 ttl=64 time=0.124 ms
      64 bytes from vtrans-service-hostname: icmp_seq=3 ttl=64 time=0.125 ms
      64 bytes from vtrans-service-hostname: icmp_seq=4 ttl=64 time=0.126 ms
      
  5. Verify Vtrans Configuration

    • Command: cat /etc/vtrans-service/config.yml
    • Expected Output: The contents of the configuration file. Compare each value against the settings expected for this environment.
    • Example:
      $ cat /etc/vtrans-service/config.yml
      job_name: 'vtrans'
      max_load: 80
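
Steps 1 and 3 can be turned into quick pass/fail checks. The following is a minimal sketch, assuming the unit name vtrans-service used above and the default top column layout; the helper names and the threshold passed in the usage example are illustrative assumptions, not part of the vtrans tooling.

```shell
# Hypothetical helpers for illustration; names are not part of vtrans.

# Read `systemctl status` output on stdin.
# Succeeds only if the Active line reports "active (running)".
status_is_running() {
  grep -q 'Active: active (running)'
}

# Read `top -b -n 1` process lines on stdin and print "PID %CPU %MEM"
# for every process whose %CPU (column 9 in the default layout)
# is greater than the limit given as the first argument.
high_cpu_processes() {
  awk -v limit="$1" '$9 + 0 > limit + 0 { print $1, $9, $10 }'
}
```

With the helpers loaded in the current shell, the checks can be run as:

      $ systemctl status vtrans-service | status_is_running || echo "vtrans-service is not running"
      $ top -b -n 1 | grep vtrans | high_cpu_processes 80

The second command prints nothing when no vtrans process is above the chosen limit.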
      

Additional Steps

  • Check Logs for Errors

    • Command: journalctl -u vtrans-service --since "1 hour ago"
    • Expected Output: Log entries from the last hour. Look for error messages such as “Error: Service overloaded”.
    • Example:
      $ journalctl -u vtrans-service --since "1 hour ago"
      -- Logs begin at Wed 2024-11-13 13:00:00 UTC, end at Wed 2024-11-13 14:00:00 UTC. --
      Nov 13 13:45:00 hostname vtrans-service[1234]: Vtrans job processed
      Nov 13 13:50:00 hostname vtrans-service[1234]: Error: Service overloaded
      
  • Check Underlying Infrastructure

    • Ensure the server hosting the vtrans service is up and running.
    • Verify there are no ongoing maintenance activities or outages.
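
The log check above can be reduced to a single number when triaging several hosts. The sketch below counts log lines that mention an error; the function name is a hypothetical helper, and matching on the word "error" is an assumption based on the sample log line shown earlier, so adjust the pattern to whatever the service actually writes.

```shell
# Hypothetical helper for illustration; not part of vtrans.

# Read journalctl output on stdin and print the number of lines that
# contain "error" in any letter case. grep exits non-zero when the count
# is 0, so "|| true" keeps the function's exit status at 0 in that case.
count_error_lines() {
  grep -ci 'error' || true
}
```

With the helper loaded in the current shell, it can be used as:

      $ journalctl -u vtrans-service --since "1 hour ago" | count_error_lines

A count of 0 does not rule out problems logged in a different wording, so a manual read of the log is still worthwhile when the alert is firing.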