Alert Runbooks

ExporterDown

VtransExporterDown

Description

This alert triggers when the exporter service for the vtrans-metrics job is not able to collect metrics from the host.


Possible causes


Severity estimation

If vtrans load in the region of the affected server is not at full capacity, i.e. if there are still servers that are not overloaded, this alert is not critical. This dashboard gives an overview of vtrans load for all regions; make sure the dashboard filters are set to the desired values.
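
As a rough command-line cross-check of the rule above, the sketch below tests whether a single server reports itself as not overloaded. It assumes the other servers in the region expose the same /vtrans2stats endpoint and the vtrans_overloaded metric shown in the example output in this runbook; the function name and the REGION_HOSTS variable are hypothetical, and the dashboard remains the primary source.

```shell
#!/usr/bin/env bash
# Sketch only: host_not_overloaded and REGION_HOSTS are hypothetical names.

# Reads exporter metric output on stdin.
# Succeeds (exit 0) only if vtrans_overloaded is present and equal to 0.
host_not_overloaded() {
  awk '
    $1 == "vtrans_overloaded" { seen = 1; ok = ($2 == 0) }
    END { exit ((seen && ok) ? 0 : 1) }
  '
}

# Usage sketch: count healthy servers in a region.
# REGION_HOSTS is an assumed, space-separated list of FQDNs.
#   free=0
#   for host in $REGION_HOSTS; do
#     if curl -fsS "https://$host/vtrans2stats" | host_not_overloaded; then
#       free=$((free + 1))
#     fi
#   done
#   echo "$free server(s) not overloaded"
```

If at least one other server in the region passes this check, the condition for "not critical" described above is met.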


Troubleshooting steps

  1. Log in to the server

  2. Check exporter service status

    • The service runs as the service user; switch to it with sudo su, then su service
    • The service is managed by the pm2 process manager
    • Command / Action:
      • pm2 list

    • Expected output:
      • The service status is online.
      • Example:
        $ pm2 list
        In memory PM2 version: 5.3.0
        Local PM2 version: 6.0.6 
        ┌────┬────────────────────┬──────────┬──────┬───────────┬──────────┬──────────┐
        │ id │ name               │ mode     │ ↺    │ status    │ cpu      │ memory   │
        ├────┼────────────────────┼──────────┼──────┼───────────┼──────────┼──────────┤
        │ 0  │ push-worker        │ fork     │ 15   │ online    │ 0%       │ 316.8mb  │
        └────┴────────────────────┴──────────┴──────┴───────────┴──────────┴──────────┘
  3. Restart exporter service

    • Command / Action:
      • pm2 restart <name/id>

    • Expected result:
      • The service restarts without errors.
  4. Check service logs

    • Command / Action:
      • Check the logs for errors, and compare them with the logs of healthy instances to see what normal output looks like.
      • pm2 logs
      • pm2 logs --lines 200
      • pm2 monit

  5. Verify HTTP endpoint

    • Command / Action:
      • https://<fqdn>/vtrans2stats
        
    • Expected result:
      • Exporter exposes HTTP endpoint and provides metric data
    • Example:
      # HELP ffmpeg_idle A Vtrans/2 srver is IDLE if it is not running any push, pull or passthrough process
      # TYPE ffmpeg_idle gauge
      ffmpeg_idle 1
      
      # HELP ffmpeg_processes_active The number of active processes at the moment
      # TYPE ffmpeg_processes_active gauge
      
      # HELP ffmpeg_processes_total The total number of processes, including (if any) the ones that were respawned because of errors
      # TYPE ffmpeg_processes_total counter
      ...
      ...
      ...
      # HELP vtrans_overloaded Whether the host is overloaded at the moment, meaning that it is either out of slots or with a high CPU load
      # TYPE vtrans_overloaded gauge
      vtrans_overloaded 0
      
      # HELP vtrans_capacity The current capacity of this host in number of concurrent processes
      # TYPE vtrans_capacity gauge
      vtrans_capacity 8
      
      # HELP vtrans_maxcapacity The maximum capacity of this host in number of concurrent processes, calculated from the Number of Processes Per Core minus a 10% headroom
      # TYPE vtrans_maxcapacity gauge
      vtrans_maxcapacity 8.64
      
      # HELP vtrans_version The deployed versions of the engine and application
      # TYPE vtrans_version gauge
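
Steps 2 and 5 above can be partly scripted. The following is a minimal sketch, not an official tool: both function names are hypothetical helpers. It assumes that pm2 list prints a table with name and status columns as in the example above, and that the endpoint returns plain metric lines of the form "name value".

```shell
#!/usr/bin/env bash
# Sketch only: pm2_status_of and metric_value are hypothetical helpers,
# not part of PM2 or of the exporter.

# Step 2: read `pm2 list` output on stdin, print the status of one process.
# Exits non-zero if the process name is not found in the table.
pm2_status_of() {
  awk -F'│' -v want="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Until the header row is seen, look for the name and status columns.
    name_col == 0 {
      for (i = 1; i <= NF; i++) {
        if (trim($i) == "name")   name_col = i
        if (trim($i) == "status") status_col = i
      }
      next
    }
    trim($(name_col)) == want { print trim($(status_col)); found = 1; exit }
    END { exit (found ? 0 : 1) }
  '
}

# Step 5: read exporter metric output on stdin, print the value of one metric.
# Exits non-zero if the metric has no sample line.
metric_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on the affected server (replace the placeholders first):
#   pm2 list | pm2_status_of <name>
#   curl -fsS "https://<fqdn>/vtrans2stats" | metric_value vtrans_capacity
```

Parsing the table is fragile across PM2 versions because the set of columns can differ; pm2 jlist prints the same process list as JSON and is the sturdier input if a JSON parser is available on the host.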

Additional resources

  • PM2 documentation
  • Streamcloud server naming scheme
  • todo: streamcloud balancing runbook
  • todo: streamcloud load estimation dashboard