Runbook: ExporterScrapeTimeLimit Alert

Alert Details

  • Alert Name: ExporterScrapeTimeLimit
  • Expression: scrape_duration_seconds{job="integrations/postgres"} > 2

Description

This alert fires when a scrape of the PostgreSQL metrics (job integrations/postgres) takes longer than 2 seconds.

Possible Causes

  • Long-running queries
  • High system load on the PostgreSQL server

Troubleshooting Steps

1. Check CPU and IOPS usage of the PostgreSQL server.

An overloaded server may be slow to answer the queries used to collect metrics, which lengthens the scrape.
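
CPU and IOPS figures themselves come from host or cloud monitoring. As a rough complement from inside the database, the sketch below counts sessions by state and wait type. It assumes a PostgreSQL version where pg_stat_activity exposes wait_event_type (9.6 or later) and a role allowed to see other sessions.

```sql
-- Count sessions by state and wait type, excluding this session.
-- Many 'active' sessions, or many waiting on IO or Lock, suggest a loaded server.
SELECT state, wait_event_type, count(*) AS sessions
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY state, wait_event_type
ORDER BY sessions DESC;
```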

2. Look at the PostgreSQL server logs to identify long-running queries.

You may need to set log_min_duration_statement so that statements running longer than a chosen threshold are written to the log.
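
One possible way to turn this on without a restart is sketched below. The 1-second threshold is an assumption to adjust for your workload. ALTER SYSTEM requires superuser privileges, and managed services usually expose this parameter through their own configuration settings instead.

```sql
-- Assumed threshold: log every statement that runs longer than 1 second.
ALTER SYSTEM SET log_min_duration_statement = '1s';

-- Reload the configuration so the new value takes effect.
SELECT pg_reload_conf();

-- Confirm the active value.
SHOW log_min_duration_statement;
```

Setting the value back to -1 disables duration-based statement logging again.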

3. Identify and terminate heavy queries.
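
To find candidate PIDs first, a query along these lines lists sessions that have been running a statement for a while. It is a minimal sketch: the 30-second cutoff and the 100-character query preview are assumptions, not values from this runbook.

```sql
-- List non-idle sessions whose current statement started more than 30 seconds ago.
SELECT pid, usename, datname, application_name, state,
       now() - query_start AS runtime,
       left(query, 100) AS query_preview
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
  AND now() - query_start > interval '30 seconds'
ORDER BY runtime DESC;
```

Review the result before acting on it, then pass the chosen PIDs to the terminate statement below.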

-- Replace <replace_with_pids> with a comma-separated list of integer PIDs, e.g. 12345, 12346.
SELECT pg_terminate_backend(pid), usename, datname, application_name, client_addr, client_port,
       state, wait_event_type, wait_event, state_change, query
FROM pg_stat_activity
WHERE pid IN (<replace_with_pids>);
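
If closing the client connection is too disruptive, a gentler variant is to cancel only the running statement with pg_cancel_backend, which leaves the session connected. The sketch below reuses the same placeholder and is a suggestion, not part of the original procedure.

```sql
-- Cancel the current statement of each listed backend without closing its connection.
-- Replace <replace_with_pids> with a comma-separated list of integer PIDs.
SELECT pg_cancel_backend(pid), pid, usename, state, query
FROM pg_stat_activity
WHERE pid IN (<replace_with_pids>);
```

If a statement does not stop after being cancelled, pg_terminate_backend remains the fallback.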