Alert Runbooks

LongRunningQueries

Runbook: LongRunningQueries Alert

Alert Details

Description

Alert is triggered when SQL queries run for an extended period.

Possible Causes

Troubleshooting Steps

1. Identify the PIDs of the long running queries

SELECT pid, leader_pid, datname as database, usename AS user, application_name, EXTRACT(EPOCH FROM now() - xact_start) as transaction_duration, CASE WHEN state = 'active' THEN EXTRACT(EPOCH FROM now() - query_start) ELSE null END query_duration, state, wait_event_type, wait_event, CASE WHEN state = 'active' THEN query ELSE null END query, xact_start as transaction_start, CASE WHEN state = 'active' THEN query_start ELSE null END query_start, CASE WHEN state != 'active' THEN EXTRACT(EPOCH FROM state_change - query_start) ELSE null END last_query_duration, CASE WHEN state != 'active' THEN query ELSE null END last_query, pg_blocking_pids(pid) as blocked_by, client_addr, backend_start, ceil(EXTRACT(EPOCH FROM now() - backend_start)) as backend_duration FROM pg_stat_activity WHERE query_start is not null AND usename!='rdsrepladmin' AND query not like 'START_REPLICATION %' AND pid != pg_backend_pid();

Queries could be blocked in trying to acquire a lock, so pay particular attention at the blocked_by column. If you identify specific queries blocking others, note down their PIDs. Below is a focused view of current locks on the database:

SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS current_statement_in_blocking_process FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.granted;

2. Terminate in priority the blocking queries, if not enough, terminate the other long running queries

SELECT pg_terminate_backend(pid), usename, datname, application_name, client_addr, client_port, state, wait_event_type, wait_event, state_change, query FROM pg_stat_activity WHERE pid in ('<replace_with_pids>');