Runbook: HighTransactionFailRate Alert

Alert Details

  • Alert Name: HighTransactionFailRate
  • Expression: rate(pg_stat_activity_count{state="idle in transaction (aborted)", datname!~"template.*"}[5m]) >= 5

Description

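This alert triggers when the rate of sessions in the idle in transaction (aborted) state, measured over a 5-minute window, is greater than or equal to 5. These are sessions whose transaction failed with an error but has not yet been rolled back by the client. They sit idle while still holding their locks and a connection slot, which can block other queries and degrade performance.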
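
To see what the alert is counting at a given moment, the underlying view can be queried directly. This is a minimal sketch that assumes the metric is derived from pg_stat_activity; the filter mirrors the alert's template-database exclusion, and the column alias is illustrative.

```sql
-- Count sessions currently sitting in an aborted transaction, per database.
SELECT datname,
       count(*) AS aborted_sessions
FROM pg_stat_activity
WHERE state = 'idle in transaction (aborted)'
  AND datname !~ '^template'   -- same exclusion as the alert expression
GROUP BY datname
ORDER BY aborted_sessions DESC;
```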

Possible Causes

  • Application code that does not handle transaction errors properly (for example, no ROLLBACK after a failed statement)
  • Connection leaks in application code
  • ORM misconfiguration
  • Applications crashing mid-transaction
  • Long-running clients that don’t properly close connections
  • idle_in_transaction_session_timeout disabled (the default, 0) or set too high
  • Insufficient connection timeouts
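
For the timeout-related cause, the current setting can be checked and, if appropriate, tightened. This is a sketch only: the database name app_db and the 5-minute value are placeholders, not values recommended by this runbook.

```sql
-- Show the effective value for the current session; 0 means the timeout is disabled.
SHOW idle_in_transaction_session_timeout;

-- Hypothetical example: apply a 5-minute limit to new sessions on one database.
ALTER DATABASE app_db SET idle_in_transaction_session_timeout = '5min';
```

A per-database or per-role setting affects only sessions started after the change, so existing stuck sessions still need to be handled separately.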

Troubleshooting Steps

1. Terminate Problem Sessions:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction (aborted)';
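
A narrower variant of the statement above may be preferable when some sessions are in this state only briefly. The sketch below assumes a 5-minute threshold (an arbitrary value) and returns details of each terminated session so they can be recorded before the evidence is gone.

```sql
-- Terminate only sessions that have been in the aborted state for over 5 minutes,
-- and report which sessions were affected.
SELECT pid,
       usename,
       application_name,
       client_addr,
       state_change,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle in transaction (aborted)'
  AND state_change < now() - interval '5 minutes'
  AND pid <> pg_backend_pid();   -- never target the session running this query
```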

2. Investigate Source:

  • Check application logs for errors

  • Identify which client hosts are involved (client_addr in pg_stat_activity)
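
The client lookup above can be sketched as a single grouping query. It uses only standard pg_stat_activity columns; the aliases are illustrative. Running it before terminating sessions is advisable, since terminated sessions disappear from the view.

```sql
-- Group stuck sessions by where they come from, to narrow down the offending client.
SELECT client_addr,
       application_name,
       usename,
       datname,
       count(*)          AS sessions,
       min(state_change) AS oldest_state_change
FROM pg_stat_activity
WHERE state = 'idle in transaction (aborted)'
GROUP BY client_addr, application_name, usename, datname
ORDER BY sessions DESC;
```

The query column of pg_stat_activity holds the last statement each session sent, which is often, though not always, the statement that failed; it can be added to the select list when inspecting individual sessions.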