Runbook: SslCertificateExpiresSoon Alert

Alert Details

  • Alert Name: SslCertificateExpiresSoon
  • Expression: (min without (cluster) (probe_ssl_earliest_cert_expiry{job=~".+", nanocosmosGroup=~".+", environment=~".+"}) - time()) / 3600 / 24 <=

Description

This alert is triggered when the SSL certificate for any job within a specific group (nanocosmosGroup) and environment (environment) is about to expire within a specified number of days. This indicates that the SSL certificate needs to be renewed soon to avoid service disruptions.

Possible Causes

  1. SSL certificate is nearing its expiration date.
  2. Misconfiguration of the SSL certificate monitoring.
  3. Delays in the certificate renewal process.

Troubleshooting Steps

1. Check SSL Certificate Expiry Date

Verify the expiry date of the SSL certificate for the target service.

# Example: Check SSL certificate expiry date using OpenSSL
echo | openssl s_client -connect <target_hostname_or_ip>:443 2>/dev/null | openssl x509 -noout -dates

Expected Output:

notBefore=Nov 13 00:00:00 2023 GMT
notAfter=Nov 13 00:00:00 2024 GMT

2. Renew SSL Certificate

If the certificate is nearing its expiry date, renew it using your certificate authority (CA).

# Example: Renew SSL certificate using Certbot (for Let's Encrypt)
sudo certbot renew

Expected Output:

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Cert not yet due for renewal
...

3. Verify SSL Certificate Renewal

After renewing the certificate, verify that the new certificate is in place and has the correct expiry date.

# Example: Verify the new SSL certificate expiry date
echo | openssl s_client -connect <target_hostname_or_ip>:443 2>/dev/null | openssl x509 -noout -dates

Expected Output:

notBefore=Nov 13 00:00:00 2024 GMT
notAfter=Nov 13 00:00:00 2025 GMT

4. Check Probe Configuration

Ensure that the probe is correctly configured to monitor the SSL certificate expiry.

# Example: Check probe configuration
cat /etc/prometheus/prometheus.yml | grep -A 10 'scrape_configs:'

Expected Output:

scrape_configs:
  - job_name: 'probe'
    metrics_path: /probe
    params:
      module: [ssl_earliest_cert_expiry]
    static_configs:
      - targets:
        - <target_hostname_or_ip>
...

5. Review Logs

Check the logs of the probe for any errors or warnings related to SSL certificate monitoring.

# Example: Review logs of the probe
cat /var/log/prometheus/probe.log | tail -n 50

Expected Output:

<timestamp> <log_level> <log_message>
...

Additional Steps

If the issue persists, consider:

  • Contacting your certificate authority (CA) for assistance with the renewal process.
  • Checking for any automation issues in the certificate renewal process.
  • Contacting the network or system administrator for further investigation.