Runbook: SslCertificateExpiresSoon Alert
Alert Details
- Alert Name: SslCertificateExpiresSoon
- Expression:
(min without (cluster) (probe_ssl_earliest_cert_expiry{job=~".+", nanocosmosGroup=~".+", environment=~".+"}) - time()) / 3600 / 24 <=
Description
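This alert fires when the earliest-expiring SSL certificate reported for any job within a group (nanocosmosGroup) and environment (environment) has no more than the configured number of days left before it expires. The expression takes the minimum of probe_ssl_earliest_cert_expiry (a Unix timestamp in seconds) across clusters, subtracts the current time, and divides by 3600 and 24 to convert the remaining seconds to days. The certificate must be renewed before that date to avoid service disruptions.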
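The arithmetic in the expression can be reproduced by hand to check whether a given expiry timestamp should trigger the alert. The following is a minimal sketch, not part of the alert rule itself: the function name days_until_expiry and the sample timestamps are illustrative assumptions, and it uses integer division where the alert expression uses floating-point division.

```shell
# Hypothetical helper: days left between "now" and a certificate expiry,
# both given as Unix timestamps in seconds (the unit used by
# probe_ssl_earliest_cert_expiry and time()).
days_until_expiry() {
    expiry_epoch=$1
    now_epoch=${2:-$(date +%s)}   # default to the current time
    # 3600 s/hour, then 24 hours/day, as in the alert expression
    echo $(( (expiry_epoch - now_epoch) / 3600 / 24 ))
}

# Example with illustrative timestamps exactly 10 days apart
days_until_expiry 1731456000 1730592000   # prints 10
```

Comparing the printed value with the threshold configured in the alert rule shows whether the alert is expected to fire.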
Possible Causes
- SSL certificate is nearing its expiration date.
- Misconfiguration of the SSL certificate monitoring.
- Delays in the certificate renewal process.
Troubleshooting Steps
1. Check SSL Certificate Expiry Date
Verify the expiry date of the SSL certificate for the target service.
# Example: Check SSL certificate expiry date using OpenSSL
echo | openssl s_client -connect <target_hostname_or_ip>:443 2>/dev/null | openssl x509 -noout -dates
Expected Output:
notBefore=Nov 13 00:00:00 2023 GMT
notAfter=Nov 13 00:00:00 2024 GMT
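To compare the notAfter value with the timestamp used by the alert, it can be converted to Unix seconds. This is a sketch under stated assumptions: the helper name not_after_to_epoch is hypothetical, and it relies on GNU date (the -d option), which is not available in the same form on BSD or macOS.

```shell
# Hypothetical helper: convert a "notAfter=..." line, as printed by
# `openssl x509 -noout -dates`, into a Unix timestamp in seconds.
# Assumes GNU date; BSD/macOS date needs different flags.
not_after_to_epoch() {
    line=$1
    # Strip the "notAfter=" prefix and let date parse the remainder
    date -u -d "${line#notAfter=}" +%s
}

not_after_to_epoch "notAfter=Nov 13 00:00:00 2024 GMT"   # prints 1731456000
```

The resulting number is directly comparable to the value of probe_ssl_earliest_cert_expiry for the same target.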
2. Renew SSL Certificate
If the certificate is nearing its expiry date, renew it using your certificate authority (CA).
# Example: Renew SSL certificate using Certbot (for Let's Encrypt)
sudo certbot renew
Example Output (here Certbot found no certificate within its renewal window, so nothing was renewed):
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Cert not yet due for renewal
...
3. Verify SSL Certificate Renewal
After renewing the certificate, verify that the new certificate is in place and has the correct expiry date.
# Example: Verify the new SSL certificate expiry date
echo | openssl s_client -connect <target_hostname_or_ip>:443 2>/dev/null | openssl x509 -noout -dates
Expected Output:
notBefore=Nov 13 00:00:00 2024 GMT
notAfter=Nov 13 00:00:00 2025 GMT
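The comparison between the old and new expiry dates can also be scripted. The sketch below is illustrative only: cert_was_renewed is a hypothetical helper, the two dates are the example values from this runbook, and GNU date is assumed for parsing.

```shell
# Hypothetical check: succeed only if the new notAfter date is later than
# the old one. Assumes GNU date for parsing (-d).
cert_was_renewed() {
    old_epoch=$(date -u -d "$1" +%s) || return 2
    new_epoch=$(date -u -d "$2" +%s) || return 2
    [ "$new_epoch" -gt "$old_epoch" ]
}

# Old notAfter from step 1, new notAfter from this step
if cert_was_renewed "Nov 13 00:00:00 2024 GMT" "Nov 13 00:00:00 2025 GMT"; then
    echo "renewed"
else
    echo "not renewed"
fi
```

If the check reports "not renewed" after a renewal, the service may still be serving the old certificate and might need a reload.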
4. Check Probe Configuration
Ensure that the probe is correctly configured to monitor the SSL certificate expiry.
# Example: Check probe configuration
grep -A 10 'scrape_configs:' /etc/prometheus/prometheus.yml
Expected Output:
scrape_configs:
  - job_name: 'probe'
    metrics_path: /probe
    params:
      module: [ssl_earliest_cert_expiry]
    static_configs:
      - targets:
        - <target_hostname_or_ip>
...
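Beyond reading the configuration, it can help to confirm that the probe actually returns the expiry metric. The sketch below assumes the probe endpoint responds in the Prometheus text exposition format; the helper name extract_cert_expiry and the sample metric values are made up for illustration.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of probe_ssl_earliest_cert_expiry as an integer.
extract_cert_expiry() {
    awk '$1 == "probe_ssl_earliest_cert_expiry" { printf "%.0f\n", $2 }'
}

# Illustrative sample of probe output (values are invented for the example)
printf '%s\n' \
    '# TYPE probe_ssl_earliest_cert_expiry gauge' \
    'probe_ssl_earliest_cert_expiry 1.731456e+09' \
    'probe_success 1' | extract_cert_expiry   # prints 1731456000
```

In practice the input would come from the exporter itself. If the probe is served by the Prometheus Blackbox exporter on its default port (an assumption, since this runbook does not name the exporter), that would be something like curl -s 'http://<exporter_host>:9115/probe?target=<target_hostname_or_ip>&module=<module>' piped into extract_cert_expiry. No output means the metric is missing and the probe configuration should be reviewed.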
5. Review Logs
Check the logs of the probe for any errors or warnings related to SSL certificate monitoring.
# Example: Review logs of the probe
tail -n 50 /var/log/prometheus/probe.log
Expected Output:
<timestamp> <log_level> <log_message>
...
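When the log is long, filtering by level narrows the search. This sketch assumes the "<timestamp> <log_level> <log_message>" layout shown above, with the level as the second whitespace-separated field; the helper name filter_problem_lines and the sample log lines are hypothetical.

```shell
# Hypothetical filter: keep only lines whose level field looks like a
# warning or error. Assumes the level is the second field on each line.
filter_problem_lines() {
    awk 'toupper($2) ~ /^(WARN|WARNING|ERROR|FATAL)$/'
}

# Illustrative log lines (invented for the example)
printf '%s\n' \
    '2024-11-01T10:00:00Z INFO probe succeeded' \
    '2024-11-01T10:05:00Z ERROR tls handshake failed' | filter_problem_lines
# prints: 2024-11-01T10:05:00Z ERROR tls handshake failed
```

Against the real log this could be used as tail -n 500 /var/log/prometheus/probe.log | filter_problem_lines, adjusting the field number if the actual log format differs.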
Additional Steps
If the issue persists, consider:
- Contacting your certificate authority (CA) for assistance with the renewal process.
- Checking for any automation issues in the certificate renewal process.
- Contacting the network or system administrator for further investigation.