Runbook: NodeTextFileCollectorScrapeError
Alert Details
- Alert Name: NodeTextFileCollectorScrapeError
- Expression:
node_textfile_scrape_error{nanocosmosGroup=~".+", instance=~".+", environment=~".+"} == 1
Description
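This alert fires when node_textfile_scrape_error is 1, meaning the Node Exporter’s textfile collector failed to read its configured directory or failed to open, read, or parse one of the .prom files in it. The textfile collector lets you expose custom metrics to Prometheus by writing them to files in that directory. The Node Exporter re-reads these files on every scrape.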
Possible Causes
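- Files that are not in the Prometheus text exposition format, contain syntax errors, or contain sample timestamps (the textfile collector rejects files that include them)
- Permission issues preventing the Node Exporter from reading the directory or the files
- A missing directory, or truncated or partially written files (for example, a generator writing a file in place while a scrape reads it)
- Configuration errors in the Node Exporter, such as a --collector.textfile.directory flag pointing to the wrong path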
Troubleshooting Steps
1. Check the Text File Directory
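Verify the directory that the Node Exporter reads the text files from. There is no built-in default. The directory is set with the --collector.textfile.directory flag, and different packages and deployment tools use different paths. The examples below use /var/lib/node_exporter/textfile_collector/. List the files in this directory: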
ls -l /var/lib/node_exporter/textfile_collector/
Expected Output
You should see the text files and their permissions. Only files whose names end in .prom are collected:
-rw-r--r-- 1 root root 1234 Nov 13 14:00 custom_metrics.prom
...
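Because the path comes from the --collector.textfile.directory flag, reading it from the running process can help. Below is a minimal sketch of a hypothetical helper (textfile_dir_from_args is not part of the Node Exporter). It assumes a Linux host with procps ps, a process named node_exporter, and a path that contains no spaces. The flag is repeatable, so the helper prints one line per occurrence.

```shell
# textfile_dir_from_args: hypothetical helper that reads Node Exporter command
# lines on stdin and prints every value of --collector.textfile.directory.
# Handles both the "--flag=value" and "--flag value" forms; paths with
# spaces are not supported.
textfile_dir_from_args() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^--collector[.]textfile[.]directory=/) {
        sub(/^--collector[.]textfile[.]directory=/, "", $i)
        print $i
      } else if ($i == "--collector.textfile.directory" && i < NF) {
        print $(i + 1)
      }
    }
  }'
}

# Usage on a live host (assumes the process is named node_exporter):
#   ps -o args= -C node_exporter | textfile_dir_from_args
```

If the flag is set through an environment file or wrapper script instead, systemctl cat node_exporter shows the full unit definition.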
2. Verify File Format and Syntax
Check the format and syntax of the text files. Each file must contain metrics in the Prometheus text exposition format, and its last line must end with a newline. For example:
# HELP custom_metric_name Description of the custom metric
# TYPE custom_metric_name gauge
custom_metric_name{label="value"} 1234
Use the following command to view the contents of a file:
cat /var/lib/node_exporter/textfile_collector/custom_metrics.prom
Expected Output
You should see the metrics in the correct format:
# HELP custom_metric_name Description of the custom metric
# TYPE custom_metric_name gauge
custom_metric_name{label="value"} 1234
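Where promtool is installed, promtool check metrics < /var/lib/node_exporter/textfile_collector/custom_metrics.prom is the more thorough check. Where it is not, the sketch below offers a crude alternative as a hypothetical check_prom_file helper. It flags sample lines that do not have the form name{labels} value, rejects sample timestamps (which the textfile collector does not accept), and requires a trailing newline. It does not handle label values that contain } and does not check HELP or TYPE lines, so a pass means only that no obvious problem was found.

```shell
# check_prom_file FILE: hypothetical, crude sanity check for a textfile
# collector file. Prints offending lines and returns non-zero on failure.
check_prom_file() {
  f=$1
  # The exposition format requires the last line to end with a line feed.
  # $(...) strips one trailing newline, so a non-empty result means none was there.
  if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
    echo "$f: last line does not end with a newline" >&2
    return 1
  fi
  awk '
    /^[ \t]*$/ { next }   # empty lines are ignored by the format
    /^#/       { next }   # HELP, TYPE and plain comment lines
    # Accept only: metric_name{optional labels} value   (no timestamp)
    !/^[a-zA-Z_:][a-zA-Z0-9_:]*([{][^}]*[}])?[ \t]+([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?|[-+]?Inf|NaN)[ \t]*$/ {
      printf "%s:%d: malformed sample line: %s\n", FILENAME, FNR, $0
      bad = 1
    }
    END { exit bad }
  ' "$f"
}

# Usage:
#   check_prom_file /var/lib/node_exporter/textfile_collector/custom_metrics.prom
```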
3. Check File Permissions
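Ensure that the Node Exporter can read the files. Its user needs read permission on each file, and read and execute permission on the directory. The service user depends on the installation: it is commonly node_exporter, or root if none is set. To see the configured user, run systemctl show -p User node_exporter. For example, to give a file to the node_exporter user and make it world-readable: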
sudo chown node_exporter:node_exporter /var/lib/node_exporter/textfile_collector/custom_metrics.prom
sudo chmod 644 /var/lib/node_exporter/textfile_collector/custom_metrics.prom
Expected Output
Verify the permissions:
ls -l /var/lib/node_exporter/textfile_collector/custom_metrics.prom
You should see the correct ownership and permissions:
-rw-r--r-- 1 node_exporter node_exporter 1234 Nov 13 14:00 custom_metrics.prom
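To check every file at once, the hypothetical find_unreadable_prom helper below lists the directory itself if others lack read and execute permission on it, plus every .prom file that is not world-readable. It inspects mode bits only, so a listed file may still be readable by the service user through ownership or group membership. To test access directly, run sudo -u node_exporter cat FILE > /dev/null, assuming the service user is node_exporter.

```shell
# find_unreadable_prom DIR: hypothetical helper that prints DIR if "others"
# cannot list and traverse it, then every *.prom file in DIR that is not
# world-readable. Only mode bits are checked, not owner or group access.
find_unreadable_prom() {
  dir=$1
  # Listing the directory needs read (4); opening files in it needs execute (1).
  find "$dir" -maxdepth 0 ! -perm -0005
  # Each collected file needs read permission (4) for "others".
  find "$dir" -maxdepth 1 -type f -name '*.prom' ! -perm -0004
}

# Usage:
#   find_unreadable_prom /var/lib/node_exporter/textfile_collector
```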
4. Check Node Exporter Logs
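If the issue persists, check the Node Exporter logs for error messages from the textfile collector. The unit name may differ by package (for example, prometheus-node-exporter on Debian):
sudo journalctl -u node_exporter --since "1 hour ago" --no-pager | grep -i textfile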
Expected Output
Look for error messages that explain why the textfile collector failed. They typically name the affected file and the cause, such as a parse error, a permission denied error, or a missing directory.
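5. Restart Node Exporter (If Needed)
The textfile collector re-reads the directory on every scrape, so fixes to file contents or permissions take effect on the next scrape without a restart. Restart the Node Exporter only if you changed its flags or service configuration, such as --collector.textfile.directory: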
sudo systemctl restart node_exporter
Expected Output
Check the status to ensure the Node Exporter restarted successfully:
sudo systemctl status node_exporter
The output should show the service as active (running).
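A running service does not by itself mean the problem is fixed. The alert clears once node_textfile_scrape_error returns to 0. The sketch below uses a hypothetical textfile_scrape_ok helper that reads a /metrics page on stdin and prints the files the collector read, taken from node_textfile_mtime_seconds. It exits with 0 if the error gauge is 0, 1 if the gauge is non-zero, and 2 if the gauge is missing (for example, when the textfile collector is disabled). It assumes the default listen address :9100.

```shell
# textfile_scrape_ok: hypothetical helper that inspects Node Exporter metrics
# on stdin. Exit status: 0 = no textfile error, 1 = error, 2 = gauge missing.
textfile_scrape_ok() {
  awk '
    /^node_textfile_scrape_error[ {]/ { seen = 1; if ($NF + 0 != 0) failed = 1 }
    /^node_textfile_mtime_seconds[{]/ { print "read: " $1 }
    END {
      if (!seen) { print "node_textfile_scrape_error not found"; exit 2 }
      exit failed
    }
  '
}

# Usage on the affected host (assumes the default listen address :9100):
#   curl -s http://localhost:9100/metrics | textfile_scrape_ok
```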
Additional Steps
1. Monitor Text File Collector
Keep this alert enabled and consider tracking file freshness as well. The textfile collector exposes node_textfile_mtime_seconds for each file it reads, so Prometheus and Grafana dashboards or alerts can flag files that stop being updated.
2. Automate Text File Generation
If you generate the text files programmatically, validate the output before publishing it (for example, with promtool check metrics). Write each file atomically so the collector never reads a partially written file.
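The Node Exporter README recommends writing metrics to a temporary file and then renaming it with mv, which is atomic within a single filesystem. Below is a minimal sketch of that pattern as a hypothetical write_prom_atomically helper. It reads metrics on stdin, refuses empty output or output without a trailing newline, sets mode 644 (mktemp creates files readable only by their owner), and renames the result into place. The temporary name does not end in .prom, so the collector ignores it while it is being written.

```shell
# write_prom_atomically DIR NAME: hypothetical helper that publishes stdin as
# DIR/NAME without the textfile collector ever seeing a partial file.
write_prom_atomically() {
  dir=$1 name=$2
  # Same directory, so mv is a rename; the name does not end in ".prom".
  tmp=$(mktemp "$dir/.$name.XXXXXX") || return 1
  cat > "$tmp" || { rm -f "$tmp"; return 1; }
  # Refuse to publish an empty file or one whose last line lacks a newline.
  if [ ! -s "$tmp" ] || [ -n "$(tail -c 1 "$tmp")" ]; then
    echo "refusing to publish $name: empty or missing trailing newline" >&2
    rm -f "$tmp"
    return 1
  fi
  # Make the file readable by the Node Exporter user, then swap it in.
  if ! { chmod 644 "$tmp" && mv "$tmp" "$dir/$name"; }; then
    rm -f "$tmp"
    return 1
  fi
}
```

For example, some_generator | write_prom_atomically /var/lib/node_exporter/textfile_collector custom_metrics.prom, where some_generator stands for whatever produces your metrics.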
By following these steps, you should be able to troubleshoot and resolve the “NodeTextFileCollectorScrapeError” alert. If the issue persists, further investigation into the specific text files and their generation process may be necessary.