Creating a nagios monitor
Overview
Our Nagios server is jojobe.mayfirst.org, which is reachable over HTTPS at https://monitor.mayfirst.org/.
We monitor network-accessible services (such as HTTPS, SMTP and IMAP) the standard way: our Nagios server periodically connects over the network to these services on each server to check that they are still running properly.
We also want to check things that are not reachable over the network, such as disk usage, or whether MySQL (which only listens on a local port) is still running.
The standard way to set up Nagios alerts for local checks like these is to run a Nagios agent on every server, listening on a public IP address. The Nagios server then connects to this port and queries all the local checks.
We do it differently because we want to avoid running another publicly accessible service on every server.
Instead, we run local scripts on each server from cron and use SCP to copy their output to our Nagios server every hour. The Nagios server then checks the copied output to decide whether to raise an alert.
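The Nagios side of this arrangement can be sketched as a small shell plugin that reads a copied report and turns its status prefix into the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a hypothetical stand-in, not our actual checker: the report directory on jojobe, the two-hour staleness limit (reports arrive hourly) and the function name mf_check are all assumptions, and it relies on GNU stat.

```shell
#!/bin/bash
# Hypothetical sketch of a Nagios-side check for copied monitor reports.
# MF_REPORT_DIR and MF_MAX_AGE are assumed settings, not documented ones.
MF_REPORT_DIR="${MF_REPORT_DIR:-/var/log/mfpl/monitor}"
MF_MAX_AGE="${MF_MAX_AGE:-7200}"   # seconds; reports should arrive hourly

# mf_check HOST TYPE: print the report's status line and return the
# Nagios exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
mf_check() {
  local report="$MF_REPORT_DIR/$1.$2.txt"
  if [ ! -r "$report" ]; then
    echo "UNKNOWN: no report at $report"
    return 3
  fi
  # A report that stopped being refreshed means cron or the copy broke.
  local age=$(( $(date +%s) - $(stat -c %Y "$report") ))
  if [ "$age" -gt "$MF_MAX_AGE" ]; then
    echo "UNKNOWN: report is ${age}s old"
    return 3
  fi
  local line
  line="$(head -n 1 "$report")"
  echo "$line"
  case "$line" in
    OK:*)       return 0 ;;
    WARNING:*)  return 1 ;;
    CRITICAL:*) return 2 ;;
    *)          return 3 ;;
  esac
}

# When run as a plugin with HOST and TYPE, exit with the check's code.
if [ "$#" -eq 2 ]; then
  mf_check "$1" "$2"
  exit $?
fi
```

Nagios uses the printed line as the status text and the exit code as the state, which is why each script's output must start with one of the three prefixes.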
Set up a script to run locally
The monitoring scripts are stored in puppet/modules/mayfirst/files/monitor-utils/, where you can find examples of several kinds of checks. Each script should generate output that starts with OK:, WARNING: or CRITICAL:. It then pipes this output on standard input to the mf-monitor-output script, passing the type of check as the first argument, for example:
echo "WARNING: /root partition at 80%" | mf-monitor-output df
In this example "df" is the type of check being run.
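A minimal sketch of such a check script, assuming hypothetical thresholds of 80% (warning) and 90% (critical) and reporting only the fullest local filesystem; the real mf-monitor-df in monitor-utils may work differently. It prints a single line with the required prefix and pipes it to mf-monitor-output, falling back to plain output when that helper is not installed (for example when trying the sketch on a workstation).

```shell
#!/bin/bash
# Hypothetical df check; the thresholds below are assumptions.
WARN_PCT=80
CRIT_PCT=90

# classify_usage MOUNT PCT: print one status line with the required prefix.
classify_usage() {
  local mount="$1" pct="$2"
  if [ "$pct" -ge "$CRIT_PCT" ]; then
    echo "CRITICAL: $mount partition at ${pct}%"
  elif [ "$pct" -ge "$WARN_PCT" ]; then
    echo "WARNING: $mount partition at ${pct}%"
  else
    echo "OK: $mount partition at ${pct}%"
  fi
}

# check_df: classify the fullest local filesystem.
check_df() {
  # -P gives one line per filesystem, -l limits to local ones; drop the
  # header and sort numerically on Use% (column 5) so the fullest is last.
  df -P -l | tail -n +2 | sort -k5,5n | tail -n 1 | {
    read -r _ _ _ _ pct mount
    classify_usage "$mount" "${pct%\%}"
  }
}

# Hand the line to mf-monitor-output with "df" as the type of check.
if command -v mf-monitor-output >/dev/null 2>&1; then
  check_df | mf-monitor-output df
else
  check_df
fi
```

Keeping the classification in its own function makes it easy to test the thresholds without a nearly full disk.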
The mf-monitor-output script detects whether it is being run from a terminal (e.g. by an admin). If it is, it prints the output to standard out so you can read it. If there is no terminal (e.g. when run from cron), it writes the output to /var/log/mfpl/monitor/$(hostname).$(type).txt, which then gets copied to jojobe.
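One way to implement that terminal check is the shell test `[ -t 1 ]`, which is true when standard output is a terminal. The sketch below is not the actual mf-monitor-output; the function name and the MF_MONITOR_DIR override are hypothetical (the override only makes the sketch easy to try), while the default directory and file name follow the path given above.

```shell
#!/bin/bash
# Hypothetical sketch of mf-monitor-output's terminal detection.

# mf_monitor_output TYPE: copy stdin to the terminal or to the report file.
mf_monitor_output() {
  local type="$1"
  local dir="${MF_MONITOR_DIR:-/var/log/mfpl/monitor}"
  if [ -t 1 ]; then
    # Run by an admin: show the result instead of saving it.
    cat
  else
    # Run from cron (no terminal): save the report for the copy to jojobe.
    mkdir -p "$dir"
    cat > "$dir/$(hostname).${type}.txt"
  fi
}

# As a script, the type of check is the first argument.
if [ "$#" -ge 1 ]; then
  mf_monitor_output "$1"
fi
```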
If you want to add a check, review the existing files for an example.
Set up a cronjob
Your script won't get called unless it is included in the cron job. You'll need to edit:
puppet/modules/mayfirst/templates/monitor-utils/cron.d/mf-monitor
Then add an entry that runs your script.
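A sketch of such an entry, assuming the script is installed as /usr/local/sbin/mf-monitor-df (the path used in the utils.pp example below) and runs hourly to match the hourly copy to jojobe; the minute is arbitrary, so follow the schedule of the existing entries in the template. Unlike a user crontab, files in /etc/cron.d need a user field between the schedule and the command:

```
# m  h  dom mon dow  user  command
17   *  *   *   *    root  /usr/local/sbin/mf-monitor-df
```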
Add to utils.pp
You also need to ensure your script gets copied.
Modify:
puppet/modules/mayfirst/manifests/utils.pp
The code should look something like this, with the name of your script in place of mf-monitor-df:
file { "/usr/local/sbin/mf-monitor-df":
  source => "puppet:///modules/mayfirst/monitor-utils/mf-monitor-df",
}
Define hostgroup
Next, define a hostgroup: the group of servers that will run this check.
See:
projects/puppet/modules/mayfirst/files/nagios/nagios3/conf.d/
This code section should look something like this:
define hostgroup {
    hostgroup_name  df-servers
    alias           File System Check Servers
}
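This hostgroup definition lists no servers itself. In Nagios a host joins a hostgroup either through the hostgroup's members directive or through the hostgroups directive in the host's own definition. The nagios.pp manifest covered in the last step presumably generates something like the following for each server; the host name and address here are made up for illustration:

```
define host {
    use         generic-host
    host_name   example.mayfirst.org
    address     192.0.2.10
    hostgroups  df-servers,mailq-servers
}
```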
Add the check as a service
The service definition tells Nagios which check to run against the servers in a hostgroup, and it is what appears in the Nagios web interface.
See:
puppet/modules/mayfirst/files/nagios/nagios3/conf.d/services_nagios2.cfg
Copy a pre-existing stanza and make the necessary changes. It will look something like this:
define service {
    hostgroup_name         df-servers
    service_description    DF
    check_command          mf-checker!df
    notification_interval  0
    use                    generic-service
}
NOTE: the ! separates the command name from its first argument, the type of check. In this example "df" is the type of check; replace it with the type your script passes to mf-monitor-output.
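Nagios hands whatever follows the ! to the command definition as the $ARG1$ macro (further !-separated values become $ARG2$, $ARG3$ and so on). A hypothetical definition for mf-checker might look like this; the plugin path and the use of the $HOSTNAME$ macro are assumptions, so check the existing command definitions for the real one:

```
define command {
    command_name  mf-checker
    command_line  /usr/local/lib/nagios/plugins/mf-checker $HOSTNAME$ $ARG1$
}
```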
Finally, add the hostgroup to the Nagios manifest
This step is optional. If the check should run on every server, add its hostgroup to the list of standard hostgroups in the manifest below.
Otherwise, leave it out and add the hostgroup to the individual server's .pp file instead.
projects/puppet/modules/mayfirst/manifests/nagios.pp
if ( $include_standard_hostgroups == true ) {
  $standard_hostgroups = [ 'df-servers', 'upgrade-servers', 'mailq-servers' ]
  $assigned_hostgroups = concat($hostgroups, $standard_hostgroups)
} else {
  $assigned_hostgroups = $hostgroups
}