how-to/servers/puppet/setup-nagios-monitor (version 6, edited Dec 15, 2016, 9:29:05 AM by Jamie McClelland)

== Creating a nagios monitor ==
This page explains how to create a nagios monitor configuration in [wiki:how-to/puppet puppet]. You will need to adapt the configuration to the specific type of monitoring you need.

=== Overview ===

Our nagios server is jojobe.mayfirst.org (available over https at https://monitor.mayfirst.org/).

We monitor network-accessible services (like https, smtp and imap) the standard way: our nagios server periodically connects over the network to these services on each server to check that they are still running properly.

In addition, it's useful to check things that are not publicly accessible over the network, such as disk usage, or whether MySQL (which only listens on a local port) is still running.

The standard way to set up nagios alerts for local services like these is to run a nagios agent (such as NRPE) listening on a public IP address on every server. The nagios server then connects to that port and queries all the local services.

We do it differently because we want to avoid running another publicly accessible service on every server.

Instead, we run local check scripts on each server via cron jobs and use SCP to copy their output to our nagios server every hour. The nagios server then checks that output to decide whether to raise an alert.
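The copy step could look roughly like the sketch below. It is not the script we actually run: the remote user `monitor` and the destination directory /var/lib/mfpl/monitor/ on jojobe are hypothetical placeholders, and the real setup (SSH keys, user, schedule) may differ. Setting `DRY_RUN=1` prints the scp command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch of the hourly copy step; NOT the production script.
# Assumptions (placeholders): remote user "monitor" and the remote
# directory /var/lib/mfpl/monitor/ on jojobe.
LOCAL_DIR="${LOCAL_DIR:-/var/log/mfpl/monitor}"
REMOTE="${REMOTE:-monitor@jojobe.mayfirst.org:/var/lib/mfpl/monitor/}"

copy_monitor_output() {
  # Collect the check results; if the glob matched nothing, $1 is the
  # literal pattern, which does not exist, so there is nothing to send.
  set -- "$LOCAL_DIR"/*.txt
  [ -e "$1" ] || return 0
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Dry run: show what would be copied.
    echo scp -q "$@" "$REMOTE"
  else
    scp -q "$@" "$REMOTE"
  fi
}

copy_monitor_output
```

Because each server pushes its own results, jojobe never has to open a connection to the servers it monitors, which is the point of this design.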

=== Set up a script to run locally ===

The monitoring scripts are stored in puppet/modules/mayfirst/files/monitor-utils/, where you can find examples of the different kinds of checks. Each script should generate output that starts with either OK:, WARNING: or CRITICAL:, and send this output on standard input to the script `mf-monitor-output`, with the '''type''' of check as the first argument. For example:

{{{
echo "WARNING: /root partition at 80%" | mf-monitor-output df
}}}

In this example, "df" is the '''type''' of check being run.

The `mf-monitor-output` script detects whether it is being run from a terminal (e.g. by an admin). If so, it prints the output to standard output so you can read it. If there is no terminal (e.g. when run by cron), it writes the output to /var/log/mfpl/monitor/$(hostname).TYPE.txt (where TYPE is the type of check), which then gets copied to jojobe.
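To make that behavior concrete, here is a rough sketch of the logic described above. It is not the real `mf-monitor-output`: the function name `monitor_output` and the `MONITOR_DIR` override are hypothetical, added so the sketch can be tried without writing to /var/log. The terminal test uses `[ -t 1 ]` (standard output is a terminal), which is one plausible way to tell an interactive run from a cron run.

```shell
#!/bin/sh
# Hypothetical sketch of the mf-monitor-output behavior; NOT the real script.
# MONITOR_DIR is an assumed override so the sketch can be tried safely.
MONITOR_DIR="${MONITOR_DIR:-/var/log/mfpl/monitor}"

# Usage: some-check | monitor_output TYPE
monitor_output() {
  check_type="$1"
  if [ -z "$check_type" ]; then
    echo "usage: monitor_output TYPE < output" >&2
    return 1
  fi
  if [ -t 1 ]; then
    # Standard output is a terminal (an admin running the check by hand):
    # show the result instead of saving it.
    cat
  else
    # No terminal (e.g. cron): save the result where the copy job finds it.
    # On Linux, `uname -n` prints the same name as `hostname`.
    mkdir -p "$MONITOR_DIR"
    cat > "$MONITOR_DIR/$(uname -n).$check_type.txt"
  fi
}
```

With this sketch, `echo "OK: all fine" | monitor_output df` prints the line when typed in a terminal, but saves it under the monitor directory when run from cron.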

If you want to add a check, review the existing scripts for an example.
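As an illustration only (this is not one of the scripts in monitor-utils/), a minimal disk-usage check following the OK:/WARNING:/CRITICAL: convention might look like the sketch below. The 80% and 90% thresholds, the helper name `df_status`, and checking only the root filesystem are assumptions. It pipes to `mf-monitor-output` only when that command is installed, so it can be tried on a machine without it.

```shell
#!/bin/sh
# Hypothetical example check; thresholds are assumptions, not site policy.
WARN=80
CRIT=90

# Print an OK:/WARNING:/CRITICAL: line for usage percent $1 of mount $2.
df_status() {
  pct="$1"
  mount="$2"
  if [ "$pct" -ge "$CRIT" ]; then
    echo "CRITICAL: $mount partition at $pct%"
  elif [ "$pct" -ge "$WARN" ]; then
    echo "WARNING: $mount partition at $pct%"
  else
    echo "OK: $mount partition at $pct%"
  fi
}

# Usage percentage of the root filesystem (POSIX df -P output, column 5).
root_pct=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

if command -v mf-monitor-output >/dev/null 2>&1; then
  df_status "$root_pct" / | mf-monitor-output df
else
  df_status "$root_pct" /
fi
```

The thresholds are the part you would tune for each check; the leading OK:, WARNING: or CRITICAL: is what the nagios side relies on.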

=== Set up a cronjob ===

Your script won't get called unless it is included in the cron job. Edit:

{{{
puppet/modules/mayfirst/templates/monitor-utils/cron.d/mf-monitor
}}}

and add your script to it.

=== Add to utils.pp ===

You also need to make sure your script gets copied to the servers.

Modify:

{{{
puppet/modules/mayfirst/manifests/utils.pp
}}}

The code should look something like this, with the name of your own script in place of mf-monitor-df:

{{{
  file { "/usr/local/sbin/mf-monitor-df":
    ensure => present,
    source => "puppet:///modules/mayfirst/monitor-utils/mf-monitor-df",
    # The script must be executable so that cron can run it.
    mode   => '0755',
    owner  => 'root',
    group  => 'root',
  }
}}}

=== Define hostgroup ===

Next, define a host group: the group of servers that will run this check.

Host groups are defined in the files under:

{{{
puppet/modules/mayfirst/files/nagios/nagios3/conf.d/
}}}

The definition should look something like this:

{{{
define hostgroup {
  hostgroup_name  df-servers
  alias           File System Check Servers
}
}}}

=== Add the check as a service ===

The service definition is the part of the configuration that nagios displays: it attaches the check to every server in the hostgroup.

See:

{{{
puppet/modules/mayfirst/files/nagios/nagios3/conf.d/services_nagios2.cfg
}}}

Copy a pre-existing stanza and make the necessary changes. It will look something like this:

{{{
define service{
        hostgroup_name                  df-servers
        service_description             DF
        check_command                   mf-checker!df
        notification_interval           0
        use                             generic-service
}
}}}

NOTE: the `!` separates the command (`mf-checker`) from its first argument, the '''type''' of check. In this example, "df" is the type of check; replace it with the type your script passes to `mf-monitor-output`.
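Presumably `mf-checker` reads the output file copied from the server being checked and turns its OK:/WARNING:/CRITICAL: prefix into a nagios result. Its actual implementation is not shown on this page, so the following is only a sketch of what such a plugin could do. It assumes the copied files land in a directory such as /var/lib/mfpl/monitor/ (a hypothetical path, overridable via `CHECK_DIR`) and are named HOST.TYPE.txt, and that output older than two hours (an assumed threshold) means the hourly copy has stopped. It uses the standard nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

```shell
#!/bin/sh
# Hypothetical sketch of a checker like mf-checker; NOT the real plugin.
# Assumption: output files are copied to $CHECK_DIR/HOST.TYPE.txt.
CHECK_DIR="${CHECK_DIR:-/var/lib/mfpl/monitor}"

# Usage: check_output HOST TYPE
check_output() {
  file="$CHECK_DIR/$1.$2.txt"
  if [ ! -r "$file" ]; then
    echo "UNKNOWN: no output found for $2 on $1"
    return 3
  fi
  # Output older than 120 minutes (assumed threshold) is treated as stale.
  if [ -n "$(find "$file" -mmin +120)" ]; then
    echo "UNKNOWN: output for $2 on $1 is stale"
    return 3
  fi
  # First line follows the convention: OK:, WARNING: or CRITICAL:.
  line=$(head -n 1 "$file")
  echo "$line"
  case "$line" in
    OK:*)       return 0 ;;
    WARNING:*)  return 1 ;;
    CRITICAL:*) return 2 ;;
    *)          return 3 ;;
  esac
}
```

Installed as a standalone plugin, such a script would end with `check_output "$1" "$2"; exit $?` so that nagios sees the exit code.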

=== Finally add the hostgroup to nagios manifest ===

This step is optional. If the monitor should run on all servers, add the hostgroup here:

{{{
puppet/modules/mayfirst/manifests/nagios.pp
}}}

Otherwise, leave it out of nagios.pp and instead add the hostgroup to the individual server's .pp file.

One example, for the standard hostgroups, is this excerpt from `define m_nagios_host` in nagios.pp:

{{{
  if ( $include_standard_hostgroups == true ) {
    ...
}}}

'''Make sure all executable scripts have execute permissions.'''
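One way to check the scripts in your checkout before committing is a small loop like the one below. It assumes you run it from the top of the repository and that the scripts live in puppet/modules/mayfirst/files/monitor-utils/ with names starting with mf-monitor-; the helper name `find_non_executable` is made up for this example. It only lists files that are missing the execute bit.

```shell
#!/bin/sh
# List monitoring scripts that lack the execute bit.
# Assumption: run from the repository root; the directory and the
# mf-monitor-* naming pattern follow the paths used on this page.
find_non_executable() {
  dir="${1:-puppet/modules/mayfirst/files/monitor-utils}"
  for f in "$dir"/mf-monitor-*; do
    # Skip the literal pattern if nothing matched, and non-regular files.
    [ -f "$f" ] || continue
    [ -x "$f" ] || echo "$f"
  done
}
```

No output means every script is executable; fix any file it lists with `chmod 755 FILE`.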