Abstract
Nagios is a stable, scalable and extensible enterprise-class network and system monitoring tool which allows administrators to monitor network and host resources such as HTTP, SMTP, POP3, disk usage and processor load. Originally Nagios was designed to run under Linux, but it can also be used on several UNIX operating systems. This chapter covers the installation and parts of the configuration of Nagios (http://www.nagios.org/).
The most important features of Nagios are:
Monitoring of network services (SMTP, POP3, HTTP, NNTP, etc.).
Monitoring of host resources (processor load, disk usage, etc.).
Simple plug-in design that allows administrators to develop further service checks.
Support for redundant Nagios servers.
Install Nagios either with zypper or using YaST.
For further information on how to install packages see:
Section “Using Zypper” (Chapter 9, Managing Software with Command Line Tools, ↑Introduction)
Section “Installing and Removing Packages or Patterns” (Chapter 5, Installing or Removing Software, ↑Introduction)
Both methods install the packages
nagios and
nagios-www. The latter RPM
package contains a Web interface for Nagios which lets you, for example,
view the service status and the problem history. However, this package is
not strictly required.
Nagios has a modular design and, thus, uses external check plug-ins to
verify whether a service is available or not. It is recommended to
install the nagios-plugins RPM package that
contains ready-made check plug-ins. However, it is also possible to write
your own custom check plug-ins.
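As an illustration of what a custom check plug-in can look like, the following is a minimal sketch, not an official example. It relies on the usual Nagios plug-in convention: print one line of status text and return 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function name check_file_exists and the checked condition are assumptions chosen for the example.

```shell
#!/bin/bash
# Hypothetical custom plug-in logic; the name check_file_exists is an
# assumption made for this sketch.
# Convention: one status line on stdout, and a return code of
# 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
check_file_exists() {
    local path="$1"
    if [ -z "$path" ]; then
        echo "UNKNOWN: no path given"
        return 3
    fi
    if [ -e "$path" ]; then
        echo "OK: $path exists"
        return 0
    fi
    echo "CRITICAL: $path is missing"
    return 2
}

# When installed as a plug-in, the script would end with:
#   check_file_exists "$1"; exit $?
```

A script of this kind would typically be placed in the plug-in directory and referenced from a command definition before it can be used as a check_command.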
Nagios organizes the configuration files as follows:
/etc/nagios/nagios.cfg
Main configuration file of Nagios containing a number of directives which define how Nagios operates. See http://nagios.sourceforge.net/docs/3_0/configmain.html for the complete documentation.
/etc/nagios/resource.cfg
Contains the path to all Nagios plug-ins (default:
/usr/lib/nagios/plugins).
/etc/nagios/command.cfg
Defines the programs used to determine the availability of services and the commands used to send e-mail notifications.
/etc/nagios/cgi.cfg
Contains options regarding the Nagios Web interface.
/etc/nagios/objects/
A directory containing object definition files. See Section 3.3.1, “Object Definition Files” for more details.
In addition to those configuration files Nagios comes with very flexible and highly customizable configuration files called Object Definition configuration files. Those configuration files are very important since they define the following objects:
Hosts
Services
Contacts
The flexibility lies in the fact that objects are easy to extend. Imagine you are responsible for a host with only one service running. Later you install another service on the same host and want to monitor that service as well. In this case, you can add another service object and assign it to the existing host object with little effort.
Right after the installation, Nagios offers default templates for object
definition configuration files. They can be found at
/etc/nagios/objects. The following examples show
how hosts, services and contacts are added:
Example 3.1. A Host Object Definition
define host {
name SRV1
host_name SRV1
address 192.168.0.1
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
notification_period workhours
notification_interval 120
notification_options d,u,r
}
The host_name option defines a name to identify
the host that has to be monitored. address is
the IP address of this host. The use statement
tells Nagios to inherit other configuration values from the generic-host
template. check_period defines the time period
during which the host is checked, here 24x7 (around the clock).
check_interval makes Nagios check the
host every 5 minutes and retry_interval
tells Nagios to schedule host check retries at 1 minute intervals.
Nagios tries to execute the checks multiple times when they do not pass.
You can define how many attempts Nagios should make with the
max_check_attempts directive. All configuration
options beginning with notification control how
Nagios behaves when a monitored host or service fails. In
the host definition above, notification_period
is set to workhours, so Nagios notifies the administrators only during
working hours. According to
notification_interval (120 minutes), notifications
are re-sent every two hours. notification_options
can contain the flags d, u, r and
n. They control in which states Nagios
notifies the administrator. d stands for a
down state, u for
unreachable and r for
recoveries. n (none) disables
notifications entirely.
Example 3.2. A Service Object Definition
define service {
use generic-service
host_name SRV1
service_description PING
contact_groups router-admins
check_command check_ping!100.0,20%!500.0,60%
}
The first configuration directive use tells
Nagios to inherit from the generic-service
template. host_name is the name that assigns
the service to the host object. The host itself is defined in the host
object definition. A description can be set with
service_description. In the example above the
description is just PING. With the
contact_groups option it is possible to refer
to a group of people who will be contacted on a failure of the service.
This group and its members are later defined in a contact group object
definition. check_command sets the program that
checks whether the service is available. Arguments for the check program
are appended to the command name, separated by exclamation marks.
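To make the check_command value from Example 3.2 easier to read, the following sketch splits it at the exclamation marks in the way Nagios maps the parts to the $ARG1$ and $ARG2$ macros. The function split_check_command is a hypothetical helper written only for this illustration and is not part of Nagios. In the sample check_ping command definition commonly shipped with Nagios, $ARG1$ is passed as the warning threshold and $ARG2$ as the critical threshold (round-trip time in milliseconds, packet loss in percent); check your own command definition, as this mapping is an assumption about your setup.

```shell
#!/bin/bash
# Illustration only: split a check_command value on "!" into the command
# name and the arguments that Nagios exposes as $ARG1$, $ARG2$, ...
split_check_command() {
    local value="$1" arg i=1
    local -a parts
    IFS='!' read -r -a parts <<< "$value"
    echo "command=${parts[0]}"
    for arg in "${parts[@]:1}"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
}

split_check_command 'check_ping!100.0,20%!500.0,60%'
# prints:
#   command=check_ping
#   ARG1=100.0,20%
#   ARG2=500.0,60%
```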
Example 3.3. A Contact and Contactgroup Definition
define contact {
contact_name admins
use generic-contact
alias Nagios Admin
email nagios@localhost
}
define contactgroup {
contactgroup_name router-admins
alias Administrators
members admins
}
The example listing above shows a
contact definition and its corresponding
contactgroup. The
contact definition contains the e-mail address
and the name of the person who is contacted on a failure of a service.
Usually this is the responsible administrator.
use inherits configuration values from the
generic-contact definition.
An overview of all Nagios objects and further information about them can be found at: http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html.
Learn step-by-step how to configure Nagios to monitor different things like remote services or remote host-resources.
This section explains how to monitor remote services with Nagios. Proceed as follows to monitor a remote service:
Procedure 3.1. Monitoring a Remote HTTP Service with Nagios
Create a directory inside /etc/nagios/objects
using mkdir. You can use any desired name for
it.
Open /etc/nagios/nagios.cfg and set
cfg_dir (configuration directory) to the
directory you have created in the first step.
Change to the configuration directory created in the first step and
create the following files: hosts.cfg,
services.cfg and
contacts.cfg.
Insert a host object in hosts.cfg:
define host {
name host.name.com
host_name host.name.com
address 192.168.0.1
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
contact_groups admins
notification_interval 60
notification_options d,u,r
}
Insert a service object in services.cfg:
define service {
use generic-service
host_name host.name.com
service_description HTTP
contact_groups admins
check_command check_http
}
Insert a contact and contactgroup object in
contacts.cfg:
define contact {
contact_name max-mustermann
use generic-contact
alias Webserver Administrator
email mmustermann@localhost
}
define contactgroup {
contactgroup_name admins
alias Administrators
members max-mustermann
}
Execute rcnagios restart to (re)start Nagios.
Execute cat /var/log/nagios/nagios.log and verify whether the following content appears:
[1242115343] Nagios 3.0.6 starting... (PID=10915)
[1242115343] Local time is Tue May 12 10:02:23 CEST 2009
[1242115343] LOG VERSION: 2.0
[1242115343] Finished daemonizing... (New PID=10916)
If you need to monitor a different remote service, adjust
check_command in
Step 5. A full list of all available check
programs can be obtained by executing ls
/usr/lib/nagios/plugins/check_*.
If an error occurs, see Section 3.5, “Troubleshooting”.
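The first three steps of Procedure 3.1 can be sketched as a small shell function. This is only a sketch under stated assumptions: the function name setup_object_dir and the directory name mysite are hypothetical, and the function appends a cfg_dir line to nagios.cfg instead of editing an existing one. Because all paths are passed as arguments, it can be tried against a scratch directory first.

```shell
#!/bin/bash
# Sketch of steps 1-3: create the object directory, create the three
# empty object files, and register the directory with cfg_dir.
# Arguments: $1 = Nagios configuration directory, $2 = new directory name.
setup_object_dir() {
    local etc_dir="$1" name="$2"
    local obj_dir="$etc_dir/objects/$name"
    local main_cfg="$etc_dir/nagios.cfg"

    mkdir -p "$obj_dir" || return 1
    touch "$obj_dir/hosts.cfg" "$obj_dir/services.cfg" \
          "$obj_dir/contacts.cfg" || return 1

    # Add the cfg_dir directive only if it is not present yet.
    if ! grep -qxF "cfg_dir=$obj_dir" "$main_cfg" 2>/dev/null; then
        echo "cfg_dir=$obj_dir" >> "$main_cfg"
    fi
}
```

On a real system the call would be setup_object_dir /etc/nagios mysite, run as root. After filling in the object files, running nagios -v /etc/nagios/nagios.cfg checks the configuration for errors before Nagios is restarted.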
This section explains how to monitor remote host resources with Nagios.
Proceed as follows on the Nagios server:
Procedure 3.2. Monitoring a Remote Host Resource with Nagios (Server)
Install nagios-nsca (for
example, zypper in nagios-nsca).
Set the following options in
/etc/nagios/nagios.cfg:
check_external_commands=1
accept_passive_service_checks=1
accept_passive_host_checks=1
command_file=/var/spool/nagios/nagios.cmd
Set the command_file option in
/etc/nagios/nsca.conf to the same file defined in
/etc/nagios/nagios.cfg.
Add another host and service object:
define host {
name foobar
host_name foobar
address 10.10.4.234
use generic-host
check_period 24x7
check_interval 0
retry_interval 1
max_check_attempts 1
active_checks_enabled 0
passive_checks_enabled 1
contact_groups router-admins
notification_interval 60
notification_options d,u,r
}
define service {
use generic-service
host_name foobar
service_description diskcheck
active_checks_enabled 0
passive_checks_enabled 1
contact_groups router-admins
check_command check_ping
}
Execute rcnagios restart and rcnsca restart.
Proceed as follows on the client you want to monitor:
Procedure 3.3. Monitoring a Remote Host Resource with Nagios (Client)
Install nagios-nsca-client
on the host you want to monitor.
Write your test scripts (for example, a script that checks the disk usage) like this:
#!/bin/bash
NAGIOS_SERVER=10.10.4.166
THIS_HOST=foobar
#
# Write your own test algorithm here and send exactly one of the
# following results, depending on its outcome.
#
# Execute on SUCCESS (state 0, OK):
echo "$THIS_HOST;diskcheck;0;OK: test ok" \
| send_nsca -H "$NAGIOS_SERVER" -p 5667 -c /etc/nagios/send_nsca.cfg -d ";"
# Execute on WARNING (state 1):
echo "$THIS_HOST;diskcheck;1;Warning: test warning" \
| send_nsca -H "$NAGIOS_SERVER" -p 5667 -c /etc/nagios/send_nsca.cfg -d ";"
# Execute on FAILURE (state 2, CRITICAL):
echo "$THIS_HOST;diskcheck;2;CRITICAL: test critical" \
| send_nsca -H "$NAGIOS_SERVER" -p 5667 -c /etc/nagios/send_nsca.cfg -d ";"
Insert a new cron entry with crontab -e. A typical cron entry could look like this:
*/5 * * * * /directory/to/check/program/check_diskusage
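As one possible way to fill in the test algorithm of such a script, the following sketch reads the disk usage with df and builds the result line in the host;service;state;text format used above. The function names, the thresholds of 80 and 90 percent, and the monitored mount point / are assumptions for this example and not values prescribed by Nagios or NSCA.

```shell
#!/bin/bash
# Sketch of a possible check_diskusage test. Thresholds and the monitored
# mount point are assumptions; adjust them to your system.

# Print the used percentage (digits only) of the file system holding $1.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Build the result line for send_nsca: host;service;state;text
# Arguments: host name, used percentage, warning and critical thresholds.
nsca_line() {
    local host="$1" used="$2" warn="$3" crit="$4"
    if [ "$used" -ge "$crit" ]; then
        echo "$host;diskcheck;2;CRITICAL: ${used}% used"
    elif [ "$used" -ge "$warn" ]; then
        echo "$host;diskcheck;1;WARNING: ${used}% used"
    else
        echo "$host;diskcheck;0;OK: ${used}% used"
    fi
}

# Example wiring (commented out so that the sketch has no side effects):
# nsca_line foobar "$(disk_used_percent /)" 80 90 \
#   | send_nsca -H 10.10.4.166 -p 5667 -c /etc/nagios/send_nsca.cfg -d ";"
```

Keeping the threshold logic in a separate function makes it possible to test the state mapping without a running NSCA daemon.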
Error: ABC 'XYZ' specified in ... '...' is not defined anywhere!
Make sure that you have defined all necessary objects correctly. Be careful with the spelling.
(Return code of 127 is out of bounds - plugin may be missing)
Make sure that you have installed
nagios-plugins.
Make sure that you have installed and configured a mail server like
postfix or exim
correctly. You can verify if your mail server works with echo
"Mail Server Test!" | mail foo@bar.com which sends an e-mail
to foo@bar.com. If this e-mail arrives, your mail server is working
correctly. Otherwise, check the log files of the mail server.
http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html