Abstract
Nagios is a stable, scalable and extensible enterprise-class network and system monitoring tool which allows administrators to monitor network and host resources such as HTTP, SMTP, POP3, disk usage and processor load. Originally Nagios was designed to run under Linux, but it can also be used on several UNIX operating systems. This chapter covers the installation and parts of the configuration of Nagios (http://www.nagios.org/).
The most important features of Nagios are:
Monitoring of network services (SMTP, POP3, HTTP, NNTP, etc.).
Monitoring of host resources (processor load, disk usage, etc.).
Simple plug-in design that allows administrators to develop further service checks.
Support for redundant Nagios servers.
Install Nagios either with zypper or using YaST.
For further information on how to install packages see:
Section “Using Zypper” (Chapter 9, Managing Software with Command Line Tools, ↑Introduction)
Section “Installing and Removing Packages or Patterns” (Chapter 5, Installing or Removing Software, ↑Introduction)
Both methods install the packages nagios and nagios-www. The latter RPM package contains a Web interface for Nagios which allows you, for example, to view the service status and the problem history. However, the Web interface is not strictly required.
Nagios has a modular design and uses external check plug-ins to verify whether a service is available. It is recommended to install the nagios-plugins RPM package, which contains ready-made check plug-ins. You can also write your own custom check plug-ins.
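A custom check plug-in is an ordinary executable that prints one status line and reports its result through the exit code: 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following bash script is a minimal sketch, not a plug-in shipped with nagios-plugins. The function names evaluate_usage and root_usage, and the idea of checking the root file system with df, are assumptions chosen for illustration.

```shell
#!/bin/bash
# Sketch of a custom check plug-in (hypothetical, not part of nagios-plugins).
# Nagios evaluates the exit code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.

# evaluate_usage PERCENT WARN CRIT
# Prints a one-line status message and returns the matching state code.
evaluate_usage() {
    local usage=$1 warn=$2 crit=$3
    if ! [[ "$usage" =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: could not determine disk usage"
        return 3
    fi
    if (( usage >= crit )); then
        echo "CRITICAL: disk usage is ${usage}%"
        return 2
    elif (( usage >= warn )); then
        echo "WARNING: disk usage is ${usage}%"
        return 1
    fi
    echo "OK: disk usage is ${usage}%"
    return 0
}

# root_usage
# Prints the used space of / in percent, without the "%" sign.
# df -P prints one line per file system; the fifth column is e.g. "42%".
root_usage() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

To turn the sketch into an executable plug-in, append a line such as evaluate_usage "$(root_usage)" "${1:-80}" "${2:-90}"; exit $? to the script. The default thresholds of 80 and 90 percent are arbitrary examples.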
Nagios organizes the configuration files as follows:
/etc/nagios/nagios.cfg
Main configuration file of Nagios, containing a number of directives which define how Nagios operates. See http://nagios.sourceforge.net/docs/3_0/configmain.html for the complete documentation.
/etc/nagios/resource.cfg
Contains the path to all Nagios plug-ins (default: /usr/lib/nagios/plugins).
/etc/nagios/command.cfg
Defines the programs used to determine the availability of services and the commands used to send e-mail notifications.
/etc/nagios/cgi.cfg
Contains options regarding the Nagios Web interface.
/etc/nagios/objects/
A directory containing object definition files. See Section 3.3.1, “Object Definition Files” for more information.
In addition to these configuration files, Nagios uses very flexible and highly customizable configuration files called object definition files. They are important because they define the following objects:
Hosts
Services
Contacts
Their flexibility lies in the fact that objects are easy to extend. Imagine you are responsible for a host with only one service running. If you install another service on the same host and want to monitor that service as well, you can add another service object and assign it to the existing host object with little effort.
Right after the installation, Nagios provides default templates for object definition files. They can be found in /etc/nagios/objects. The following examples describe how to add hosts, services and contacts:
Example 3.1. A Host Object Definition
define host {
    name                   SRV1
    host_name              SRV1
    address                192.168.0.1
    use                    generic-host
    check_period           24x7
    check_interval         5
    retry_interval         1
    max_check_attempts     10
    notification_period    workhours
    notification_interval  120
    notification_options   d,u,r
}
The host_name option defines a name to identify the host that is monitored. address is the IP address of this host. The use statement tells Nagios to inherit other configuration values from the generic-host template. check_period defines the time period during which the host is checked; 24x7 means around the clock. check_interval makes Nagios check the host every 5 minutes, and retry_interval tells Nagios to schedule host check retries at 1 minute intervals. Nagios executes a check multiple times when it does not pass. You can define how many attempts Nagios makes with the max_check_attempts directive.
All configuration flags beginning with notification control how Nagios behaves when a failure of the monitored host occurs. In the host definition above, Nagios notifies the administrators only during working hours. This can be adjusted with notification_period. According to notification_interval, notifications are resent every two hours (120 minutes). notification_options controls in which states Nagios notifies the administrator and can contain the flags d, u, r and n: d stands for a down state, u for unreachable and r for recoveries. n disables notifications entirely.
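The check and retry directives determine how long it takes until a failure is confirmed. Consider a rough worked example that assumes the default interval_length of 60 seconds, so the intervals are minutes, and ignores check latency. A host that goes down right after a successful check is noticed at most check_interval minutes later. Nagios then retries every retry_interval minutes until max_check_attempts checks have failed. At that point the host enters a hard DOWN state, and only then are notifications sent. The hypothetical helper below does nothing more than this arithmetic.

```shell
#!/bin/bash
# hard_state_delay CHECK_INTERVAL RETRY_INTERVAL MAX_CHECK_ATTEMPTS
# Rough worst-case minutes from a failure to the HARD state (hypothetical helper;
# assumes interval_length=60 and ignores check execution time and latency).
hard_state_delay() {
    local check_interval=$1 retry_interval=$2 max_check_attempts=$3
    # Up to one check_interval passes before the first failed check;
    # the remaining (max_check_attempts - 1) checks run retry_interval apart.
    echo $(( check_interval + (max_check_attempts - 1) * retry_interval ))
}

# Values from Example 3.1: 5 + (10 - 1) * 1 = 14 minutes
hard_state_delay 5 1 10
```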
Example 3.2. A Service Object Definition
define service {
    use                  generic-service
    host_name            SRV1
    service_description  PING
    contact_groups       router-admins
    check_command        check_ping!100.0,20%!500.0,60%
}
The first configuration directive, use, tells Nagios to inherit from the generic-service template. host_name assigns the service to a host object; the host itself is defined in the host object definition. A description can be set with service_description. In the example above, the description is just PING. The contact_groups option refers to a group of people who are contacted when the service fails. This group and its members are defined in a contact group object definition. check_command sets the program that checks whether the service is available.
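In check_command, exclamation marks separate the name of a command definition from its arguments. check_ping!100.0,20%!500.0,60% calls the check_ping command with $ARG1$ set to 100.0,20% and $ARG2$ set to 500.0,60%. In the sample command definitions typically shipped with Nagios, these values are passed to the check_ping plug-in as warning (-w) and critical (-c) thresholds. Each threshold is a pair of round-trip average in milliseconds and packet loss in percent. The bash function below is a hypothetical illustration of this split, not the way Nagios parses its configuration internally.

```shell
#!/bin/bash
# split_check_command STRING
# Illustration only: splits a check_command value at "!" into the
# command name and the $ARGn$ macros that Nagios would substitute.
split_check_command() {
    local IFS='!'
    local -a parts
    read -r -a parts <<< "$1"
    echo "command_name: ${parts[0]}"
    local i
    for (( i = 1; i < ${#parts[@]}; i++ )); do
        echo "\$ARG$i\$: ${parts[i]}"
    done
}

split_check_command 'check_ping!100.0,20%!500.0,60%'
# command_name: check_ping
# $ARG1$: 100.0,20%
# $ARG2$: 500.0,60%
```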
Example 3.3. A Contact and Contactgroup Definition
define contact {
    contact_name  admins
    use           generic-contact
    alias         Nagios Admin
    email         nagios@localhost
}

define contactgroup {
    contactgroup_name  router-admins
    alias              Administrators
    members            admins
}
The listing above shows a contact definition and the contactgroup it belongs to. The contact definition contains the e-mail address and the name of the person who is contacted when a service fails. Usually this is the responsible administrator. use inherits configuration values from the generic-contact template.
An overview of all Nagios objects and further information about them can be found at: http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html.
The following sections explain step by step how to configure Nagios to monitor remote services and remote host resources.
This section explains how to monitor remote services with Nagios. Proceed as follows to monitor a remote service:
Procedure 3.1. Monitoring a Remote HTTP Service with Nagios
Create a directory inside /etc/nagios/objects using mkdir. You can choose any name for it.
Open /etc/nagios/nagios.cfg and set cfg_dir (configuration directory) to the directory you created in the first step.
Change to the configuration directory created in the first step and create the following files: hosts.cfg, services.cfg and contacts.cfg.
Insert a host object in hosts.cfg:
define host {
    name                   host.name.com
    host_name              host.name.com
    address                192.168.0.1
    use                    generic-host
    check_period           24x7
    check_interval         5
    retry_interval         1
    max_check_attempts     10
    contact_groups         admins
    notification_interval  60
    notification_options   d,u,r
}
Insert a service object in services.cfg:
define service {
    use                  generic-service
    host_name            host.name.com
    service_description  HTTP
    contact_groups       router-admins
    check_command        check_http
}
Insert a contact and contactgroup object in contacts.cfg:
define contact {
    contact_name  max-mustermann
    use           generic-contact
    alias         Webserver Administrator
    email         mmustermann@localhost
}

define contactgroup {
    contactgroup_name  admins
    alias              Administrators
    members            max-mustermann
}
Execute rcnagios restart to (re)start Nagios.
Execute cat /var/log/nagios/nagios.log and verify that output similar to the following appears:
[1242115343] Nagios 3.0.6 starting... (PID=10915)
[1242115343] Local time is Tue May 12 10:02:23 CEST 2009
[1242115343] LOG VERSION: 2.0
[1242115343] Finished daemonizing... (New PID=10916)
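Steps 1 to 3 and the log check in the last step can also be scripted. The following bash sketch is an illustration, not a tool shipped with Nagios. The function names, the directory name mymonitoring used below, and /etc/nagios as the configuration root are assumptions. The bracketed number at the start of each log line is a Unix timestamp (seconds since 1970-01-01 UTC), which log_time converts into a readable date.

```shell
#!/bin/bash
# Hypothetical helpers for Procedure 3.1 (names and layout are assumptions).

# setup_object_dir NAGIOS_ETC NAME
# Steps 1-3: create NAGIOS_ETC/objects/NAME with the three files and
# register the directory once as cfg_dir in NAGIOS_ETC/nagios.cfg.
setup_object_dir() {
    local etc=$1 name=$2
    local dir="$etc/objects/$name"
    mkdir -p "$dir"
    touch "$dir/hosts.cfg" "$dir/services.cfg" "$dir/contacts.cfg"
    # Only append the directive if this exact line is not present yet
    grep -qxF "cfg_dir=$dir" "$etc/nagios.cfg" 2>/dev/null ||
        echo "cfg_dir=$dir" >> "$etc/nagios.cfg"
}

# nagios_started LOGFILE
# Succeeds if the log contains the startup and daemonizing messages.
nagios_started() {
    grep -Eq 'Nagios [0-9.]+ starting\.\.\.' "$1" &&
        grep -q 'Finished daemonizing' "$1"
}

# log_time LINE
# Prints the bracketed Unix timestamp of a log line as a UTC date
# (uses "date -d @SECONDS", as supported by GNU date).
log_time() {
    local ts=${1#\[}
    ts=${ts%%]*}
    TZ=UTC date -d "@$ts" '+%Y-%m-%d %H:%M:%S'
}
```

For example, run setup_object_dir /etc/nagios mymonitoring as root, fill in the three files, execute rcnagios restart and then nagios_started /var/log/nagios/nagios.log && echo started. For the first log line above, log_time prints 2009-05-12 08:02:23, which matches 10:02:23 CEST.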
If you need to monitor a different remote service, adjust check_command in Step 5. A full list of all available check programs can be obtained by executing ls /usr/lib/nagios/plugins/check_*.
See Section 3.5, “Troubleshooting” if an error occurred.
This section explains how to monitor remote host resources with Nagios.
Proceed as follows on the Nagios server:
Procedure 3.2. Monitoring a Remote Host Resource with Nagios (Server)
Install nagios-nsca (for example, zypper in nagios-nsca).
Set the following options in /etc/nagios/nagios.cfg:
check_external_commands=1
accept_passive_service_checks=1
accept_passive_host_checks=1
command_file=/var/spool/nagios/nagios.cmd
Set the command_file option in /etc/nagios/nsca.conf to the same file defined in /etc/nagios/nagios.cfg.
Add another host and service object:
define host {
    name                    foobar
    host_name               foobar
    address                 10.10.4.234
    use                     generic-host
    check_period            24x7
    check_interval          0
    retry_interval          1
    max_check_attempts      1
    active_checks_enabled   0
    passive_checks_enabled  1
    contact_groups          router-admins
    notification_interval   60
    notification_options    d,u,r
}

define service {
    use                     generic-service
    host_name               foobar
    service_description     diskcheck
    active_checks_enabled   0
    passive_checks_enabled  1
    contact_groups          router-admins
    check_command           check_ping
}
Execute rcnagios restart and rcnsca restart.
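Steps 2 and 3 can also be applied with a small script. The following bash sketch is hypothetical. set_option replaces an existing, possibly commented-out KEY= line or appends a new one. enable_passive_checks uses it for the four options above and for command_file in the NSCA configuration. The sketch assumes GNU sed (for sed -i) and the file names used in this procedure.

```shell
#!/bin/bash
# Hypothetical helper script for steps 2 and 3 (assumes GNU sed for "sed -i").

# set_option FILE KEY VALUE
# Sets KEY=VALUE: replaces an existing line, even a commented-out "#KEY=" one,
# or appends a new line. VALUE must not contain "|" or "&" (sed replacement).
set_option() {
    local file=$1 key=$2 value=$3
    if grep -Eq "^#?[[:space:]]*$key=" "$file" 2>/dev/null; then
        sed -i -E "s|^#?[[:space:]]*$key=.*|$key=$value|" "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# enable_passive_checks NAGIOS_CFG NSCA_CFG
# Applies the options from step 2 and makes NSCA (step 3) write to the
# same external command file that Nagios reads.
enable_passive_checks() {
    local nagios_cfg=$1 nsca_cfg=$2
    local cmd_file=/var/spool/nagios/nagios.cmd
    set_option "$nagios_cfg" check_external_commands 1
    set_option "$nagios_cfg" accept_passive_service_checks 1
    set_option "$nagios_cfg" accept_passive_host_checks 1
    set_option "$nagios_cfg" command_file "$cmd_file"
    set_option "$nsca_cfg" command_file "$cmd_file"
}
```

For example, run enable_passive_checks /etc/nagios/nagios.cfg /etc/nagios/nsca.conf as root before restarting both services. Keeping the command file path in a single variable ensures that Nagios and NSCA refer to the same file.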
Proceed as follows on the client you want to monitor:
Procedure 3.3. Monitoring a Remote Host Resource with Nagios (Client)
Install nagios-nsca-client on the host you want to monitor.
Write your test script (for example, a script that checks the disk usage) like this:
#!/bin/bash
# Passive disk check for host "foobar"; results go to the NSCA daemon
NAGIOS_SERVER=10.10.4.166
THIS_HOST=foobar

# submit STATE MESSAGE: send one passive result for the "diskcheck" service
# (STATE: 0 = OK, 1 = WARNING, 2 = CRITICAL)
submit() {
    echo "$THIS_HOST;diskcheck;$1;$2" \
        | send_nsca -H "$NAGIOS_SERVER" -p 5667 -c /etc/nagios/send_nsca.cfg -d ";"
}

#
# Write own test algorithm here and call exactly one of the following:
#
# Execute on SUCCESS:
submit 0 "OK: test ok"
# Execute on WARNING:
# submit 1 "WARNING: test warning"
# Execute on FAILURE:
# submit 2 "CRITICAL: test critical"
Insert a new cron entry with crontab -e. A typical cron entry could look like this:
*/5 * * * * /directory/to/check/program/check_diskusage
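Instead of writing the test algorithm from scratch, the client script can run one of the standard plug-ins and forward its result. The following bash function is a hypothetical sketch. It runs a plug-in, keeps the first line of its output together with its exit code, and prints them in the host;service;state;output format that send_nsca reads when called with -d ";". Reporting exit codes outside 0 to 3 as UNKNOWN (3) is a choice made for this sketch.

```shell
#!/bin/bash
# plugin_to_nsca HOST SERVICE COMMAND [ARGS...]
# Hypothetical wrapper: runs a check plug-in and prints its result as
# "host;service;state;output", the line format send_nsca reads with -d ";".
plugin_to_nsca() {
    local host=$1 service=$2
    shift 2
    local output rc=0
    output=$("$@" 2>&1) || rc=$?
    # Keep only the first line: each input line is one result for send_nsca
    output=${output%%$'\n'*}
    # This sketch maps any exit code outside 0-3 (e.g. 127) to UNKNOWN
    if (( rc > 3 )); then
        output="UNKNOWN: plug-in exited with code $rc: $output"
        rc=3
    fi
    printf '%s;%s;%s;%s\n' "$host" "$service" "$rc" "$output"
}
```

For example, plugin_to_nsca foobar diskcheck /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p / could be piped into the same send_nsca command line used in the script above.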
Error: ABC 'XYZ' specified in ... '...' is not defined anywhere!
Make sure that you have defined all necessary objects correctly. Be careful with the spelling.
(Return code of 127 is out of bounds - plugin may be missing)
Make sure that you have installed nagios-plugins.
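The shell returns exit code 127 when a command cannot be found, so this message usually points to a missing or misspelled plug-in. The bash function below is a quick check. missing_plugins is a hypothetical helper, and the plug-in names passed to it are simply the ones used in this chapter's examples. It lists every named plug-in that is not an executable file in the plug-in directory.

```shell
#!/bin/bash
# missing_plugins DIR NAME...
# Hypothetical helper: prints each plug-in NAME that is not an executable
# regular file in DIR and returns 1 if at least one is missing.
missing_plugins() {
    local dir=$1 name missing=0
    shift
    for name in "$@"; do
        if [[ ! -f "$dir/$name" || ! -x "$dir/$name" ]]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Check the plug-ins used in the examples of this chapter
missing_plugins /usr/lib/nagios/plugins check_ping check_http || true
```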
Make sure that you have installed and correctly configured a mail server such as postfix or exim. You can verify whether your mail server works with echo "Mail Server Test!" | mail foo@bar.com, which sends an e-mail to foo@bar.com. If this e-mail arrives, your mail server is working correctly. Otherwise, check the log files of the mail server.
http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html