Monitor And Alert For Anomalies

Monitor your network to know what is going wrong with it, where and when so that you can fix the problem before it gets out of control 

Wednesday, April 07, 2004

You cannot stop problems in a network from occurring, but what you can do is detect these problems whenever they occur (and sometimes before they occur) and take steps to resolve them. You could do so in two ways. One way is to set up monitoring software that will periodically poll the hosts and services for abnormal behavior, such as the host not being up or not providing the necessary services correctly. In case of problems the software can send alerts. We will talk about how you can use this method in this article. 

In the second method, the monitoring host does not actively check for normalcy. Instead, it waits for other devices, such as switches or routers, to report a problem to it, and then notifies you. The protocol used in this case is SNMP (Simple Network Management Protocol). We talk about this method in detail in the following article. 

Because of the type of problems the two methods identify, they are not exclusive. You may need to use them simultaneously. 

Let’s talk about the first method. To monitor our network, we used Nagios, an open-source network-monitoring package, which you can get from this month’s PCQXtreme CD. Nagios runs on Linux, but it can monitor any machine or service, be it on Windows, Linux, UNIX or any other environment.

As a first step, you need to install and do the basic configuration in Nagios.

INSTALLING NAGIOS 
Nagios can be installed on a machine with PCQlinux 2004 server installed. First extract the setup files from the Nagios tarball.

#tar -zxvf nagios-1.2.tar.gz

This will create a directory named nagios-1.2 in your current directory. This will be your Nagios setup directory. Next create the installation directory where the Nagios binary and configuration files will be stored.

#mkdir /usr/local/nagios

Now add a system user under which the Nagios process will run.

#adduser nagios

Go to the Nagios setup directory

#cd nagios-1.2

and run the configuration script:

#./configure --prefix=/usr/local/nagios --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios/ --with-nagios-user=nagios --with-nagios-grp=nagios

Watch out for: The e-mail server being down, if you have configured alerts to be sent through e-mail.
Problem: You will not receive alerts.
How to fix it: Configure alerts to be sent using SMS as well, by connecting a GSM phone to your monitoring console.

Watch out for: The monitoring console and other hosts being down at the same time.
Problem: You will not receive any alerts.
How to fix it: Create a fault-tolerant setup with one standby monitoring machine that will take over if the primary machine fails.

Watch out for: DNS being down, if you have specified the FQDN, instead of the IP addresses, of remote machines.
Problem: DNS names will not be resolved to IP addresses.
How to fix it: Use IP addresses while defining critical hosts to be monitored. Hosts should have fixed IP addresses instead of having them assigned from a DHCP server.

This will configure the Nagios setup before you start up the compilation process to build the Nagios binaries.
Now compile the Nagios binaries.

#make all

and install the binaries and HTML files to the installation directory.

#make install

Install the init script, which will be used to start Nagios at boot time.

#make install-init

CIM protocol
The CIM (Common Information Model) is a DMTF (Distributed Management Task Force) backed initiative that aims to standardize the message formats used to describe management data across vendors as well as different ‘systems, applications, networks and services’. Using XML as its backbone, it allows vendors to extend the standard as per the requirement, as long as the extensions conform to broad guidelines. These guidelines are defined in the CIM schema, which details the complete data model of the specification. 
The other part of the standard is the CIM specification, which contains guidelines for integration with other data models. CIM has played a very important part in accelerating the growth and adoption of various network management implementations available today, as it has given a common ground to various vendors and made interoperability between various standards possible.

The script is stored as the file /etc/rc.d/init.d/nagios
Create and configure permissions on the directory for holding the external command file.

#make install-commandmode

Now install the SAMPLE configuration files. These files will work as a starting point when you configure Nagios to monitor hosts and services on the network. 

#make install-config

These files will be stored in the /usr/local/nagios/etc directory and you’ll have to change their default extension from *.cfg-sample to *.cfg.

This is the initial base install of Nagios. The core of Nagios consists of the Nagios binary and a few configuration files. To do anything useful, Nagios relies on plugins. Plugins are external scripts or executable programs that the Nagios process uses to monitor the status of various hosts and services. Let’s see how to install plugins on the monitoring host to make Nagios functional.
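To make the idea of a plugin concrete, here is a minimal sketch of a custom plugin written in Python. It relies on the standard Nagios plugin convention: the plugin prints one line of status text and reports the state through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The check itself (free disk space) and the names classify, check_free_space, warn_pct and crit_pct are illustrative assumptions, not part of the Nagios distribution.

```python
"""Hypothetical Nagios plugin: report how much disk space is free."""
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(free_pct, warn_pct, crit_pct):
    """Map a free-space percentage to a plugin state."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_free_space(path, warn_pct=20.0, crit_pct=10.0):
    """Return (state, message) for the file system that holds `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    free_pct = 100.0 * usage.free / usage.total
    state = classify(free_pct, warn_pct, crit_pct)
    return state, f"DISK {STATE_NAMES[state]} - {free_pct:.1f}% free on {path}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    target = argv[1] if len(argv) > 1 else "/"
    state, message = check_free_space(target)
    print(message)  # Nagios displays this line as the status text
    return state
```

To run the sketch as a real plugin, end the file with sys.exit(main(sys.argv)), make it executable and place it in the plugin directory alongside the stock plugins.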

To install the Nagios plugins, first get the plugin RPMs either from http://sourceforge.net/projects/nagiosplug/ or from the PCQuest CD, and install them.

#rpm -ivh nagios-plugins-1.3.1-1.9.i386.rpm

Nagios expects the plugins to be placed in the directory /usr/local/nagios/libexec but the RPM installs the plugins in the directory /usr/lib/nagios/plugins. So, make a directory

#mkdir /usr/local/nagios/libexec

Move the plugins to that directory

#mv /usr/lib/nagios/plugins/* /usr/local/nagios/libexec

Lastly, create two symlinks for openssl files.

#ln -s /lib/libcrypto.so.0.9.6b /lib/libcrypto.so.4
#ln -s /lib/libssl.so.0.9.6b /lib/libssl.so.4

IPMI Protocol
The IPMI (Intelligent Platform Management Interface) specification offers vendors a standard way of monitoring the hardware components of a computer. This spares each vendor from devising its own monitoring technique, which would leave virtually no interoperability between vendors.
At the heart of any IPMI setup is a small, dedicated processor on the motherboard called the BMC (Baseboard Management Controller) that is used to monitor the hardware. It can communicate with the main processor and other hardware elements, collecting their temperature and voltages in addition to checking to see if the fans and the power supplies are working or not. Since a separate processor is handling all this, there is little performance impact on the system and it can continue to function even when the main processor goes down. Also supported is logging of all the data collected and raising alerts under specified conditions. Since all participating vendors use the same standard, it is possible to have cross vendor interoperability and management.
IPMI v2.0 adds many new capabilities to the standard. It offers enhanced security thanks to better authentication methods, and encryption is also incorporated. Also available is SOL (Serial Over LAN), which allows the serial controllers to be managed remotely over the LAN. Support for VLANs has also been incorporated, so that sensitive management data stays on the ‘management’ VLAN instead of flowing all over the network.

This will install the most common plugins, which check the status of hosts, TCP services, local users, local swap space usage, etc. You can get more plugins from the Nagios plugin page at http://sourceforge.net/projects/nagiosplug/. You can also create your own plugins and use them with Nagios.

After the plugins and basic setup, it is time to configure the web interface for Nagios. 

The web interface provides you with a quick snapshot of the status of all monitored hosts and services. With the web interface you can also generate reports on the service, host availability trends and many other things. The Web interface includes a set of static HTML files and a few CGIs that provide you with dynamic content.

Setting up the Web interface
Open up the Apache configuration file /etc/httpd/conf/httpd.conf and append the following lines to it:

ScriptAlias /nagios/cgi-bin/ /usr/local/nagios/sbin/
<Directory "/usr/local/nagios/sbin/">
AllowOverride AuthConfig
Options ExecCGI
Order allow,deny
Allow from all
</Directory>

Alias /nagios/ /usr/local/nagios/share/
<Directory "/usr/local/nagios/share">
Options None
AllowOverride AuthConfig
Order allow,deny
Allow from all
</Directory>

Also add the following lines to provide authentication to the Web interface.

<Directory /usr/local/nagios/sbin>
AllowOverride AuthConfig
order allow,deny
allow from all
Options ExecCGI
</Directory>

<Directory /usr/local/nagios/share>
AllowOverride AuthConfig
order allow,deny
allow from all
</Directory>

Now create a file named .htaccess in both /usr/local/nagios/share and /usr/local/nagios/sbin

#vi /usr/local/nagios/share/.htaccess
#vi /usr/local/nagios/sbin/.htaccess

and add the following lines to the files.

AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
require valid-user

After this you will have to create the users who can access the Nagios Web interface. This is done using the htpasswd command supplied with Apache. 

#htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin 

Enter the password on the screen and confirm it.

This creates a username (nagiosadmin) and password that you must use to access the Nagios Web interface. To add more users, run the command again with a different username but without the -c option; -c creates the file afresh and would discard the users already in it.

The command creates a file named htpasswd.users in the /usr/local/nagios/etc directory and it stores usernames and encrypted passwords which are then matched with the username and password supplied by the user, when accessing the Nagios interface. For things to work this way the system account apache should have read access to this file, as the apache Web server process works under this account. For that run the following command:

#chmod o+r /usr/local/nagios/etc/htpasswd.users 

After making all these changes to your system restart the apache Web server.

#/etc/rc.d/init.d/httpd restart

Now that the Nagios binaries are in place, the plugins installed and the Web interface configured, it is time to configure the way Nagios and the Web interface will work. 

Configuring Nagios and the Web interface
The files used for configuring Nagios and the Web interface are the main configuration file and the CGI configuration file.

The main configuration file is /usr/local/nagios/etc/nagios.cfg and it contains a number of directives that affect how Nagios operates. This config file is read by both the Nagios process and the CGIs. This is the first configuration file that should be modified. The default file is appropriate for most cases but can be modified as per your requirement. One change that we suggest is to enable external commands by changing the value of check_external_commands option from 0 to 1 in the file.
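If you would rather script this change than edit the file by hand, the following sketch shows one way to do it in Python. It assumes the directive is written in the name=value form used in nagios.cfg; the helper name set_directive is hypothetical, and the sketch works on text in memory, so reading and writing the actual file is left to you.

```python
import re


def set_directive(config_text, name, value):
    """Return config_text with `name=value` set.

    An existing uncommented `name=...` line is replaced; if none exists,
    the directive is appended. Commented-out lines are left alone.
    """
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", re.MULTILINE)
    new_line = f"{name}={value}"
    if pattern.search(config_text):
        # A function replacement avoids any backslash handling in `value`.
        return pattern.sub(lambda _match: new_line, config_text)
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + separator + new_line + "\n"


# Example input: a fragment in the style of nagios.cfg.
SAMPLE = "# EXTERNAL COMMAND OPTION\ncheck_external_commands=0\ncommand_check_interval=1\n"
UPDATED = set_directive(SAMPLE, "check_external_commands", "1")
```

Keep a backup copy of nagios.cfg before writing the updated text back, as a mistake in this file will stop Nagios from starting.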

The CGI configuration file, /usr/local/nagios/etc/cgi.cfg, determines how the various Nagios CGIs work; these are used whenever the Nagios Web interface is accessed. As with the main configuration file, the default values suit most requirements, but you may want to make the following changes in it.

authorized_for_system_information=nagiosadmin

authorized_for_configuration_information=nagiosadmin

authorized_for_all_services=nagiosadmin

authorized_for_all_hosts=nagiosadmin

authorized_for_all_service_commands=nagiosadmin

authorized_for_all_host_commands=nagiosadmin

DEFINING HOSTS, SERVICES AND CONTACTS
Object configuration files are used to define the hosts, services and hostgroups that Nagios will monitor, as well as contacts, contactgroups, plugin commands, etc. This is where you define what you want to monitor, how you want to monitor it and whom to send notifications to. The various object configuration files are hosts.cfg, services.cfg, hostgroups.cfg, contacts.cfg, contactgroups.cfg, checkcommands.cfg, misccommands.cfg, timeperiods.cfg, escalations.cfg and dependencies.cfg, all found in the /usr/local/nagios/etc directory. Let’s see how to configure each of these files. We will configure two hosts, a Windows 2003 machine and a PCQLinux 8 machine. 

Open the hosts.cfg file. By default it contains several host definitions, which you can comment out if not required. The file contains a generic host definition template which should be used in the configuration so do not comment it out. Add the following definitions.

define host{
use generic-host
host_name windows2k3
alias Win Server #1
address 192.168.3.11
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
define host{
use generic-host
host_name pcqlinux8
alias Linux Server #1
address 192.168.3.13
check_command check-host-alive
max_check_attempts 10
notification_interval 480
notification_period 24x7
notification_options d,u,r
}

The definitions are very simple. You have to define the host name, the template to use, the IP address of the host, the number of check attempts before a notification is sent out and the time interval in minutes for re-notification. The notification options “d,u,r” define the states for which notifications are sent out: down, unreachable and recovered respectively. The check command option tells Nagios which plugin command to use to check the status of the host.
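When you have more than a handful of hosts, typing these blocks by hand gets tedious. As a rough sketch, the following Python function renders host definitions in the same layout from a small table. The function names render_host and render_all and the HOSTS table are illustrative assumptions; the directives and values are the ones used in this article.

```python
def render_host(host_name, alias, address, notification_interval=120,
                max_check_attempts=10, template="generic-host"):
    """Return one host definition as text, in the layout used in hosts.cfg."""
    fields = [
        ("use", template),
        ("host_name", host_name),
        ("alias", alias),
        ("address", address),
        ("check_command", "check-host-alive"),
        ("max_check_attempts", max_check_attempts),
        ("notification_interval", notification_interval),
        ("notification_period", "24x7"),
        ("notification_options", "d,u,r"),
    ]
    body = "\n".join(f"{key} {value}" for key, value in fields)
    return "define host{\n" + body + "\n}\n"


# (host_name, alias, address, re-notification interval in minutes)
HOSTS = [
    ("windows2k3", "Win Server #1", "192.168.3.11", 120),
    ("pcqlinux8", "Linux Server #1", "192.168.3.13", 480),
]


def render_all(hosts):
    """Render every host in the table, one definition after another."""
    return "\n".join(render_host(name, alias, address, interval)
                     for name, alias, address, interval in hosts)
```

Calling print(render_all(HOSTS)) produces text that you can paste into hosts.cfg below the generic host template.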

A host group definition is used to group one or more hosts together for the purposes of simplifying notifications. Each host that you define must be a member of at least one host group, even if it is the only host in that group. Hosts can be in more than one host group. So add the following lines to the hostgroups.cfg file.

define hostgroup{

hostgroup_name windows-servers
alias Windows Servers
contact_groups nt-admins
members windows2k3
}
define hostgroup{

hostgroup_name linux-servers
alias Linux Servers
contact_groups linux-admins
members pcqlinux8
}

The hostgroup definitions are self-explanatory. The contact_groups option defines which contact groups will be notified of the status of hosts in that particular hostgroup.

A service definition is used to identify a service that runs on a host, which you’ll want to monitor from Nagios. The term service, as used here, can mean an actual service that runs on the host (POP, SMTP, HTTP, etc.) or some other type of metric associated with the host (response to a ping, number of logged-in users, free disk space, etc.). Open the file services.cfg, and comment out the various service definitions in it except for the generic service template. Now put definitions in this file for all the services you want to monitor. Below are a few example definitions for our two hosts.

define service{
use generic-service
host_name windows2k3
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups nt-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use generic-service
host_name pcqlinux8
service_description HTTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups linux-admins
notification_interval 240
notification_period 24x7
notification_options w,u,c,r
check_command check_http
}

These definitions are similar to the host definitions, with a few differences. The normal check interval option sets the number of minutes between checks of the service when its last status was OK. The retry check interval option sets the number of minutes before the service is rechecked when its last status was non-OK. The notification options w,u,c,r stand for warning, unknown, critical and recovered respectively. More definitions along the same lines can be added to the file. Use the commands stored in the plugin directory for the check command option to monitor other services.
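The check command of the PING service above, check_ping!100.0,20%!500.0,60%, deserves a closer look. The ! character separates the command name from its arguments. The sketch below models in Python how those two arguments are interpreted, assuming the sample check_ping command definition, which passes the first argument as the warning threshold and the second as the critical threshold; each threshold is a round-trip time in milliseconds followed by a packet-loss percentage. The function names are hypothetical, and the exact handling of values that sit precisely on a threshold is a simplification.

```python
def split_check_command(check_command):
    """Split 'name!arg1!arg2' into the command name and its argument list."""
    name, *args = check_command.split("!")
    return name, args


def parse_threshold(arg):
    """Split '100.0,20%' into (round-trip time in ms, packet loss in percent)."""
    rta_text, loss_text = arg.split(",")
    return float(rta_text), float(loss_text.rstrip("%"))


def ping_state(rta_ms, loss_pct, check_command):
    """Classify a measured round-trip time and packet loss.

    Simplified model: the state is CRITICAL or WARNING as soon as either
    the round-trip time or the packet loss reaches the matching threshold.
    """
    _name, args = split_check_command(check_command)
    warn_rta, warn_loss = parse_threshold(args[0])
    crit_rta, crit_loss = parse_threshold(args[1])
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


PING_COMMAND = "check_ping!100.0,20%!500.0,60%"
```

With these thresholds, a host that answers in 12 ms with no loss is OK, one that answers in 150 ms raises a warning, and one that loses 80 percent of the packets is critical. Since the PING service uses notification options c,r, only the critical state and the recovery would trigger a notification.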

A contact definition is used to identify someone who should be contacted in the event of a problem on your network.

define contact{
contact_name nagiosad
alias Admin
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email,host-notify-by-epager
email admin@cmil.com
pager 98xxxxxx
}

The e-mail option defines the e-mail address where notifications for that contact are to be sent. The pager option can be used to send SMS alerts (read about it in the following story).

A contact group definition is used to group one or more contacts together for the purpose of sending out alert/recovery notifications. When a host or service has a problem or recovers, Nagios will find the appropriate contact groups to send notifications to, and notify all contacts in those contact groups.

To define contact groups: 

define contactgroup{
contactgroup_name nt-admins
alias Administrators
members nagiosad
}
define contactgroup{
contactgroup_name linux-admins
alias Linux Admin
members nagiosad
}
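To see how hosts, hostgroups, contact groups and contacts tie together, here is a simplified model in Python of how the people to notify for a host can be worked out from the definitions in this article. It is only a sketch: the dictionaries mirror the example definitions, the function name contacts_for_host is hypothetical, and notification time periods and escalations are ignored.

```python
# Data mirroring the example definitions in this article.
HOSTGROUPS = {
    "windows-servers": {"members": ["windows2k3"], "contact_groups": ["nt-admins"]},
    "linux-servers": {"members": ["pcqlinux8"], "contact_groups": ["linux-admins"]},
}
CONTACTGROUPS = {
    "nt-admins": ["nagiosad"],
    "linux-admins": ["nagiosad"],
}
CONTACTS = {
    "nagiosad": {"email": "admin@cmil.com", "host_notification_options": "d,u,r"},
}


def contacts_for_host(host_name, state):
    """Return the contacts to notify when `host_name` enters `state`.

    `state` is one of the one-letter codes used in the notification
    options: d (down), u (unreachable) or r (recovered).
    """
    selected = []
    for group in HOSTGROUPS.values():
        if host_name not in group["members"]:
            continue
        for contact_group in group["contact_groups"]:
            for contact in CONTACTGROUPS.get(contact_group, []):
                options = CONTACTS[contact]["host_notification_options"].split(",")
                # Skip contacts that did not ask for this state, and avoid
                # listing the same contact twice.
                if state in options and contact not in selected:
                    selected.append(contact)
    return selected
```

For example, contacts_for_host("windows2k3", "d") follows the chain windows-servers, nt-admins, nagiosad and returns that one contact, whose e-mail and pager details are then used by the notification commands.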

Unless there is a compelling need you do not need to make changes to other files, as the default settings will work fine. But if you do want to change them, then the files themselves contain detailed information about the various options.

RUNNING NAGIOS
Now after configuring your files, you are all set to start the Nagios process, but before doing that open the file dependencies.cfg and comment out all lines in it. Dependencies are an advanced feature of Nagios that allow you to suppress notifications for hosts based on the status of one or more other hosts. But you don’t need that at this moment so comment out the lines in it or it will prevent Nagios from running.

To start the Nagios process issue the following command.

#service nagios start

To make Nagios start automatically at system boot, issue the command:

#chkconfig nagios on

Now the Nagios process is up and running. When any host or service defined to be monitored by Nagios goes down or a service doesn’t work properly you will be notified by e-mail and/or by SMS if you have configured Nagios to send SMS alerts as well. 

In case you want to know the status of your monitored hosts and services at any particular point in time then you can use the web interface of Nagios. The interface can also be accessed by a WAP phone.

To access the Nagios Web interface, open the URL http://<IP address>/nagios/ in a browser, replacing <IP address> with the IP address of the machine running Nagios. You will be asked for a username and password; provide the details and you will be logged on to the Nagios interface. From here you can view details of the hosts and services defined in the configuration files, generate reports to look for trends in host and service availability, check the configuration to see whether everything is defined properly, etc.
