Server Monitoring

  • Ganglia - Monitors cluster CPU, disk, network usage
  • Monit - Monitors specific services
  • Nagios - Monitors servers, hosts, and services
  • Zabbix - Monitors servers, hosts, and services

Ganglia

Ganglia is a system for measuring, recording, and graphing certain metrics about hosts in a cluster. Metrics include things like CPU load, network traffic, disk space, RAM utilization, etc. These metrics are saved and graphed periodically by RRDtool. Rocks automatically configures the head node to query the compute nodes and has a web-based interface where you can monitor the health of the cluster.

You can see the Ganglia installation here: http://hpc.ilri.cgiar.org/ganglia

Interesting documentation: http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia

Troubleshooting

From time to time there are problems with Ganglia's web interface. You can restart the needed services following this basic procedure:

  1. Stop data collection daemon on HPC: service gmetad stop
  2. Stop monitoring daemon on compute nodes: rocks run host compute 'service gmond stop'
  3. Start data collection daemon on HPC: service gmetad start
  4. Wait a minute or two
  5. Start monitoring daemon on compute nodes: rocks run host compute 'service gmond start'
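
The steps above can be sketched as a small wrapper script. This is only a sketch, assuming it is run as root on the head node. The `DRY_RUN` and `WAIT_SECONDS` variables and the function names are illustrative and not part of Ganglia or Rocks. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: restart the Ganglia daemons in the order listed above.
# DRY_RUN and WAIT_SECONDS are illustrative names, not Ganglia/Rocks settings.
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print commands, 0 = execute them
WAIT_SECONDS="${WAIT_SECONDS:-90}"  # "a minute or two"; 90 s is an assumed value

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_ganglia() {
    run service gmetad stop                           # 1. stop data collection
    run rocks run host compute 'service gmond stop'   # 2. stop node daemons
    run service gmetad start                          # 3. start data collection
    run sleep "$WAIT_SECONDS"                         # 4. wait
    run rocks run host compute 'service gmond start'  # 5. start node daemons
}

restart_ganglia
```

Setting `DRY_RUN=0` in the environment executes the commands instead of printing them.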

Now go check the Ganglia web interface and see if the nodes have returned.
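
As a command-line cross-check, the number of hosts that gmond knows about can be counted from its XML dump. The sketch below is a hypothetical helper, not a Ganglia tool; it simply counts `<HOST ` elements on standard input, and the sample XML is hand-written rather than real cluster output.

```shell
#!/bin/sh
# Sketch: count the hosts present in a gmond XML dump read from stdin.
# count_hosts is a hypothetical helper name.
count_hosts() {
    # The trailing space in the pattern avoids matching <HOSTS ...> summary tags.
    grep -o '<HOST ' | wc -l | tr -d ' '
}

# Hand-written sample input, for illustration only.
sample='<CLUSTER NAME="demo"><HOST NAME="compute-0-0"></HOST><HOST NAME="compute-0-1"></HOST></CLUSTER>'
printf '%s\n' "$sample" | count_hosts
```

Against a live daemon this might be used as `nc localhost 8649 | count_hosts`, assuming netcat is installed and gmond accepts TCP connections on its default port 8649.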

Monit

Monit is a free, open source utility for managing and monitoring processes, files, directories, and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. For example, Monit can start a process if it is not running, restart a process if it does not respond, and stop a process if it uses too many resources. It logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alerts. Monit provides a built-in HTTP(S) interface, so you can use a browser to access the Monit server.

M/Monit expands upon Monit's capabilities to provide monitoring and management of all Monit-enabled hosts from one easy-to-use web interface. Status and events from each monitored system are updated in real time and displayed in charts, graphs, and tables.

Get the latest version at: http://mmonit.com/monit/download

$ wget http://mmonit.com/monit/dist/monit-5.0.3.tar.gz
$ tar xfz monit-5.0.3.tar.gz
$ cd monit-5.0.3
$ ./configure && make && make install

Accessing monit: http://hpc.ilri.cgiar.org:2812
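
To make the start/restart/stop behaviour described above concrete, here is a minimal sketch that writes an example Monit control file. It is not the configuration used on this server: the output path, the 120-second poll interval, the `admin:changeme` credentials, and the choice of sshd as the monitored service are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch: write a minimal example Monit control file.
# The path, poll interval, credentials and monitored service are assumptions.
MONITRC="${MONITRC:-${TMPDIR:-/tmp}/monitrc.example}"

write_monitrc() {
    # $1 = file to write
    cat > "$1" <<'EOF'
# Check services every 120 seconds (assumed interval)
set daemon 120

# Built-in web interface on port 2812; replace the placeholder credentials
set httpd port 2812 and
    allow admin:changeme

# Hypothetical example service: sshd
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/sshd start"
    stop program  = "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh then restart
    if cpu > 80% for 5 cycles then stop
EOF
    # Monit refuses control files that are readable by other users.
    chmod 600 "$1"
}

write_monitrc "$MONITRC"
echo "wrote $MONITRC"
```

The generated file can be syntax-checked with `monit -t -c <file>` before it is put into use.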

Nagios

Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes. http://www.nagios.org/about

Installation


Download the latest version of Nagios from http://www.nagios.org/download

$ wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
$ tar xfz nagios-3.2.0.tar.gz
$ cd nagios-3.2.0
$ ./configure
$ make all 
$ useradd nagios
$ make install
$ make install-init
$ make install-commandmode
$ make install-config
$ make install-webconf

Configuration


Running the following command creates a new file called htpasswd.users in the /usr/local/nagios/etc directory, with a username/password entry for nagiosadmin. You will be asked to provide the password that nagiosadmin will use to authenticate to the web server.

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin 

Download and install plugins

$ wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
$ tar xfz nagios-plugins-1.4.14.tar.gz
$ cd nagios-plugins-1.4.14
$ ./configure && make && make install 

Edit the configuration files to add the hosts and services to be monitored:

vim /usr/local/nagios/etc/objects/localhost.cfg 
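
As an illustration of what such definitions look like, the sketch below generates one host and one PING service definition. The host name, address, and output path are hypothetical. It assumes the sample `linux-server` and `generic-service` templates and the `check_ping` command from the sample configuration (installed by `make install-config`) are still in place.

```shell
#!/bin/sh
# Sketch: generate Nagios object definitions for one additional host.
# Host name, address and output path are hypothetical placeholders.
OUT="${OUT:-${TMPDIR:-/tmp}/compute-0-0.cfg}"

write_host_cfg() {
    # $1 = host name, $2 = IP address, $3 = output file
    cat > "$3" <<EOF
define host {
    use        linux-server
    host_name  $1
    alias      $1
    address    $2
}

define service {
    use                  generic-service
    host_name            $1
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
EOF
}

write_host_cfg compute-0-0 10.1.255.254 "$OUT"
echo "wrote $OUT"
```

To use a file like this, it would need to be copied into /usr/local/nagios/etc/objects/ and referenced by a `cfg_file=` line in /usr/local/nagios/etc/nagios.cfg. The result can be checked with `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` before restarting Nagios.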

To check remote services, see http://wiki.nagios.org/index.php/Howtos:checkbyssh_RedHat

Accessing Nagios


http://172.26.0.205:4020/nagios/

with username = "nagiosadmin" and password = "nagios"

Zabbix


Installation:

RHEL-compatible Linux:

echo '[andrewfarley]
name=Andrew Farley RPM Repository
baseurl=http://repo.andrewfarley.com/centos/$releasever/$basearch/
enabled=1
gpgcheck=0' | sudo tee /etc/yum.repos.d/andrewfarley.com.repo > /dev/null

You can then install the Zabbix agent, server, get utility, or proxy with:

    sudo yum install zabbix-agent
    sudo yum install zabbix-server
    sudo yum install zabbix-get
    sudo yum install zabbix-proxy 

If it fails to install, you might need to clean the metadata with the following command and try again…

    sudo yum clean metadata
server_monitoring.1276681050.txt.gz · Last modified: 2010/06/16 09:37 by 172.26.0.166