server_monitoring

Differences

This shows you the differences between two versions of the page.

server_monitoring [2010/06/16 09:41] 172.26.0.166server_monitoring [2010/06/21 22:41] 172.26.14.218
Line 1: Line 1:
 ===== Server Monitoring ===== ===== Server Monitoring =====
-  * [[#ganglia|Ganglia]] - Monitors cluster CPU, disk, network usage +  * [[server_monitoring:ganglia|Ganglia]] - Monitors cluster CPU, disk, network usage 
-  * [[#monit|Monit]] - Monitors specific services +  * [[server_monitoring:monit|Monit]] - Monitors specific services 
-  * [[#nagios|Nagios]] - Monitors servers, hosts and services +  * [[server_monitoring:nagios|Nagios]] - Monitors servers, hosts and services 
-  * [[#Zabbix|Zabbix]] - Monitors servers, hosts and services +  * [[server_monitoring:zabbix|Zabbix]] - Monitors servers, hosts and services
- +
- +
-===== Ganglia ===== +
- +
-[[http://ganglia.info/|Ganglia]] is a system for measuring, recording, and graphing certain metrics about hosts in a cluster.  Metrics include things like CPU load, network traffic, disk space, RAM utilization, etc.  These metrics are saved and graphed periodically by [[http://oss.oetiker.ch/rrdtool|RRDtool]]. Rocks automatically configures the head node to query the compute nodes and has a web-based interface where you can monitor the health of the cluster. +
- +
-You can see the ganglia installation here:  http://hpc.ilri.cgiar.org/ganglia +
- +
-{{:ganglia_diagram_smaller.gif|}} +
- +
-Interesting documentation: http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia +
-==== Troubleshooting ==== +
-From time to time there are problems with Ganglia's web interface.  You can restart the needed services following this basic procedure: +
- +
-  - Stop data collection daemon on HPC: ''service gmetad stop'' +
-  - Stop monitoring daemon on compute nodes: ''rocks run host compute %%'%%service gmond stop%%'%%'' +
-  - Start data collection daemon on HPC: ''service gmetad start'' +
-  - Wait a minute or two +
-  - Start monitoring daemon on compute nodes: ''rocks run host compute %%'%%service gmond start%%'%%'' +
- +
-Now go check the Ganglia web interface and see if the nodes have returned.+
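
The restart procedure above could be wrapped in a small script so the steps always run in the same order. This is only a sketch: the ''service'' and ''rocks run host compute'' commands are taken from the steps above, while the ''DRY_RUN'' and ''WAIT_SECONDS'' variables and the ''run'' and ''restart_ganglia'' helper names are assumptions introduced for illustration. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the Ganglia restart sequence from the troubleshooting steps.
# DRY_RUN and WAIT_SECONDS are hypothetical knobs, not part of Ganglia or Rocks.
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print commands, 0 = execute them
WAIT_SECONDS="${WAIT_SECONDS:-90}"  # "a minute or two"; exact value is a guess

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form drops the quotes around the remote command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_ganglia() {
    run service gmetad stop                           # stop data collection on the head node
    run rocks run host compute 'service gmond stop'   # stop monitoring on compute nodes
    run service gmetad start                          # start data collection again
    run sleep "$WAIT_SECONDS"                         # give gmetad time to settle
    run rocks run host compute 'service gmond start'  # bring the compute nodes back
}

restart_ganglia
```

Run it once as-is to review the printed commands, then run it as root on the head node with ''DRY_RUN=0'' to actually restart the daemons. Afterwards, check the Ganglia web interface as described above.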
  
 ===== Monit ===== ===== Monit =====
Line 86: Line 65:
  
 with  username = "nagiosadmin" and password = "nagios" with  username = "nagiosadmin" and password = "nagios"
- 
 ==== Zabbix ==== ==== Zabbix ====
 ---- ----
-Installation:+Installation: Ref : http://www.zabbix.com/documentation/1.8/start
  
-RHEL-compatible Linux:+RHEL-compatible Linux: Ref: http://andrewfarley.com/sysadmin/rpm-repository-online
 <code>sudo echo '[andrewfarley] <code>sudo echo '[andrewfarley]
 name=Andrew Farley RPM Repository name=Andrew Farley RPM Repository
Line 113: Line 91:
  
 Debian-Based Linux: Debian-Based Linux:
 +---- 
 +<code> 
 +root@simple:~# apt-cache search zabbix  
 +zabbix-agent - network monitoring solution - agent 
 +zabbix-frontend-php - network monitoring solution - PHP front-end 
 +zabbix-proxy-mysql - network monitoring solution - proxy (using MySQL) 
 +zabbix-proxy-pgsql - network monitoring solution - proxy (using PostgreSQL) 
 +zabbix-server-mysql - network monitoring solution - server (using MySQL) 
 +zabbix-server-pgsql - network monitoring solution - server (using PostgreSQL) 
 +root@simple:~# aptitude install zabbix-proxy-mysql zabbix-agent zabbix-server-mysql zabbix-frontend-php 
 +</code>
  
 === Accessing Zabbix === === Accessing Zabbix ===