server_monitoring:ganglia

Differences

This shows you the differences between two versions of the page.

server_monitoring:ganglia [2010/06/21 22:41] – created 172.26.14.218
server_monitoring:ganglia [2013/06/18 11:27] (current) – removed aorth
Line 1: Line 1:
-====== Ganglia ====== 
  
-[[http://ganglia.info/|Ganglia]] is a system for collecting, recording, and graphing metrics about the hosts in a cluster, such as CPU load, network traffic, disk space, and RAM utilization.  The metrics are recorded periodically and graphed by [[http://oss.oetiker.ch/rrdtool|RRDtool]].  Rocks automatically configures the head node to collect these metrics from the compute nodes (the gmetad daemon on the head node polls the gmond daemons on the compute nodes) and provides a web interface where you can monitor the health of the cluster.
- 
-The cluster's Ganglia web interface is at http://hpc.ilri.cgiar.org/ganglia
- 
-{{:ganglia_diagram_smaller.gif|}} 
- 
-Further reading (IBM developerWorks wiki): http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
-
-===== Troubleshooting ===== 
-If the Ganglia web interface stops working properly (for example, nodes are missing from it), restart the Ganglia daemons in this order:
- 
-  - Stop the data collection daemon (gmetad) on HPC: ''service gmetad stop''
-  - Stop the monitoring daemon (gmond) on the compute nodes: ''rocks run host compute %%'%%service gmond stop%%'%%''
-  - Start the data collection daemon on HPC: ''service gmetad start''
-  - Wait a minute or two
-  - Start the monitoring daemon on the compute nodes: ''rocks run host compute %%'%%service gmond start%%'%%''
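The steps above can be wrapped in a small shell function so they always run in the same order. This is a minimal sketch, not a tested tool: it assumes it runs as root on HPC, that the init scripts are named gmetad and gmond as in the steps, and it uses a 90-second pause for "a minute or two". The names `run`, `restart_ganglia` and the `DRY_RUN` switch are hypothetical, introduced here for illustration.

```shell
#!/bin/bash
# Minimal sketch of the Ganglia restart procedure, assuming it runs as root
# on HPC and that the init scripts are called gmetad and gmond.
# run, restart_ganglia and DRY_RUN are hypothetical names, not Rocks tools.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restart gmetad and gmond in the documented order.
# $1: seconds to wait after starting gmetad (default 90, "a minute or two").
restart_ganglia() {
    local wait_secs="${1:-90}"

    run service gmetad stop
    run rocks run host compute 'service gmond stop'
    run service gmetad start
    run sleep "$wait_secs"
    run rocks run host compute 'service gmond start'
}
```

Source the file, preview the commands with `DRY_RUN=1 restart_ganglia`, then run `restart_ganglia` to execute them. The function deliberately does not abort on errors, so a failed stop (for example, a daemon that was already down) does not skip the later start commands.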
- 
-Then check the Ganglia web interface to confirm that the nodes have reappeared.
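To check from a shell instead of the browser, you can read the XML that gmetad serves on the head node. This is a minimal sketch, assuming gmetad listens on its default xml_port (8651; check gmetad.conf), that `nc` is installed, and that each HOST element sits on its own line of the XML. It uses the TN attribute of each HOST element, which is the number of seconds since that host last reported. `list_hosts` and `stale_hosts` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: read Ganglia XML on stdin and print one line per host:
#   <host name> <seconds since last report (TN)>
# Assumes each <HOST ...> element is on its own line.
list_hosts() {
    grep '<HOST ' | while IFS= read -r line; do
        name=$(printf '%s\n' "$line" | sed -n 's/.* NAME="\([^"]*\)".*/\1/p')
        tn=$(printf '%s\n' "$line" | sed -n 's/.* TN="\([^"]*\)".*/\1/p')
        echo "$name $tn"
    done
}

# Hypothetical helper: print hosts that have not reported for more than
# $1 seconds (default 60), i.e. nodes that have not come back yet.
stale_hosts() {
    local max_age="${1:-60}"
    list_hosts | while read -r name tn; do
        if [ -n "$tn" ] && [ "$tn" -gt "$max_age" ]; then
            echo "$name"
        fi
    done
}

# Example on the head node (port 8651 is an assumption, see gmetad.conf):
#   nc localhost 8651 | stale_hosts 60
```

An empty result from `stale_hosts` suggests every host known to gmetad has reported recently; a host that is missing from the XML entirely will not be listed either, so compare against the list of compute nodes if in doubt.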
server_monitoring/ganglia.1277160091.txt.gz · Last modified: 2010/06/21 22:41 by 172.26.14.218