server_monitoring

Differences

This shows you the differences between two versions of the page.

Old revision: server_monitoring [2010/01/11 11:26] 172.26.0.166
New revision: server_monitoring [2010/01/28 11:29] 172.26.0.166
Line 13: Line 13:
  
 Interesting documentation: http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
- 
 ==== Troubleshooting ====
 From time to time there are problems with Ganglia's web interface.  You can restart the needed services following this basic procedure:
Line 20: Line 19:
   - Stop monitoring daemon on compute nodes: ''rocks run host compute %%'%%service gmond stop%%'%%''
   - Start data collection daemon on HPC: ''service gmetad start''
 +  - Wait a minute or two
   - Start monitoring daemon on compute nodes: ''rocks run host compute %%'%%service gmond start%%'%%''
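
The steps visible in this hunk could be collected into a small shell function, sketched below. This covers only the four steps shown here; the page's list starts before this hunk, and those earlier steps are not reproduced. The function name, the ''run'' helper, the ''DRY_RUN'' and ''WAIT_SECONDS'' variables, and the 90-second default are illustrative assumptions, not part of the wiki page. By default the sketch only prints what it would do.

```shell
#!/bin/sh
# Sketch of the restart steps visible in this revision of the page.
# Assumptions (not from the page): the helper and function names, the
# DRY_RUN and WAIT_SECONDS variables, and the 90-second default wait.

# Print the command in dry-run mode (the default); otherwise execute it.
# Note: the dry-run echo joins arguments with spaces, so quoting is not shown.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

ganglia_restart_tail() {
    # Stop monitoring daemon on compute nodes
    run rocks run host compute 'service gmond stop' || return 1
    # Start data collection daemon on HPC
    run service gmetad start || return 1
    # Wait a minute or two (the step added in this revision)
    run sleep "${WAIT_SECONDS:-90}"
    # Start monitoring daemon on compute nodes
    run rocks run host compute 'service gmond start'
}
```

Calling ''ganglia_restart_tail'' prints the four commands without running them; ''DRY_RUN=0 ganglia_restart_tail'' would execute them, which presumably requires root on the HPC head node.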
  
Line 84: Line 84:
  
 with username = "nagiosadmin" and password = "nagios"
- 
-