server_monitoring
Differences
server_monitoring [2010/06/16 09:58] – 172.26.0.166 | server_monitoring [2010/06/21 22:41] – 172.26.14.218
---|---|---|---|
Line 1: | Line 1:
===== Server Monitoring ===== | ===== Server Monitoring =====
- | * [[#ganglia|Ganglia]] - Monitors cluster CPU, disk, network usage | + | * [[server_monitoring: |
- | * [[#monit|Monit]] - Monitors specific services | + | * [[server_monitoring: |
- | * [[#nagios|Nagios]] - Monitors servers, | + | * [[server_monitoring: |
- | * [[#Zabbix|Zabbix]] -Monitor servers, | + | * [[server_monitoring: |
- | + | ||
- | + | ||
- | ===== Ganglia ===== | + | |
- | + | ||
- | [[http:// | + | |
- | + | ||
- | You can see the ganglia installation here: http:// | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | Interesting documentation: | + | |
- | ==== Troubleshooting ==== | + | |
- | From time to time there are problems with Ganglia' | + | |
- | + | ||
- | - Stop data collection daemon on HPC: '' | + | |
- | - Stop monitoring daemon on compute nodes: '' | + | |
- | - Start data collection daemon on HPC: '' | + | |
- | - Wait a minute or two | + | |
- | - Start monitoring daemon on compute nodes: '' | + | |
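The five troubleshooting steps above come down to a strict ordering: the collector on HPC goes down first, the compute-node daemons go down, the collector comes back, there is a pause, and only then do the node daemons restart. The Python below is a minimal sketch of that ordering, assuming nothing about the real commands (they are not repeated here). Each step is a caller-supplied callable, and `restart_ganglia`, `shell_step`, and the 90-second default pause are hypothetical names and values chosen for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence


def shell_step(cmd: Sequence[str]) -> Callable[[], None]:
    """Hypothetical helper: wrap an argv list as a step that raises on failure."""
    def step() -> None:
        subprocess.run(list(cmd), check=True)
    return step


def restart_ganglia(
    stop_collector: Callable[[], None],
    stop_node_monitors: Callable[[], None],
    start_collector: Callable[[], None],
    start_node_monitors: Callable[[], None],
    settle_seconds: float = 90.0,  # assumption: "a minute or two"
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the troubleshooting steps in the order the page lists them."""
    stop_collector()       # 1. stop the data collection daemon on HPC
    stop_node_monitors()   # 2. stop the monitoring daemon on compute nodes
    start_collector()      # 3. start the data collection daemon on HPC
    sleep(settle_seconds)  # 4. wait a minute or two
    start_node_monitors()  # 5. start the monitoring daemon on compute nodes
```

Steps 2 and 5 touch every compute node, so in practice those callables would loop over the node list; passing `sleep` in as a parameter just makes the ordering easy to check without waiting.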
- | + | ||
- | Now go check the Ganglia web interface and see if the nodes have returned. | + | |
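As an alternative to eyeballing the web interface, the XML dump that Ganglia's collector serves can be checked for hosts that have stopped reporting. The sketch below is hedged on several assumptions not stated on this page: that the collector answers on TCP port 8651 with a full dump and then closes the connection, that the dump nests `HOST` elements under `CLUSTER`, and that each `HOST` carries a `TN` attribute giving seconds since its last report. The function names and the 60-second threshold are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def fetch_ganglia_xml(host: str = "localhost", port: int = 8651,
                      timeout: float = 10.0) -> bytes:
    """Read the whole XML dump; assumes the server closes the socket when done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    # Return raw bytes so the parser honours the document's own encoding.
    return b"".join(chunks)


def host_ages(xml_doc: Union[str, bytes]) -> Dict[str, int]:
    """Map each HOST NAME to its TN attribute (assumed: seconds since last report)."""
    root = ET.fromstring(xml_doc)
    return {h.get("NAME"): int(h.get("TN", "0")) for h in root.iter("HOST")}


def stale_hosts(xml_doc: Union[str, bytes], max_age: int = 60) -> List[str]:
    """Names of hosts whose last report is older than max_age seconds."""
    return sorted(name for name, age in host_ages(xml_doc).items() if age > max_age)
```

For example, `stale_hosts(fetch_ganglia_xml("HPC"))` would list the nodes that still have not returned after the restart, assuming the port and XML layout above match the installation.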
===== Monit ===== | ===== Monit ===== |