server_monitoring
Differences
This shows you the differences between two versions of the page.
server_monitoring [2009/11/17 11:08] – 172.26.0.166 | server_monitoring [2009/11/27 12:10] – alan | ||
---|---|---|---|
Line 8: | Line 8: | ||
[[http:// | [[http:// | ||
- | You can see the Ganglia installation here: http:// | + | You can see the Ganglia installation here: http:// |
- | + | ||
- | The graphs sometimes fail to draw. Rebooting the cluster restores them for a few days, after which they stop drawing again (the graphs appear blank). | + | |
{{: | {{: | ||
Line 16: | Line 14: | ||
Interesting documentation: | Interesting documentation: | ||
- | ==== Notes ==== | + | ==== Troubleshooting |
+ | From time to time there are problems with Ganglia' | ||
- | It appears as if other CGIAR clusters have been configured to query our '' | + | - Stop data collection daemon on HPC: '' |
- | < | + | - Stop monitoring daemon on HPC: '' |
- | # Gmond config file for Cluster Cluster. | + | - Stop monitoring daemon |
- | # Generated by ganglia.xml node without aid from the database. | + | - Start data collection daemon on HPC: '' |
- | # | + | - Start monitoring daemon on HPC: '' |
- | name " | + | - Start monitoring daemon on compute nodes: '' |
- | owner " | + | |
- | url "http:// | + | |
- | latlong " | + | |
- | mcast_channel " | + | |
- | + | Now go check the Ganglia web interface and see if the nodes have returned. | |
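The stop/start order in the list above could be scripted roughly as follows. This is only a sketch under stated assumptions: the wiki's actual commands are truncated here, so the `service` wrapper, the init script names `gmetad` (data collection) and `gmond` (monitoring), ssh access to the nodes, and the node names are all hypothetical placeholders. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the stop/start order from the troubleshooting list.
# ASSUMPTIONS (not from the wiki): the `service` wrapper, the init script
# names gmetad/gmond, ssh access to the nodes, and the node names below.

DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands
NODES="${NODES:-compute-0-0 compute-0-1}"      # hypothetical compute nodes

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply one action (stop or start) to gmond on every compute node.
nodes_gmond() {
    for node in $NODES; do
        run ssh "$node" service gmond "$1"
    done
}

restart_ganglia() {
    run service gmetad stop     # data collection daemon on the head node
    run service gmond stop      # monitoring daemon on the head node
    nodes_gmond stop            # monitoring daemon on the compute nodes
    run service gmetad start
    run service gmond start
    nodes_gmond start
}
```

Calling `restart_ganglia` prints the planned commands in order; setting `DRY_RUN=0` would execute them instead, which is only sensible once the placeholder commands have been replaced with the real ones for this cluster.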
- | # | + | |
- | # Increase size of gmond user (gmetric) hash table. | + | |
- | # | + | |
- | num_custom_metrics 2048 | + | |
- | + | ||
- | # Uncomment | + | |
- | trusted_hosts 220.227.242.214 # hpc.icrisat.cgiar.org | + | |
- | trusted_hosts 202.123.56.187 | + | |
- | trusted_hosts 216.244.151.133 # ? | + | |
- | trusted_hosts 200.62.229.37 | + | |
- | + | ||
- | # Listen only on the private cluster | + | |
- | mcast_if eth0</ | + | |
- | + | ||
- | ==== Connections refused ==== | + | |
- | I kept seeing this error in ''/ | + | |
- | < | + | |
- | Apparently that is the Potato Center' | + | |
===== Monit ===== | ===== Monit ===== | ||
Line 67: | Line 44: | ||
===== Nagios ===== | ===== Nagios ===== | ||
+ | |||
+ | Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes. http:// | ||
+ | |||
+ | === Installation === | ||
---- | ---- | ||
+ | |||
+ | Download the latest version of Nagios from http:// | ||
+ | < | ||
+ | $ cd nagios-3.2.0 | ||
+ | $ ./configure | ||
+ | $ make all | ||
+ | $ useradd nagios | ||
+ | $ make install | ||
+ | $ make install-init | ||
+ | $ make install-commandmode | ||
+ | $ make install-config | ||
+ | $ make install-webconf | ||
+ | </ | ||
+ | === Configuration === | ||
+ | |||
+ | ---- | ||
+ | Running the following command will create a new file called htpasswd.users in the / | ||
+ | < | ||
+ | |||
+ | Download and install plugins | ||
+ | < | ||
+ | $ wget http:// | ||
+ | $ tar xfz nagios-plugins-1.4.14.tar.gz | ||
+ | $ cd nagios-plugins-1.4.14 | ||
+ | $ ./configure && make && make install | ||
+ | </ | ||
+ | Edit the configuration files to add the hosts and services to be monitored: | ||
+ | < | ||
+ | |||
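As an illustration of what a host and service entry might look like, the sketch below prints one host definition and one ping check for it. It assumes the `linux-server` and `generic-service` templates and the `check_ping` command from the sample configuration that `make install-config` installs; the host name and address are hypothetical.

```shell
#!/bin/sh
# Sketch: print a Nagios host definition plus a ping service for it.
# ASSUMPTIONS: the linux-server and generic-service templates and the
# check_ping command come from the sample configuration installed by
# `make install-config`; host name and address are hypothetical.

nagios_host_def() {
    name="$1"
    address="$2"
    cat <<EOF
define host {
    use        linux-server
    host_name  $name
    alias      $name
    address    $address
}

define service {
    use                  generic-service
    host_name            $name
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
EOF
}

nagios_host_def node1 192.0.2.10
```

The output would go into a `.cfg` file that the main configuration loads through a `cfg_file` or `cfg_dir` directive. Running `nagios -v` against the main configuration file checks the result before the daemon is restarted.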
+ | Check remote services http:// | ||
+ | === Accessing Nagios === | ||
+ | |||
+ | ---- | ||
+ | http:// | ||
+ | |||
+ | with username = " | ||
+ | |||