server_monitoring
Differences
This shows you the differences between two versions of the page.
server_monitoring [2009/11/17 12:21] – 172.26.0.166 | server_monitoring [2024/01/16 09:21] (current) – removed aorth | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Server Monitoring ===== | ||
- | * [[# | ||
- | * [[# | ||
- | * [[# | ||
- | |||
- | ===== Ganglia ===== | ||
- | |||
- | [[http:// | ||
- | |||
- | You can see the ganglia installation here: http:// | ||
- | |||
- | The graphs sometimes fail to draw, for reasons not yet identified. Rebooting the cluster restores them for a few days, after which they appear blank again. | ||
- | |||
- | {{: | ||
- | |||
- | Interesting documentation: | ||
- | |||
- | ==== Notes ==== | ||
- | |||
- | It appears as if other CGIAR clusters have been configured to query our '' | ||
- | < | ||
- | # Gmond config file for Cluster Cluster. | ||
- | # Generated by ganglia.xml node without aid from the database. | ||
- | # | ||
- | name " | ||
- | owner " | ||
- | url " | ||
- | latlong " | ||
- | mcast_channel " | ||
- | |||
- | |||
- | # | ||
- | # Increase size of gmond user (gmetric) hash table. | ||
- | # | ||
- | num_custom_metrics 2048 | ||
- | |||
- | # Uncomment the next line for monitoring by the Rocks Cluster Network. | ||
- | trusted_hosts 220.227.242.214 # hpc.icrisat.cgiar.org | ||
- | trusted_hosts 202.123.56.187 | ||
- | trusted_hosts 216.244.151.133 # ? | ||
- | trusted_hosts 200.62.229.37 | ||
- | |||
- | # Listen only on the private cluster interface. | ||
- | mcast_if eth0</ | ||
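To audit which remote hosts a gmond configuration trusts, the `trusted_hosts` lines can be extracted with a short shell function. This is a minimal sketch, assuming the old-style one-directive-per-line format shown above; the function name `list_trusted_hosts` is hypothetical and the config path is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print every address named on a trusted_hosts line.
# Usage: list_trusted_hosts /path/to/gmond.conf  (path chosen by the caller)
list_trusted_hosts() {
    # Drop trailing "# ..." comments first, so commented-out entries and
    # hostname annotations are ignored, then print the fields after the keyword.
    sed 's/#.*//' "$1" |
        awk '$1 == "trusted_hosts" { for (i = 2; i <= NF; i++) print $i }'
}
```

Comparing the output against the list of clusters that are expected to poll this one makes unknown entries (such as the address marked `?` above) easy to spot.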
- | |||
- | ==== Connections refused ==== | ||
- | I kept seeing this error in ''/ | ||
- | < | ||
- | Apparently that is the Potato Center' | ||
- | |||
- | ===== Monit ===== | ||
- | |||
- | Monit is a free, open source utility for managing and monitoring processes, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. | ||
- | Monit can start a process if it is not running, restart a process if it does not respond, and stop a process if it uses too many resources. It logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alerts. | ||
- | Monit provides a built-in HTTP(S) interface, so you can use a browser to access the Monit server. | ||
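The "start a process if it is not running" behaviour can be illustrated with a few lines of shell. This is only a sketch of the idea, not how Monit itself is implemented or configured; the function name `ensure_running` and the PID-file convention are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: run a start command when the PID file does not
# name a live process.
# Usage: ensure_running PIDFILE COMMAND [ARGS...]
ensure_running() {
    pidfile=$1
    shift
    # kill -0 delivers no signal; it only tests that the PID exists
    # and that we are allowed to signal it.
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "running"
        return 0
    fi
    echo "not running, starting"
    "$@"
}
```

Monit adds to this basic check the parts that are tedious to script by hand: periodic polling, resource thresholds, restart limits and alerting.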
- | |||
- | M/Monit expands upon Monit' | ||
- | |||
- | Get the latest version at: http:// | ||
- | |||
- | < | ||
- | $ tar xfz monit-5.0.3.tar.gz | ||
- | $ cd monit-5.0.3 | ||
- | $ ./configure && make && make install</ | ||
- | Accessing monit: | ||
- | http:// | ||
- | |||
- | ===== Nagios ===== | ||
- | |||
- | Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes. http:// | ||
- | |||
- | === Installation === | ||
- | |||
- | ---- | ||
- | |||
- | Download the latest version of Nagios from http:// | ||
- | < | ||
- | $ cd nagios-3.2.0 | ||
- | $ ./configure | ||
- | $ make all | ||
- | $ useradd nagios | ||
- | $ make install | ||
- | $ make install-init | ||
- | $ make install-commandmode | ||
- | $ make install-config | ||
- | $ make install-webconf</ | ||
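The install targets above can be wrapped in a small function that stops at the first failure, so a broken step is not silently followed by the next one. This is a sketch under assumptions: the function name `nagios_install` and the `DRY_RUN` switch are hypothetical, it must be run from the unpacked source directory after `make all`, and the real run typically requires root privileges.

```shell
#!/bin/sh
# Hypothetical wrapper: run the Nagios make targets in the order listed
# above and stop at the first one that fails.
# Set DRY_RUN=1 to print the commands instead of running them.
nagios_install() {
    for target in install install-init install-commandmode \
                  install-config install-webconf; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "make $target"
        else
            make "$target" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 nagios_install` first shows exactly which commands would be executed.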
- | |||
- | === Configuration === | ||
- | |||
- | ---- | ||