server_monitoring
Differences
This shows you the differences between two versions of the page.
server_monitoring [2009/11/27 08:38] – alan | server_monitoring [2010/06/21 22:47] – 172.26.14.218 | ||
---|---|---|---|
Line 1: | Line 1: | ||
===== Server Monitoring ===== | ===== Server Monitoring ===== | ||
- | * [[#ganglia|Ganglia]] - Monitors cluster CPU, disk, network usage | + | * [[server_monitoring: |
- | * [[#monit|Monit]] - Monitors specific services | + | * [[server_monitoring: |
- | * [[#nagios|Nagios]] - Monitors services | + | * [[server_monitoring: |
+ | * [[server_monitoring: | ||
- | ===== Ganglia | + | ====== Zabbix ====== |
- | [[http://ganglia.info/ | + | Installation: |
- | You can see the ganglia installation here: http://hpc.ilri.cgiar.org/ganglia (if it's working) | + | RHEL-compatible Linux: Ref: http://andrewfarley.com/ |
+ | < | ||
+ | name=Andrew Farley RPM Repository | ||
+ | baseurl=http:// | ||
+ | enabled=1 | ||
+ | gpgcheck=0' | ||
- | For some reason sometimes the graphs do not draw. If you reboot the cluster the graphs work for some days and then seem to stop drawing (the graphs appear blank). | + | And then you can install zabbix agent, zabbix server, zabbix get, or zabbix proxy with: |
+ | < | ||
+ | $ sudo yum install zabbix-server | ||
+ | $ sudo yum install zabbix-get | ||
+ | $ sudo yum install zabbix-proxy< | ||
- | {{:ganglia_diagram_smaller.gif|}} | + | If it fails to install, you might need to clean the metadata with the following command and try again: |
+ | < | ||
- | Interesting documentation: | ||
- | ==== Troubleshooting ==== | + | Debian-Based Linux: |
- | When Ganglia has problems displaying nodes it may need to be restarted. | + | < |
- | * Stop data collection daemon on HPC: '' | + | zabbix-agent - network |
- | * Stop monitoring daemon on HPC: '' | + | zabbix-frontend-php - network monitoring solution |
- | * Stop monitoring daemon on compute nodes: '' | + | zabbix-proxy-mysql - network monitoring solution |
- | * Start data collection daemon on HPC: '' | + | zabbix-proxy-pgsql - network monitoring solution - proxy (using PostgreSQL) |
- | * Start monitoring daemon on HPC: '' | + | zabbix-server-mysql - network monitoring solution |
- | * Start monitoring daemon on compute nodes: '' | + | zabbix-server-pgsql - network monitoring solution - server (using PostgreSQL) |
- | + | # aptitude | |
- | Now go check the Ganglia web interface and see if the nodes have returned. | + | |
- | + | ||
- | ===== Monit ===== | + | |
- | + | ||
- | Monit is a free open source utility for managing and monitoring processes, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. | + | |
- | Monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses too many resources. It logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alerts. | + | |
- | Monit provides a built-in HTTP(S) interface and you can use a browser to access the Monit server. | + | |
- | + | ||
- | M/Monit expand upon Monit' | + | |
- | + | ||
- | Get the latest version at: http:// | + | |
- | + | ||
- | < | + | |
- | $ tar xfz monit-5.0.3.tar.gz | + | |
- | $ cd monit-5.0.3 | + | |
- | $ ./configure && make && make install</ | + | |
- | Accessing monit: | + | |
- | http:// | + | |
- | + | ||
- | ===== Nagios ===== | + | |
- | + | ||
- | Nagios is a powerful | + | |
- | + | ||
- | === Installation === | + | |
- | + | ||
- | ---- | + | |
- | + | ||
- | Download the latest version of nagios while hot, from http:// | + | |
- | < | + | |
- | $ cd nagios-3.2.0 | + | |
- | $ ./ | + | |
- | $ make all | + | |
- | $ useradd nagios | + | |
- | $ make install | + | |
- | $ make install-init | + | |
- | $ make install-commandmode | + | |
- | $ make install-config | + | |
- | $ make install-webconf | + | |
- | </ | + | |
- | === Configuration === | + | |
- | + | ||
- | ---- | + | |
- | Running the following command will create a new file called htpasswd.users in the / | + | |
- | < | + | |
- | + | ||
- | Download and install | + | |
- | < | + | |
- | $ wget http:// | + | |
- | $ tar xfz nagios-plugins-1.4.14.tar.gz | + | |
- | $ cd nagios-plugins-1.4.14 | + | |
- | $ ./configure && make && make install | + | |
- | </file> | + | |
- | Edit the configuration files to add host and services to be monitored: | + | |
- | <code>vim / | + | |
- | + | ||
- | Check remote services http:// | + | |
- | === Accessing Nagios === | + | |
- | + | ||
- | ---- | + | |
- | http:// | + | |
- | + | ||
- | with username = " | + | |
+ | ====== Accessing Zabbix ====== | ||
+ | http:// | ||
+ | username: Admin | ||
+ | password: zabbix |
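
The RHEL-compatible procedure in the Zabbix section (install a package with yum, and if that fails, clean the metadata and try again) can be sketched as a small shell function. This is only an illustration under stated assumptions: the exact clean-up command is cut off in the page, so ''yum clean metadata'' is assumed here, and ''install_with_retry'' and ''PKG_CMD'' are hypothetical names that do not appear in the original.

```shell
#!/bin/sh
# Sketch: install one package; on failure, clean the yum metadata once and retry.
# PKG_CMD is a hypothetical override (not from the wiki page) so the logic can be
# exercised without root; by default it runs "sudo yum" as the page does.
PKG_CMD="${PKG_CMD:-sudo yum}"

install_with_retry() {
    pkg="$1"

    # First attempt: a plain install.
    if $PKG_CMD install -y "$pkg"; then
        return 0
    fi

    # Assumed recovery step: the page's own command is truncated,
    # "clean metadata" is a standard yum subcommand used here as a stand-in.
    echo "install of $pkg failed; cleaning metadata and retrying" >&2
    $PKG_CMD clean metadata || return 1

    # Second and final attempt; its exit status is the function's result.
    $PKG_CMD install -y "$pkg"
}
```

It could then be called once per package named on the page, for example ''install_with_retry zabbix-server'', followed by ''zabbix-get'' and ''zabbix-proxy''. Retrying only once keeps a persistent repository problem from looping forever.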