The HPC's head node is connected to an APC battery backup unit, which has a USB cable that can be used for monitoring power status. The APC UPS Daemon (apcupsd) is configured on the server to watch for power interruptions and notify the computer of the UPS status. If the power stays off for too long, the daemon can instruct the computer to shut down.
Download pre-compiled RPMs for Enterprise Linux 5 from the EPEL project: http://download.fedora.redhat.com/pub/epel/5/x86_64/repoview/apcupsd.html
Alternatively, download the source from http://www.apcupsd.com/ and build it with USB and CGI support:
tar zxf apcupsd-3.14.7.tar.gz
cd apcupsd-3.14.7
./configure --enable-usb --enable-cgi
make
sudo make install
After the software is installed, two files in /etc/apcupsd need attention: hosts.conf and apcupsd.conf.
In hosts.conf, make sure to list the localhost and give it a name. This entry is used for remote monitoring and also for checking the status via the multimon CGI script:
# Network UPS Tools - hosts.conf
#
# This file does double duty - it lists the systems that multimon will
# monitor, and also specifies the systems that upsstats is allowed to
# watch. It keeps people from feeding random addresses to upsstats,
# among other things. upsimage also uses this file to know who it
# may speak to. upsfstats too.
#
# Usage: list systems running upsd that you want to monitor
#
# MONITOR <address> "<host description>"
#
# Please note, MONITOR must start in column 1 (no spaces permitted)
#
# Example:
# MONITOR 10.64.1.1 "Finance department"
# MONITOR 10.78.1.1 "Sierra High School data room #1"
#
MONITOR 127.0.0.1 "ILRI-HPC"
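Because the file insists that MONITOR start in column 1, a quick sanity check can catch an indented entry before the CGI scripts are tested. The following is a minimal sketch, not part of apcupsd; the function name `check_hosts_conf` is hypothetical, and it only checks the column-1 rule and that at least one active entry exists.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the MONITOR lines in a hosts.conf file.
check_hosts_conf() {
    conf="$1"
    # A MONITOR keyword preceded by whitespace breaks the column-1 rule.
    if grep -n '^[[:space:]][[:space:]]*MONITOR' "$conf"; then
        echo "error: MONITOR must start in column 1" >&2
        return 1
    fi
    # At least one active (uncommented) MONITOR entry is expected.
    if ! grep -q '^MONITOR[[:space:]]' "$conf"; then
        echo "error: no active MONITOR entry in $conf" >&2
        return 1
    fi
    echo "ok: $(grep -c '^MONITOR[[:space:]]' "$conf") active MONITOR line(s)"
}
```

It could be run as `check_hosts_conf /etc/apcupsd/hosts.conf`, assuming the file lives in the same directory as the CGI scripts copied later on this page.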
In apcupsd.conf the defaults are OK, but make sure the daemon is configured to use the following settings (leave DEVICE blank for auto-detection):
UPSCABLE usb
UPSTYPE usb
DEVICE
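To confirm that the three directives are set as described, the configuration file can be checked with a few lines of shell. This is a sketch under assumptions: the helper names `conf_value` and `check_usb_settings` are hypothetical, and a missing DEVICE line is treated the same as a blank one.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first uncommented occurrence
# of a directive (empty output if the value is blank or the line is absent).
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the USB settings match this page.
check_usb_settings() {
    conf="$1"
    status=0
    [ "$(conf_value "$conf" UPSCABLE)" = "usb" ] || { echo "UPSCABLE is not usb" >&2; status=1; }
    [ "$(conf_value "$conf" UPSTYPE)" = "usb" ] || { echo "UPSTYPE is not usb" >&2; status=1; }
    [ -z "$(conf_value "$conf" DEVICE)" ] || { echo "DEVICE is not blank" >&2; status=1; }
    return $status
}
```

For example, `check_usb_settings /etc/apcupsd/apcupsd.conf && echo "USB settings look right"`, where the path is assumed from the /etc/apcupsd directory used elsewhere on this page.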
The apcupsd package comes with a web-based frontend to the daemon, which monitors battery status and generates graphs. The CGI files need to be placed in a directory where Apache has been configured to allow scripts to run.
Copy the CGI scripts:
$ cd /etc/apcupsd
$ sudo cp multimon.cgi upsimage.cgi upsstats.cgi /var/www/cgi-bin/
Check the status by visiting http://hpc.ilri.cgiar.org/cgi-bin/multimon.cgi
Add the daemon to the default runlevels:
sudo chkconfig --level 2345 apcupsd on
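To verify that the service was actually enabled, the output of `chkconfig --list apcupsd` can be inspected. The sketch below wraps that check in a small filter; the function name `runlevels_2345_on` is hypothetical, and it assumes the usual one-line `name 0:off 1:off 2:on ...` output format.

```shell
#!/bin/sh
# Hypothetical helper: read `chkconfig --list <service>` output on stdin
# and succeed only if runlevels 2, 3, 4 and 5 are all "on".
runlevels_2345_on() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[2345]:on$/) ok++
    }
    END { exit (ok == 4 ? 0 : 1) }'
}
```

Usage: `chkconfig --list apcupsd | runlevels_2345_on && echo "apcupsd is enabled"`.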
Make sure the daemon is stopped, then run:
We have seen problems with the USB cable not being detected, but switching ports fixed the problem. If you see the apcupsd service failing to start, check the log file to see if something is wrong:
Aug 17 15:08:36 hpc-ilri apcupsd: apcupsd startup succeeded
Aug 17 15:08:46 hpc-ilri apcupsd: apcupsd FATAL ERROR in linux-usb.c at line 649 Cannot find UPS device -- For a link to detailed USB trouble shooting information, please see <http://www.apcupsd.com/support.html>.
Aug 17 15:08:46 hpc-ilri apcupsd: apcupsd error shutdown completed
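Messages like the one above can be pulled out of a busy log with a simple filter. This is a minimal sketch; the function name `apcupsd_fatal_errors` is hypothetical, and the log path in the usage note is an assumption, since this page does not say which file the daemon logs to.

```shell
#!/bin/sh
# Hypothetical helper: print apcupsd fatal-error lines from a syslog-style
# file. Returns 0 if any were found and 1 if none were (grep's convention).
apcupsd_fatal_errors() {
    grep 'apcupsd.*FATAL ERROR' "$1"
}
```

On a typical Enterprise Linux system this might be run as `sudo sh -c '. ./check.sh; apcupsd_fatal_errors /var/log/messages'`, where both `check.sh` and the log location are placeholders to adapt.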