Tape backup
Tape backups are run manually once per week, on Friday afternoon. We have four cassettes, each of which can hold seven tapes. A full backup currently needs around ten tapes, so each pair of cassettes is loaded with eleven tapes in case the backups grow. Each week we swap to the other pair of cassettes (tape sets A and B in the log below), so that we always have the previous week's backup archived.
A full system backup includes:
- / (OS)
- /mnt/export (homes and biosoft applications)
- /mnt/export2 (segoli data is here)
- /mnt/export3 (videodata)
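Before starting a run it can help to confirm that every source in the list above is actually present and to see how much data it holds. The following is only a sketch: `report_backup_sources` is a hypothetical helper name, not part of Storix, and the paths are the ones listed above.

```shell
#!/bin/bash
# Sketch: report each backup source and the space used on its filesystem.
# report_backup_sources is an illustrative helper, not a Storix command.

report_backup_sources() {
    local path
    for path in "$@"; do
        if [ -d "$path" ]; then
            # df -Pk prints one POSIX-format line per filesystem; column 3 is used KiB
            df -Pk "$path" | awk -v p="$path" 'NR == 2 { printf "%s\t%s KiB used\n", p, $3 }'
        else
            printf '%s\tMISSING\n' "$path"
        fi
    done
}

# Usage: report_backup_sources / /mnt/export /mnt/export2 /mnt/export3
```

A path reported as MISSING probably means an export is not mounted, in which case the backup would silently skip that data.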
Example backup process
Insert tapes
Run Storix Backup
From an X11 window:
$ sudo sbadmin
- Utilities → Perform Tape Library Operations → Move Tapes in Library
- Move tape 1 → Drive 1
- Display → Clients, Servers & Media
- "Read Label From Media"
- "Expire/Remove"
- Actions → Run Backup Jobs
- "Run Now"
This takes about 30-35 hours depending on the load of the server and whether or not the robot is working properly.
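The "Move tape 1 → Drive 1" step can probably also be done from a shell with `mtx`, which is useful when no X11 session is available. This is a sketch under two assumptions: the changer is /dev/sg0 (as recorded in the notes on this page), and the drive Storix calls "Drive 1" is data transfer element 0 in mtx, which numbers drives from 0. The `DRY_RUN` switch and the `load_tape` name are illustrative.

```shell
#!/bin/bash
# Sketch: command-line equivalent of "Move tape 1 -> Drive 1".
# ASSUMPTIONS: changer is /dev/sg0; Storix "Drive 1" is mtx drive 0.

CHANGER=${CHANGER:-/dev/sg0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = really move the tape

load_tape() {
    local slot=$1 drive=${2:-0}
    local cmd=(mtx -f "$CHANGER" load "$slot" "$drive")
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

load_tape 1
```

With the default `DRY_RUN=1` this only prints the command it would run; set `DRY_RUN=0` once the slot and drive numbers have been checked against `mtx status`.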
Problems
- Tapes are sometimes hard to remove from the cassette, which can cause the robot to jam
- Even setting the virtual device to "sequential" doesn't work as desired (robot stops when a tape is full and waits for you to manually unload and load the next tape), so we use a "random tape library" instead
Monitoring the backup
The Storix Backup tool shows the current status of the backup, but only to someone sitting at the machine. To follow progress remotely, you can use a one-line shell script that loops periodically and appends the status of the tape library to a file, which essentially becomes a log of the progress. Write the output somewhere web-readable, as the web server is accessible from outside ILRI:
# for num in `seq 1 1000`; do echo "Seq ${num}: $(mtx status)" >> /var/www/html/coffee.txt; sleep 1800; done
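A slightly longer variant of the loop above adds a timestamp to every sample and records when the status command itself fails, which makes a stalled robot easier to spot in the log. This is a sketch, not the script in use: `STATUS_CMD`, `LOG_FILE`, `log_status` and `monitor` are illustrative names, and unlike the one-liner it runs until it is killed rather than stopping after 1000 samples.

```shell
#!/bin/bash
# Sketch: timestamped tape library status log.
# STATUS_CMD and LOG_FILE are illustrative settings, not part of mtx or Storix.

STATUS_CMD=${STATUS_CMD:-"mtx status"}
LOG_FILE=${LOG_FILE:-/var/www/html/coffee.txt}

log_status() {
    # Append one timestamped block; note a failure instead of logging nothing
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        $STATUS_CMD 2>&1 || echo "status command failed (exit $?)"
    } >> "$LOG_FILE"
}

monitor() {
    local interval=${1:-1800}   # seconds between samples
    while true; do
        log_status
        sleep "$interval"
    done
}

# Start in the background with: monitor 1800 &
```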
Log of backups
Date | Tape set | Notes |
---|---|---|
Oct 30, 2009 | A | Robot jammed on tape 7, backup did not complete |
Nov 6, 2009 | B | Completed successfully |
Nov 13, 2009 | A | Completed successfully |
Nov 20, 2009 | B | Backup completed successfully, Verify process failed at tape 4 |
Nov 27, 2009 | A | Completed successfully |
Dec 4, 2009 | B | Backup completed successfully, Verify process failed at tape 6 |
Dec 11, 2009 | A | Backup failed to start (appears to be a software problem, server might need a reboot) |
Dec 21, 2009 | A | Completed successfully |
Jan 8, 2010 | B | Completed successfully |
Jan 15, 2010 | A | Backup completed successfully, Verify process failed |
Jan 22, 2010 | B | Backup completed successfully, Verify stuck at 100%… |
Jan 29, 2010 | A | Backup completed successfully, Verify stuck at 8%… |
Feb 5, 2010 | B | Completed successfully |
Feb 12, 2010 | A | Completed successfully |
Feb 19, 2010 | B | Completed successfully |
March 12, 2010 | A | Completed successfully |
March 19, 2010 | B | Completed successfully |
April 1, 2010 | A | Completed successfully |
April 9, 2010 | B | Completed successfully |
April 16, 2010 | A | Completed successfully |
April 23, 2010 | A | Completed successfully |
April 30, 2010 | B | Completed successfully |
May 07, 2010 | A | Completed successfully |
May 21, 2010 | B | Completed successfully |
June 4, 2010 | A | Completed successfully |
June 9, 2010 | B | Completed successfully |
June 18, 2010 | A | Completed successfully |
June 25, 2010 | B | Completed successfully |
July 2, 2010 | A | Completed successfully |
July 9, 2010 | B | Completed successfully |
July 16, 2010 | A | Completed successfully |
July 23, 2010 | B | Completed successfully |
July 30, 2010 | A | Completed successfully |
August 6, 2010 | B | Completed successfully |
August 13, 2010 | A | … |
September 3, 2010 | A | Completed successfully, verify failed |
September 10, 2010 | B | Completed successfully, verify failed |
September 17, 2010 | A | HPC crashed during the previous night, backups couldn't run… will run them next week now that HPC is fixed |
September 24, 2010 | A | Completed successfully |
October 1, 2010 | B | Completed successfully |
October 8, 2010 | A | Completed successfully |
October 15, 2010 | B | Completed successfully |
October 22, 2010 | A | Completed successfully |
October 29, 2010 | … | Alan in Switzerland, Etienne in China |
November 5, 2010 | B | Apparently successful, but HPC crashed sometime during the weekend due to power fluctuations. Verify failed. |
November 12, 2010 | A | Completed successfully |
November 19, 2010 | B | Had a problem (can't remember why, power?) |
November 26, 2010 | B | Completed successfully |
December 3, 2010 | A | Completed successfully |
December 12, 2010 | B | Completed successfully |
December 17, 2010 | A | Completed successfully |
December 24, 2010 | B | Completed successfully |
December 31, 2010 | - | gone for holidays |
January 7, 2011 | A | Completed successfully |
January 14, 2011 | B | Backup failed, tape library has error 205: X Axis Error. Reset the library and it appears to be ok. |
January 21, 2011 | B | Completed successfully |
January 28, 2011 | A | failed, crashed because of a job Anne was running |
February 4, 2011 | - | gone for holidays |
February 11, 2011 | A | no backup because of work to server room air conditioning |
February 18, 2011 | A | Completed successfully |
February 25, 2011 | B | Completed successfully |
March 4, 2011 | A | failed |
March 11, 2011 | A | Completed successfully |
March 18, 2011 | B | Completed successfully |
March 25, 2011 | A | Completed successfully |
April 1, 2011 | B | Was running a restore for Anne so couldn't run backups |
April 8, 2011 | A | Completed successfully |
April 15, 2011 | B | Completed successfully |
April 21, 2011 | A | Completed successfully |
April 29, 2011 | B | Completed successfully |
May 6, 2011 | A | Completed successfully |
May 13, 2011 | B | … |
Storix Backup Administrator
We are using an Exabyte Tape library for backups and the commercial Storix Backup Administrator software http://www.storix.com/.
Version:
$ cat /opt/storix/instconfig/version
6.3.4.4
Storix System Backup Administrator: /home/villierse/software/storix
Graphical user interface: sbadmin
The Exabyte device has one tape "drive" and a library of tapes. It can hold three cassettes, and each cassette can hold 7 tapes. The robotic arm moves the tapes from the cassettes to the tape drive, where they are unwound and read for backup/restore.
Documentation
Notes
cat /proc/scsi/scsi
(Display attached scsi devices)
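Tape drive: /dev/st0
Library: /dev/sg0
Test: mt -f /dev/st0 status
The BOT (beginning of tape) keyword in the output means a tape is in the drive and positioned at its start
Rewind tape: mt -f /dev/nst0 rewind or mt -f /dev/nst0 rewoffl (rewoffl also ejects the tape)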
Make backup: tar cvf /dev/st0 directory
List files on tape: tar tvf /dev/st0
Rewind and eject tape: mt -f /dev/st0 rewoffl
Restore tape (insert tape): tar xvf /dev/st0
To make more than one backup to same tape:
Use /dev/nst0 instead of /dev/st0. The nst0 device does not rewind the tape when the first backup finishes, so the next backup is written after it.
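The sequence for several archives on one tape can be sketched as below: write each directory through the non-rewinding device, then rewind and skip forward over filemarks with `mt fsf` to reach a later archive. The directories /etc and /home, the `DRY_RUN` switch, and the helper names are illustrative assumptions; with the default `DRY_RUN=1` the script only prints the commands, so it is safe to run without a tape.

```shell
#!/bin/bash
# Sketch: write several tar archives to one tape, then list the second one.
# ASSUMPTION: /dev/nst0 is the non-rewinding node of the drive described above.

TAPE_DEV=${TAPE_DEV:-/dev/nst0}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

backup_dirs() {
    local dir
    run mt -f "$TAPE_DEV" rewind              # start at the beginning of the tape
    for dir in "$@"; do
        run tar cvf "$TAPE_DEV" "$dir"        # one archive per directory
    done
    run mt -f "$TAPE_DEV" rewind
}

list_archive() {
    local index=$1                            # 0 = first archive on the tape
    run mt -f "$TAPE_DEV" rewind
    if [ "$index" -gt 0 ]; then
        run mt -f "$TAPE_DEV" fsf "$index"    # skip forward over $index filemarks
    fi
    run tar tvf "$TAPE_DEV"
}

backup_dirs /etc /home
list_archive 1
```

To restore rather than list, the same positioning applies with `tar xvf` in place of `tar tvf`.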
Troubleshooting
The following commands can be useful in determining problems with devices.
mtx
mtx -f /dev/sg0 inquiry
Product Type: Medium Changer
Vendor ID: 'EXABYTE '
Product ID: 'EXB-480 '
Revision: '2.18'
Attached Changer: No
tapeinfo
tapeinfo -f /dev/sg0
Product Type: Medium Changer
Vendor ID: 'EXABYTE '
Product ID: 'EXB-480 '
Revision: '2.18'
Attached Changer: No
SerialNumber: '67001141 '
SCSI ID: 0
SCSI LUN: 0
Ready: yes
loaderinfo
loaderinfo -f /dev/sg0
Product Type: Medium Changer
Vendor ID: 'EXABYTE '
Product ID: 'EXB-480 '
Revision: '2.18'
Attached Changer: No
Bar Code Reader: Yes
EAAP: Yes
Number of Medium Transport Elements: 1
Number of Storage Elements: 21
Number of Import/Export Element Elements: 1
Number of Data Transfer Elements: 1
Transport Geometry Descriptor Page: Yes
Invertable: No
Device Configuration Page: Yes
Can Transfer: Yes
List SCSI devices
The /dev/sg* nodes are apparently all SCSI devices (some of which are the disks attached to the OS), which can be quite confusing. /proc/scsi/scsi will show you information about the attached SCSI devices:
cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXABYTE Model: EXB-480 Rev: 2.18
  Type: Medium Changer ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM Model: ULTRIUM-TD1 Rev: 4561
  Type: Sequential-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: 3ware Model: Logical Disk 00 Rev: 1.00
  Type: Direct-Access ANSI SCSI revision: ffffffff
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: 3ware Model: Logical Disk 01 Rev: 1.00
  Type: Direct-Access ANSI SCSI revision: ffffffff
Host: scsi1 Channel: 00 Id: 02 Lun: 00
  Vendor: 3ware Model: Logical Disk 02 Rev: 1.00
  Type: Direct-Access ANSI SCSI revision: ffffffff
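To tell the changer and the tape drive apart from the disks at a glance, the /proc/scsi/scsi listing can be condensed to one line per device. The sketch below assumes the usual three-line-per-device layout of that file; `scsi_summary` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: summarise /proc/scsi/scsi as "host:id <TAB> type <TAB> vendor model".
# ASSUMPTION: each device is described by Host:, Vendor:/Model:, and Type: lines.

scsi_summary() {
    local src=${1:-/proc/scsi/scsi}
    awk '
        /^Host:/  { host = $2; id = $6 }
        /Vendor:/ {
            line = $0
            sub(/^.*Vendor:[ \t]*/, "", line)       # drop everything up to the vendor
            sub(/[ \t]*Rev:.*$/, "", line)          # drop the revision field
            sub(/[ \t]*Model:[ \t]*/, " ", line)    # join vendor and model
            gsub(/[ \t]+/, " ", line)               # squeeze padding in the model name
            vendor_model = line
        }
        /Type:/ {
            type = $0
            sub(/^.*Type:[ \t]*/, "", type)         # keep only the device type
            sub(/[ \t]*ANSI.*$/, "", type)
            printf "%s:%s\t%s\t%s\n", host, id, type, vendor_model
        }
    ' "$src"
}

# Usage: scsi_summary            (reads /proc/scsi/scsi)
#        scsi_summary saved.txt  (reads a saved copy)
```

In the summary, the "Medium Changer" line is the library robot and the "Sequential-Access" line is the tape drive; the "Direct-Access" lines are disks.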
/dev/st0 not ready
Try to reset the library and drives from the front panel.
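After a front-panel reset the drive can take a while to come back, so a small polling loop saves repeated manual checks. This is a sketch with a loud assumption: it treats the drive as ready when `mt status` prints the ONLINE flag, which is how the mt-st version of `mt` reports a loaded tape; other `mt` implementations may word it differently. `wait_until_ready` and `drive_online` are illustrative names.

```shell
#!/bin/bash
# Sketch: poll a readiness check until it succeeds or attempts run out.
# ASSUMPTION: mt (mt-st) prints "ONLINE" in its status when a tape is loaded and ready.

drive_online() {
    mt -f "${1:-/dev/st0}" status 2>/dev/null | grep -q 'ONLINE'
}

wait_until_ready() {
    # Usage: wait_until_ready ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
    local attempts=$1 delay=$2 try
    shift 2
    for try in $(seq 1 "$attempts"); do
        if "$@" >/dev/null 2>&1; then
            echo "ready after $try attempt(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "still not ready after $attempts attempts" >&2
    return 1
}

# Usage after a reset: wait_until_ready 10 30 drive_online
```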
Tape library commands
mtx status
mtx unload <slotnum> <drivenum>
(Unloads media from drive <drivenum> into slot <slotnum>.)
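Because `mtx unload` needs the slot the tape should return to, it helps to read that slot from `mtx status` rather than remember it. The sketch below assumes the status output contains a line of the form "Data Transfer Element 0:Full (Storage Element N Loaded)", which is what mtx usually prints but may vary by version, and that the changer is /dev/sg0. It only prints the unload command instead of running it; `loaded_slot` and `unload_command` are illustrative names.

```shell
#!/bin/bash
# Sketch: work out which slot the tape in drive 0 came from.
# ASSUMPTION: mtx status prints "Data Transfer Element 0:Full (Storage Element N Loaded)".

loaded_slot() {
    # Reads "mtx status" output on stdin; prints the slot number, or nothing if drive 0 is empty
    sed -n 's/^ *Data Transfer Element 0:Full (Storage Element \([0-9][0-9]*\) Loaded).*$/\1/p'
}

unload_command() {
    # Reads "mtx status" output on stdin; prints the matching unload command
    local slot
    slot=$(loaded_slot)
    if [ -n "$slot" ]; then
        echo "mtx -f /dev/sg0 unload $slot 0"
    else
        echo "drive 0 is empty" >&2
        return 1
    fi
}

# Usage: mtx -f /dev/sg0 status | unload_command
```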