backup:tape

Differences

This shows you the differences between two versions of the page.

backup:tape [2010/10/16 00:42] aorth
backup:tape [2011/11/09 12:55] – created aorth
Line 1: Line 1:
-====== Tape backup ======
+====== Storix ======
 +<note>As of mid 2011 we don't use Storix anymore.  The new HPC uses Amanda instead, as it is free, open-source software.</note>
  
 Tape backups are run manually __once per week__, __on Friday afternoon__.  We have four cassettes, each of which can hold seven tapes.  Our current tape backup needs are around ten tapes, so each pair of cassettes holds eleven tapes in total, in case the size of the backups increases.  Each week we rotate the set of cassettes so that we always have a week of archived data.
Line 10: Line 11:
   * ''/mnt/export3''  (videodata)
  
-====== Backup Process ======
-===== Insert tapes =====
-===== Run Storix Backup =====
+===== Example backup process =====
+==== Insert tapes ====
+==== Run Storix Backup ====
 From an X11 window:
 <code>$ sudo sbadmin</code>
Line 23: Line 24:
     - "Read Label From Media"
     - "Expire/Remove"
-      - {{:backup:expire_label.png?nolink&200|}}
+      - {{:backup:expire_label.png?direct&300|}}
   - Actions -> Run Backup Jobs
-    - {{:backup:run_backup.png?nolink&200|}}
+    - {{:backup:run_backup.png?direct&300|}}
     - "Run Now"
  
 This takes about 30-35 hours, depending on the load on the server and whether the robot is working properly.
  
-====== Problems ======
+===== Problems =====
   * Tapes are sometimes hard to remove from the cassette, which sometimes causes the robot to jam
   * Even setting the virtual device to "sequential" doesn't work as desired (the robot stops when a tape is full and waits for you to manually unload it and load the next tape), so we use a "random tape library" instead
  
-====== Monitoring the backup ======
+===== Monitoring the backup =====

 The Storix Backup tool shows the current status of the backup, but there is no way to see it if you're not sitting at the machine.  Instead, you can use a one-line shell loop to check the status of the tape library periodically and append it to a file, which essentially becomes a log of the progress.  Write the output somewhere web-readable, as the web server is accessible from outside ILRI:
 <code># for num in $(seq 1 1000); do echo "Seq ${num}: $(mtx status)" >> /var/www/html/coffee.txt; sleep 1800; done</code>
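The same loop can be written as a small script, which is easier to read and adds a timestamp to each entry. This is only a sketch: the function name ''log_library_status'' and the variables ''STATUS_CMD'', ''LOG_FILE'', ''INTERVAL'' and ''COUNT'' are made up for illustration, while the ''mtx status'' call, the output path and the timings are taken from the one-liner.

```shell
#!/bin/sh
# Sketch only: function and variable names are illustrative, not part of Storix or mtx.
STATUS_CMD="${STATUS_CMD:-mtx status}"            # command whose output is logged
LOG_FILE="${LOG_FILE:-/var/www/html/coffee.txt}"  # web-readable log file
INTERVAL="${INTERVAL:-1800}"                      # seconds between checks
COUNT="${COUNT:-1000}"                            # number of checks

log_library_status() {
    num=1
    while [ "$num" -le "$COUNT" ]; do
        # Append one timestamped entry per check
        {
            echo "Seq ${num} ($(date '+%Y-%m-%d %H:%M:%S')):"
            $STATUS_CMD 2>&1
        } >> "$LOG_FILE"
        num=$((num + 1))
        # Sleep between checks, but not after the last one
        if [ "$num" -le "$COUNT" ]; then
            sleep "$INTERVAL"
        fi
    done
}
```

Calling ''log_library_status'' as root after starting the backup should produce the same kind of log as the one-liner, with the date of each check added.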
-====== Backup history ======
+===== Log of backups =====
  
 ^  Date  ^  Tape set  ^  Notes  ^
Line 82: Line 83:
 | October 1, 2010  |  B  | Completed successfully |
 | October 8, 2010  |  A  | Completed successfully |
-| October 15, 2010  |  B  | ... |
-
+| October 15, 2010  |  B  | Completed successfully |
+| October 22, 2010  |  A  | Completed successfully |
+| October 29, 2010  |  ...  | Alan in Switzerland, Etienne in China |
 +| November 5, 2010  |  B  | Apparently successful, but HPC crashed sometime during the weekend due to power fluctuations.  Verify failed. | 
 +| November 12, 2010  |  A  | Completed successfully | 
 +| November 19, 2010  |  B  | Had a problem (can't remember why, power?) | 
 +| November 26, 2010  |  B  | Completed successfully | 
 +| December 3, 2010  |  A  | Completed successfully | 
 +| December 12, 2010  |  B  | Completed successfully | 
 +| December 17, 2010  |  A  | Completed successfully | 
 +| December 24, 2010  |  B  | Completed successfully | 
 +| December 31, 2010  |  -  | gone for holidays | 
 +| January 7, 2011  |  A  | Completed successfully | 
 +| January 14, 2011  |  B  | Backup failed, tape library has error 205: X Axis Error. Reset the library and it appears to be ok. | 
 +| January 21, 2011  |  B  | Completed successfully | 
 +| January 28, 2011  |  A  | failed, crashed because of a job Anne was running | 
 +| February 4, 2011  |  -  | gone for holidays | 
 +| February 11, 2011  |  A  | no backup because of work to server room air conditioning | 
 +| February 18, 2011  |  A  | Completed successfully | 
 +| February 25, 2011  |  B  | Completed successfully | 
 +| March 4, 2011  |  A  | failed | 
 +| March 11, 2011  |  A  | Completed successfully | 
 +| March 18, 2011  |  B  | Completed successfully | 
 +| March 25, 2011  |  A  | Completed successfully | 
 +| April 1, 2011  |  B  | Was running a restore for Anne so couldn't run backups | 
 +| April 8, 2011  |  A  | Completed successfully | 
 +| April 15, 2011  |  B  | Completed successfully | 
 +| April 21, 2011  |  A  | Completed successfully | 
 +| April 29, 2011  |  B  | Completed successfully | 
 +| May 6, 2011  |  A  | failed... not sure why | 
 +| May 13, 2011  |  A  | haven't started because I can't eject the tapes yet |
 ====== Storix Backup Administrator ======
 We are using an Exabyte tape library for backups and the commercial Storix Backup Administrator software [[http://www.storix.com/]].
Line 97: Line 126:
  
 The Exabyte device has one tape "drive" and a library of tapes.  It can hold three cassettes, each of which holds 7 tapes.  The robotic arm moves the tapes from the cassettes to the tape drive, where they are unwound and read for backup/restore.
-====== Documentation ======
+===== Documentation =====
  
   * {{:sba.pdf}}
Line 105: Line 134:
   * {{:exabyte_monitor.pdf}}
  
-====== Notes ======
+===== Notes =====

 ''cat /proc/scsi/scsi''  (Display attached SCSI devices)
Line 125: Line 154:
 Use ''/dev/nst0'' (the non-rewinding device) instead of ''/dev/st0'', so that the tape is not rewound after the first backup finishes.
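To illustrate why the non-rewinding device matters, the sketch below writes two archives back to back and then seeks to the second one. It assumes the standard ''tar'' and ''mt'' tools rather than Storix, and the helper names and directory arguments are hypothetical. With ''DRY_RUN=1'' (the default here) it only prints the commands instead of touching the tape.

```shell
#!/bin/sh
# Sketch only: uses plain tar/mt rather than Storix; names and paths are examples.
TAPE_DEV="${TAPE_DEV:-/dev/nst0}"   # non-rewinding tape device
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# write_two_archives <dir1> <dir2>
write_two_archives() {
    run mt -f "$TAPE_DEV" rewind      # start at the beginning of the tape
    run tar -cf "$TAPE_DEV" "$1"      # first archive; the tape stays positioned after it
    run tar -cf "$TAPE_DEV" "$2"      # second archive follows the first one
}

seek_to_second_archive() {
    run mt -f "$TAPE_DEV" rewind
    run mt -f "$TAPE_DEV" fsf 1       # skip forward over one file mark
}
```

With ''/dev/st0'' the tape would rewind when the first ''tar'' closes the device, so the second archive would overwrite the first.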
  
-===== Tape library commands =====
+===== Troubleshooting =====
 +The following commands can be useful in determining problems with devices. 
 +==== mtx ==== 
 +<code>mtx -f /dev/sg0 inquiry 
 +Product Type: Medium Changer 
 +Vendor ID: 'EXABYTE ' 
 +Product ID: 'EXB-480         ' 
 +Revision: '2.18' 
 +Attached Changer: No</code> 
 +==== tapeinfo ==== 
 +<code>tapeinfo -f /dev/sg0 
 +Product Type: Medium Changer 
 +Vendor ID: 'EXABYTE ' 
 +Product ID: 'EXB-480         ' 
 +Revision: '2.18' 
 +Attached Changer: No 
 +SerialNumber: '67001141 
 +SCSI ID: 0 
 +SCSI LUN: 0 
 +Ready: yes</code> 
 +==== loaderinfo ==== 
 +<code>loaderinfo -f /dev/sg0 
 +Product Type: Medium Changer 
 +Vendor ID: 'EXABYTE ' 
 +Product ID: 'EXB-480         ' 
 +Revision: '2.18' 
 +Attached Changer: No 
 +Bar Code Reader: Yes 
 +EAAP: Yes 
 +Number of Medium Transport Elements: 1 
 +Number of Storage Elements: 21 
 +Number of Import/Export Element Elements: 1 
 +Number of Data Transfer Elements: 1 
 +Transport Geometry Descriptor Page: Yes 
 +Invertable: No 
 +Device Configuration Page: Yes 
 +Can Transfer: Yes</code> 
 + 
 +==== List SCSI devices ==== 
 +''/dev/sg*'' apparently covers all SCSI devices (including the disks attached to the OS), which can be quite confusing.  ''/proc/scsi/scsi'' shows information about the attached SCSI devices:
 +<code>cat /proc/scsi/scsi 
 +Attached devices: 
 +Host: scsi0 Channel: 00 Id: 00 Lun: 00 
 +  Vendor: EXABYTE  Model: EXB-480          Rev: 2.18 
 +  Type:   Medium Changer                   ANSI SCSI revision: 02 
 +Host: scsi0 Channel: 00 Id: 01 Lun: 00 
 +  Vendor: IBM      Model: ULTRIUM-TD1      Rev: 4561 
 +  Type:   Sequential-Access                ANSI SCSI revision: 03 
 +Host: scsi1 Channel: 00 Id: 00 Lun: 00 
 +  Vendor: 3ware    Model: Logical Disk 00  Rev: 1.00 
 +  Type:   Direct-Access                    ANSI SCSI revision: ffffffff 
 +Host: scsi1 Channel: 00 Id: 01 Lun: 00 
 +  Vendor: 3ware    Model: Logical Disk 01  Rev: 1.00 
 +  Type:   Direct-Access                    ANSI SCSI revision: ffffffff 
 +Host: scsi1 Channel: 00 Id: 02 Lun: 00 
 +  Vendor: 3ware    Model: Logical Disk 02  Rev: 1.00 
 +  Type:   Direct-Access                    ANSI SCSI revision: ffffffff</code> 
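To pick the tape library out of a longer listing, the output can be filtered by its ''Type:'' field. The helper below is a sketch under the assumption that the listing has the three-line-per-device layout shown above; ''list_scsi_by_type'' is an illustrative name, not a standard tool.

```shell
#!/bin/sh
# Sketch only: list_scsi_by_type is an illustrative helper, not a standard tool.
# Reads /proc/scsi/scsi-style text on stdin and prints one line per device
# whose "Type:" field equals the given string.
list_scsi_by_type() {
    awk -v want="$1" '
        /^Host:/ { host = $0 }                    # remember the address line
        /Vendor:/ {                               # remember vendor and model
            vendor_model = $0
            sub(/^[ \t]+/, "", vendor_model)
            sub(/[ \t]+Rev:.*$/, "", vendor_model)
        }
        /Type:/ {                                 # compare the device type
            type = $0
            sub(/^[ \t]*Type:[ \t]+/, "", type)
            sub(/[ \t]+ANSI.*$/, "", type)
            if (type == want) print host " | " vendor_model
        }
    '
}
```

For example, ''list_scsi_by_type "Medium Changer" < /proc/scsi/scsi'' should print only the entry for the library, and ''"Sequential-Access"'' only the tape drive.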
 +   
 +==== Force tape location ==== 
 +<code>TAPE=/dev/sg0 mtx status</code> 
 +==== /dev/st0 not ready ==== 
 +Try to reset the library and drives from the front panel. 
 + 
 +==== Tape library commands ====
  
   * ''mtx status''
-  *''mtx unload <slotnum> <drivenum>''  (Unloads media from drive <drivenum> into slot <slotnum>.)
+  * ''mtx unload <slotnum> <drivenum>''  (Unloads media from drive <drivenum> into slot <slotnum>.)
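When the robot stops and waits for a manual tape change, these commands can be combined with ''mtx load'' to swap tapes from the shell. The following is a sketch, not the procedure used here: ''swap_tape'' and its variables are illustrative names, and the changer is assumed to be ''/dev/sg0'' as in the troubleshooting output. With ''DRY_RUN=1'' (the default here) it only prints the commands.

```shell
#!/bin/sh
# Sketch only: swap_tape and its variables are illustrative names.
CHANGER_DEV="${CHANGER_DEV:-/dev/sg0}"  # SCSI generic device of the tape library
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# swap_tape <from_slot> <to_slot> [<drivenum>]
swap_tape() {
    drive="${3:-0}"
    run mtx -f "$CHANGER_DEV" unload "$1" "$drive"   # move the tape in the drive back to slot $1
    run mtx -f "$CHANGER_DEV" load "$2" "$drive"     # move the tape in slot $2 into the drive
    run mtx -f "$CHANGER_DEV" status                 # show the resulting layout
}
```

For example, ''swap_tape 3 4'' prints the commands that would return the current tape to slot 3 and load the tape from slot 4; setting ''DRY_RUN=0'' runs them.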
  
 ===== Bootable USB recovery =====

 http://www.storix.com/how-to/202-how-to-configure-a-bootable-usb-drive-for-bare-metal-recovery-sbadmin-v6