backup: revision of 2010/10/16 00:01, created by aorth (the page was later removed by aorth on 2019/05/08 09:35).
====== Backups ======

We use three methods for backups:
  * [[backup:tape|Tape backup]] for storage connected to the HPC
  * [[backup:rsync|Rsync]] for files not stored on the HPC (other websites, etc.)
  * [[backup:backuppc|BackupPC]] for files on users' desktops

===== Tape backup =====

Tape backups are run manually __once per week__, __on Friday afternoon__. We have four cassettes, each of which holds seven tapes, and use them as two sets (A and B) of two cassettes each. A backup currently needs about ten tapes, so each set is loaded with eleven tapes in case the backups grow. Each week we switch to the other set, so the previous week's backup is always kept as an archive.

A full system backup includes:

  * ''/'' (operating system)
  * ''/mnt/export'' (home directories and biosoft applications)
  * ''/mnt/export2'' (segoli data)
  * ''/mnt/export3'' (videodata)

A full backup takes about 30 to 35 hours, depending on the server load and on whether the robot is working properly.
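
Choosing which set goes into the library each Friday could also be scripted. The sketch below assumes a simple rule that is not part of the documented procedure: set A in odd ISO weeks, set B in even ones. The backup history shows that the sets were in fact swapped by hand and not always strictly in turn, so treat ''tape_set_for_week'' (a hypothetical name) as an illustration only.

<code bash>
#!/bin/sh
# Hypothetical helper: pick the tape set (A or B) for an ISO week number.
# Assumed rule, not the documented procedure: odd weeks -> A, even -> B.
tape_set_for_week() {
    week=${1#0}    # strip a leading zero so "09" is not parsed as octal
    if [ $((week % 2)) -eq 1 ]; then
        echo "A"
    else
        echo "B"
    fi
}
</code>

For this week's set, call ''tape_set_for_week "$(date +%V)"''; ''%V'' is the ISO 8601 week number (01 to 53).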
==== Problems ====
  * Tapes are sometimes hard to remove from the cassette, which occasionally makes the robot jam.
  * Setting the virtual device to "sequential" does not work as desired: when a tape is full, the robot stops and waits for someone to unload it and load the next tape by hand. We therefore use a "random tape library" device instead.

==== Monitoring the backup ====

The Storix Backup tool shows the current status of the backup, but only to someone sitting at the machine. To follow the progress remotely, run a one-line shell loop that records the status of the tape library every 30 minutes; the output effectively becomes a progress log. Write it somewhere web-readable, since the web server is accessible from outside ILRI:
<code># for num in $(seq 1 1000); do echo "Seq ${num} ($(date)): $(mtx status)" >> /var/www/html/coffee.txt; sleep 1800; done</code>
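
The raw ''mtx status'' output is long. If you only want to know which slot's tape the drive is working on, a small filter could be added to the loop. This is a sketch that assumes the usual ''mtx status'' line format for drive 0 (''Data Transfer Element 0:Full (Storage Element 7 Loaded)...''); ''loaded_slot'' is a hypothetical name.

<code bash>
#!/bin/sh
# Hypothetical filter: read `mtx status` output on stdin and print the
# number of the storage slot whose tape is loaded in drive 0, or "none".
# Assumes lines like:
#   Data Transfer Element 0:Full (Storage Element 7 Loaded):VolumeTag = ...
loaded_slot() {
    slot=$(sed -n 's/^[[:space:]]*Data Transfer Element 0:Full (Storage Element \([0-9]*\) Loaded).*/\1/p')
    if [ -n "$slot" ]; then
        echo "$slot"
    else
        echo "none"
    fi
}
</code>

For example: ''echo "$(date): slot $(mtx status | loaded_slot)" >> /var/www/html/coffee.txt''.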
==== Backup history ====

^  Date  ^  Tape set  ^  Notes  ^
| Oct 30, 2009  |  A  | Robot jammed on tape 7; backup did not complete  |
| Nov 6, 2009  |  B  | Completed successfully  |
| Nov 13, 2009  |  A  | Completed successfully  |
| Nov 20, 2009  |  B  | Backup completed successfully; verify failed at tape 4  |
| Nov 27, 2009  |  A  | Completed successfully  |
| Dec 4, 2009  |  B  | Backup completed successfully; verify failed at tape 6  |
| Dec 11, 2009  |  A  | Backup failed to start (appears to be a software problem; the server might need a reboot)  |
| Dec 21, 2009  |  A  | Completed successfully  |
| Jan 8, 2010  |  B  | Completed successfully  |
| Jan 15, 2010  |  A  | Backup completed successfully; verify failed  |
| Jan 22, 2010  |  B  | Backup completed successfully; verify stuck at 100%  |
| Jan 29, 2010  |  A  | Backup completed successfully; verify stuck at 8%  |
| Feb 5, 2010  |  B  | Completed successfully  |
| Feb 12, 2010  |  A  | Completed successfully  |
| Feb 19, 2010  |  B  | Completed successfully  |
| Mar 12, 2010  |  A  | Completed successfully  |
| Mar 19, 2010  |  B  | Completed successfully  |
| Apr 1, 2010  |  A  | Completed successfully  |
| Apr 9, 2010  |  B  | Completed successfully  |
| Apr 16, 2010  |  A  | Completed successfully  |
| Apr 23, 2010  |  A  | Completed successfully  |
| Apr 30, 2010  |  B  | Completed successfully  |
| May 7, 2010  |  A  | Completed successfully  |
| May 21, 2010  |  B  | Completed successfully  |
| Jun 4, 2010  |  A  | Completed successfully  |
| Jun 9, 2010  |  B  | Completed successfully  |
| Jun 18, 2010  |  A  | Completed successfully  |
| Jun 25, 2010  |  B  | Completed successfully  |
| Jul 2, 2010  |  A  | Completed successfully  |
| Jul 9, 2010  |  B  | Completed successfully  |
| Jul 16, 2010  |  A  | Completed successfully  |
| Jul 23, 2010  |  B  | Completed successfully  |
| Jul 30, 2010  |  A  | Completed successfully  |
| Aug 6, 2010  |  B  | Completed successfully  |
| Aug 13, 2010  |  A  | ...  |
| Sep 3, 2010  |  A  | Completed successfully; verify failed  |
| Sep 10, 2010  |  B  | Completed successfully; verify failed  |
| Sep 17, 2010  |  A  | HPC crashed the previous night, so the backup could not run; it will run next week now that HPC is fixed  |
| Sep 24, 2010  |  A  | Completed successfully  |
| Oct 1, 2010  |  B  | Completed successfully  |
| Oct 8, 2010  |  A  | Completed successfully  |
| Oct 15, 2010  |  B  | ...  |

===== Rsync =====
Rsync is a tool for keeping a source and a destination synchronized, either on the same machine or over a network. We use Rsync to back up several servers over the network to the HPC, where the data is then included in the system tape backups. Servers backed up with Rsync:

  * BecA-ILRI wiki: http://lims.ilri.cgiar.org/wiki
  * TParvaDB: http://tparvadb.ilri.cgiar.org

==== Configure HPC ====
Create the backup directories:
<code># mkdir -p /mnt/export2/backup/wiki /mnt/export2/backup/tparvadb</code>

==== BecA-ILRI Wiki Backup ====
Because the wiki does not use an SQL database, we can back it up by simply copying its data files from the web root.

=== Set up SSH keys ===
The backup uses password-less logins to the HPC. Run the following as root on the wiki server, and leave the passphrase empty when prompted:
<code># ssh-keygen -t rsa</code>
Append the new public key to root's ''~/.ssh/authorized_keys'' file on the HPC. Print it with:
<code># cat ~/.ssh/id_rsa.pub</code>
Now try to log in to the HPC as root:
<code># ssh hpc.ilri.cgiar.org</code>
If you are logged in without typing a password, you can test the backup.

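
The key setup above can also be scripted. The sketch below is an assumption, not the original procedure: ''make_backup_key'' is a hypothetical name, and it passes ''-N ""'' to ''ssh-keygen'' so that no passphrase prompt appears at all.

<code bash>
#!/bin/sh
# Hypothetical helper: create an RSA key pair with an empty passphrase for
# unattended backups. Leaves an existing key untouched.
# $1 = path of the private key, e.g. /root/.ssh/id_rsa
make_backup_key() {
    key=$1
    if [ -e "$key" ]; then
        echo "$key already exists, not overwriting it" >&2
        return 0
    fi
    mkdir -p "$(dirname "$key")"
    chmod 700 "$(dirname "$key")"
    # -q: quiet, -N "": empty passphrase, -f: where to write the key pair
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$key"
}
</code>

After ''make_backup_key /root/.ssh/id_rsa'', the command ''ssh-copy-id -i /root/.ssh/id_rsa.pub root@hpc.ilri.cgiar.org'' appends the public key to root's ''authorized_keys'' on the HPC, and ''ssh -o BatchMode=yes hpc.ilri.cgiar.org true'' checks the login: with ''BatchMode=yes'', ssh fails instead of prompting for a password.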
=== Test the backup ===
Run the sync once by hand. ''--delete'' removes files from the HPC copy that no longer exist on the wiki server, so the copy stays an exact mirror:
<code># rsync -av --delete /var/www/html/wiki/ hpc.ilri.cgiar.org:/mnt/export2/backup/wiki</code>

=== Automate the backup ===
If everything works, schedule the rsync job with cron. Add the following entry to root's crontab:
<file># synchronize the wiki to HPC so it can get backed up on tape
0 18 * * * rsync -av --delete /var/www/html/wiki/ hpc.ilri.cgiar.org:/mnt/export2/backup/wiki</file>
This runs the backup job at 6:00 pm every day.
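
Cron mails any job output to the crontab's owner, which is easy to overlook. One option is a small wrapper that appends a dated OK or FAILED line to a log file. This is a sketch: ''sync_with_log'' and the ''RSYNC'' override variable are hypothetical, the latter added only so the function can be tried without a real transfer.

<code bash>
#!/bin/sh
# Hypothetical wrapper for nightly rsync jobs: runs one sync and appends
# rsync's output plus a dated OK/FAILED line to a log file.
# $1 = source, $2 = destination, $3 = log file
# RSYNC may name another command (for testing); it defaults to rsync.
sync_with_log() {
    src=$1
    dest=$2
    log=$3
    ${RSYNC:-rsync} -av --delete "$src" "$dest" >>"$log" 2>&1
    status=$?
    stamp=$(date '+%Y-%m-%d %H:%M')
    if [ "$status" -eq 0 ]; then
        echo "$stamp sync of $src OK" >>"$log"
    else
        echo "$stamp sync of $src FAILED (rsync exit $status)" >>"$log"
    fi
    return "$status"
}
</code>

To use it, save the function in a file (''/root/bin/backup-functions.sh'' is only an example path) and call it from root's crontab: ''0 18 * * * . /root/bin/backup-functions.sh && sync_with_log /var/www/html/wiki/ hpc.ilri.cgiar.org:/mnt/export2/backup/wiki /var/log/wiki-backup.log''.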
==== TParvaDB ====
This backup must preserve permissions, owners, groups, etc., so it has to connect to the HPC as root (unprivileged users cannot set file ownership).

=== Set up SSH keys ===
Set up password-less SSH keys for root on the TParvaDB server in the same way as for the wiki: run ''ssh-keygen -t rsa'' with an empty passphrase, append the contents of ''~/.ssh/id_rsa.pub'' to root's ''~/.ssh/authorized_keys'' file on the HPC, and check that ''ssh hpc.ilri.cgiar.org'' logs in without asking for a password. Then test the backup.

=== Test the backup ===
As a quick test, sync only ''/home''. Without a trailing slash the directory itself is copied, so it lands in ''tparvadb/home'', the same place the full backup below puts it:
<code># rsync -xrlptgoEv --exclude=/proc --exclude=/sys --delete /home hpc.ilri.cgiar.org:/mnt/export2/backup/tparvadb</code>

=== Automate the backup ===
If everything works, schedule the rsync job with cron. Add the following entry to root's crontab:
<file># synchronize TParvaDB to HPC so it can get backed up on tape
0 19 * * * rsync -xrlptgoEv --exclude=/proc --exclude=/sys --delete / hpc.ilri.cgiar.org:/mnt/export2/backup/tparvadb</file>
This runs the backup job at 7:00 pm every day. The ''-x'' option keeps rsync on the root file system, so other mounted file systems are not copied.

=== To restore ===
First stop the affected services (Apache, PostgreSQL, MySQL). Then run the same rsync command as above with the source and destination swapped. The trailing "/" on the source is important: it copies the //contents// of ''tparvadb/'' into ''/'' instead of creating a ''/tparvadb'' directory. Add ''--dry-run'' to see what rsync would do without changing anything:
<code># rsync -xrlptgoEv --exclude=/proc --exclude=/sys --delete hpc.ilri.cgiar.org:/mnt/export2/backup/tparvadb/ /</code>
Then restart the services.

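
Because a restore writes over ''/'' with ''--delete'', it may be worth making the dry run the default. A possible sketch, with the hypothetical name ''restore_tparvadb'' and a hypothetical ''--really'' switch (the ''RSYNC'' override exists only for testing):

<code bash>
#!/bin/sh
# Hypothetical restore helper for TParvaDB: previews by default (--dry-run)
# and only restores when called with --really. Run it on the TParvaDB server
# as root, after stopping Apache, PostgreSQL and MySQL.
restore_tparvadb() {
    dry="--dry-run"
    if [ "$1" = "--really" ]; then
        dry=""
    fi
    # $dry is deliberately unquoted so that it disappears when empty.
    # The trailing "/" on the source copies the contents of tparvadb/ into /.
    ${RSYNC:-rsync} -xrlptgoEv $dry --exclude=/proc --exclude=/sys --delete \
        hpc.ilri.cgiar.org:/mnt/export2/backup/tparvadb/ /
}
</code>

Run ''restore_tparvadb'' first and read the list of changes; run ''restore_tparvadb --really'' only when it looks right.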
==== Links ====
  * Backing up a running system with rsync: http://users.telenet.be/mydotcom/howto/linux/clone.htm

===== Storix Backup Administrator =====
We use an Exabyte tape library for backups, driven by the commercial Storix Backup Administrator software ([[http://www.storix.com/]]).

Installed version:
<code>$ cat /opt/storix/instconfig/version
6.3.4.4</code>

Storix System Backup Administrator: ''/home/villierse/software/storix''

Graphical user interface: ''sbadmin''

The Exabyte device has one tape drive and a library of tapes. It holds three cassettes of seven tapes each. A robotic arm moves tapes from the cassettes to the drive, where they are read or written during a backup or restore.

==== Documentation ====

  * {{:sba.pdf}}
  * {{:sbalinuxinst.pdf}}
  * {{:exabyte-basicbackup.pdf}}
  * {{:exabyte221l_manual.pdf}}
  * {{:exabyte_monitor.pdf}}

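==== Notes ====

Display attached SCSI devices: ''cat /proc/scsi/scsi''

  * Tape drive: ''/dev/st0''
  * Library: ''/dev/sg0''

Check the drive: ''mt -f /dev/st0 status''. A ''BOT'' (beginning of tape) flag in the output means a tape is in the drive.

Rewind the tape: ''mt -f /dev/nst0 rewind''. Rewind and eject it: ''mt -f /dev/nst0 rewoffl''.

  * Make a backup: ''tar cvf /dev/st0 directory''
  * List the files on a tape: ''tar tvf /dev/st0''
  * Rewind and eject the tape: ''mt -f /dev/st0 rewoffl''
  * Restore from tape (insert the tape first): ''tar xvf /dev/st0''

To write more than one backup to the same tape, use the non-rewinding device ''/dev/nst0'' instead of ''/dev/st0''. It does not rewind the tape when a backup finishes, so the next backup is written after it. To read a later archive, rewind and then skip forward with ''mt -f /dev/nst0 fsf N'' (N = number of archives to skip) before running ''tar tvf /dev/nst0''.
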
Tape library commands:

  * ''mtx status'': show the status of the drive and of every slot
  * ''mtx unload <slotnum> <drivenum>'': unload the tape in drive <drivenum> into slot <slotnum>

==== Bootable USB recovery ====

Storix how-to for bare-metal recovery from a bootable USB drive: http://www.storix.com/how-to/202-how-to-configure-a-bootable-usb-drive-for-bare-metal-recovery-sbadmin-v6
===== Example Scripts =====
Example scripts for automated backup and data maintenance on ILRI servers. Most should run from cron, usually as root. Protect critical scripts (especially any that contain passwords) by setting their permissions to 700 (''chmod 700 script.sh''), so that only root, as their owner, can read them.

To add a cron entry, edit root's crontab:
<code>$ sudo crontab -e</code>
and add the following lines:
<file># Back up the MySQL database at 00:13 (12:13 am) every night
13 0 * * * /home/backup/scripts/backup_mysql.sh</file>
This runs the script every night at 00:13.

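==== Backup MySQL Database ====
This script dumps a given database to a ''mysql'' folder in the specified backup directory. Make sure the backup directory and its ''mysql'' subdirectory exist.
<code>#!/bin/sh
# Dump one MySQL database to a dated, bzip2-compressed SQL file.
# Replace 'yourpassword' and databasename with real values. The password is
# visible to other users in the process list, so run this only as root.

BACKUP_DIR=/home/backup
DATE=$(date +%Y%m%d)    # e.g. 20100222

mysqldump --opt -u root -p'yourpassword' databasename | bzip2 -c > "${BACKUP_DIR}/mysql/databasename_${DATE}.sql.bz2"

exit 0
</code>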
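
In ''sh'' a pipeline's exit status is that of its last command, so if ''mysqldump'' fails in the script above, ''bzip2'' still writes a (useless) compressed file. A variant that checks the dump first might look like the sketch below. It writes the uncompressed dump to disk before compressing it, and it reads the credentials from an option file instead of the command line; the path ''/root/.backup-my.cnf'' (with ''user='' and ''password='' lines under a ''[client]'' section) and the name ''backup_mysql_db'' are assumptions.

<code bash>
#!/bin/sh
# Hypothetical variant of backup_mysql.sh that notices a failed dump.
# $1 = database name, $2 = output directory (e.g. /home/backup/mysql)
# MYSQLDUMP may name another command (for testing); defaults to mysqldump.
backup_mysql_db() {
    db=$1
    out="$2/${db}_$(date +%Y%m%d).sql"
    # --defaults-extra-file must be the first option given to mysqldump
    if ${MYSQLDUMP:-mysqldump} --defaults-extra-file=/root/.backup-my.cnf --opt "$db" >"$out"; then
        bzip2 -f "$out"    # replaces the .sql file with .sql.bz2
    else
        rm -f "$out"
        echo "mysqldump of $db failed; no backup written" >&2
        return 1
    fi
}
</code>

For example, ''backup_mysql_db databasename /home/backup/mysql'' produces the same ''databasename_YYYYMMDD.sql.bz2'' file name as the script above.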

==== Backup PostgreSQL Database ====
This script dumps a given database to a ''postgres'' folder in the specified backup directory. Make sure the backup directory and its ''postgres'' subdirectory exist.
<code>#!/bin/sh

# Set the user's Postgres password in the environment because pg_dump
# runs non-interactively here, so nobody can type it in!
export PGPASSWORD="database password"

# Grab the current date and save it to the DATE variable.
# February 22, 2010 would look like this: 20100222
DATE=$(date +%Y%m%d)

# Dump the database in pg_dump's custom format (restore it with pg_restore).
# -f names the output file; the last argument is the database to dump.
# The old -o (--oids) option is left out: PostgreSQL 12 removed it.
/usr/bin/pg_dump -b -v --format=custom -U user -f "/home/backup/postgres/database_${DATE}.backup" database

exit 0</code>
==== Cleanup Old Backups ====
This script searches the specified backup directory for files older than two weeks and deletes them. The ''-type f'' test leaves directories alone and ''\! -name "*.sh"'' skips shell scripts, so scripts kept in the backup directory survive; make sure no other files you want to keep live there.
<code>#!/bin/sh

BACKUP_DIR="/home/backup"

# Delete regular files last modified more than 2 weeks ago, except *.sh scripts
# (-newermt is a GNU find extension)
find "${BACKUP_DIR}" -type f \! -newermt "2 weeks ago" \! -name "*.sh" -exec rm {} \;

exit 0</code>
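
''-newermt'' is a GNU ''find'' extension. On a system without GNU find, the same cleanup could use the POSIX ''-mtime'' test, as in this sketch (the function name ''cleanup_backups'' is hypothetical). Note that ''-mtime +14'' means "more than 14 whole days", i.e. files at least 15 days old.

<code bash>
#!/bin/sh
# Hypothetical portable cleanup: delete regular files under directory $1
# that were last modified more than $2 whole days ago.
# Directories and *.sh scripts are never deleted.
cleanup_backups() {
    find "$1" -type f -mtime +"$2" ! -name '*.sh' -exec rm -f {} +
}
</code>

''cleanup_backups /home/backup 14'' roughly matches the "2 weeks ago" cutoff of the script above.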
backup.1287187293.txt.gz · Last modified: by aorth