raid
                Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| raid [2009/11/02 14:20] – 172.26.0.166 | raid [2010/09/19 23:58] (current) – aorth | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| We have two RAIDs on the HPC | We have two RAIDs on the HPC | ||
| * Linux kernel software RAID | * Linux kernel software RAID | ||
| - | * 3mware | + | * 3ware hardware RAID | 
| ==== Drive numbering ==== | ==== Drive numbering ==== | ||
| - | If you're looking at the front of the HPC you'll see four rows of drives. | + | If you're looking at the front of the HPC you'll see four rows of drives. | 
| * Rows 0 - 2 are SATA, connected to the hardware 3ware RAID card | * Rows 0 - 2 are SATA, connected to the hardware 3ware RAID card | ||
| * Row 3 is IDE | * Row 3 is IDE | ||
| + | |||
| ===== Software RAID ===== | ===== Software RAID ===== | ||
| The Linux kernel has the '' | The Linux kernel has the '' | ||
| Line 49: | Line 50: | ||
|  |  | ||
| unused devices: < | unused devices: < | ||
| - | |||
| ==== Repair RAID ==== | ==== Repair RAID ==== | ||
| + | When a disk is failing you need to replace the drive. | ||
| + | < | ||
| + | Personalities : [raid1] [raid0] | ||
| + | md3 : active raid1 hdd1[1] hda1[0] | ||
| + | 200704 blocks [2/2] [UU] | ||
| + |  | ||
| + | md1 : active raid1 hdd3[1] hda3[0] | ||
| + | 26627648 blocks [2/2] [UU] | ||
| + |  | ||
| + | md2 : active raid0 hdd5[1] hda5[0] | ||
| + | 36868608 blocks 256k chunks | ||
| + |  | ||
| + | md4 : active raid1 hdd6[1] hda6[0] | ||
| + | 2168640 blocks [2/2] [UU] | ||
| + |  | ||
| + | md0 : active raid1 hdd2[1] hda2[0] | ||
| + | 30716160 blocks [2/2] [UU] | ||
| + |  | ||
| + | unused devices: < | ||
| - | Setting a disk faulty/failed: | + | If '' | 
| - | + | < | |
| - | # mdadm --fail /dev/md0 /dev/hdc1 | + | # mdadm / | 
| - | + | # mdadm /dev/md3 --fail /dev/hda1 --remove /dev/hda1 | |
| - | DO NOT run this every on a raid0 or linear device or your data is toasted! | + | # mdadm /dev/md4 --fail /dev/hda6 --remove / | 
| - | + | ''/ | |
| - | Removing | + | < | 
| - | + | # mdadm --stop /dev/md2</code> | |
| - | # mdadm --remove | + | <note warning> You must shut down the server before you physically remove the drive! </note> | 
| - | Clearing any previous raid info on a disk (eg. reusing a disk from another decommissioned raid array) | + | Shut the server down and replace the faulty drive with a new one. After booting, the drive letters may have shifted, so verify which drive is which before proceeding. | 
| - | | + | Clone the partition table from the good drive to the new one: | 
| - | # mdadm --zero-superblock | + | < | 
| - | Adding a disk to an array | + | Verify the new partitions can be seen: | 
| - | + | < | |
| - | # mdadm --add /dev/md0 /dev/hdc1 | + | /dev/hda: msdos partitions 1 2 3 4 <5 6> | 
| - | + | /dev/hdd: msdos partitions 1 2 3 4 <5 6> | |
| - | + | /dev/sda: msdos partitions 1 | |
| - | === To Do list: === | + | /dev/sdb: msdos partitions 1 | 
| - | + | /dev/sdc: msdos partitions 1 | |
| - | + | </ | |
| - | Prepare written instructions on how to repair disk arrays. | + | Re-create the scratch partition (RAID0): | 
| - | + | < | |
| - | What disks to we have? | + | # mkfs.ext3 /dev/md2 | 
| - | + | # mount /dev/md2 / | |
| - | Add extra spare disks? | + | You can now add the new partitions back to the RAID1 arrays: | 
| - | + | < | |
| - | How do you know which physical disk is broken to replace it? | + | # mdadm / | 
| - | + | # mdadm /dev/md3 --add /dev/hdd1 | |
| + | # mdadm /dev/md4 --add / | ||
| + | After adding you can monitor the progress of the RAID rebuilds by looking in ''/ | ||
| + | < | ||
| + | md3 : active raid1 hdd1[1] hda1[0] | ||
| + | 200704 blocks [2/2] [UU] | ||
| + | |||
| + | md1 : active raid1 hdd3[2] hda3[0] | ||
| + | 26627648 blocks [2/1] [U_] | ||
| + | [===================> | ||
| + | |||
| + | md2 : inactive hda5[0] | ||
| + |  | ||
| + |  | ||
| + | md4 : active raid1 hdd6[2] hda6[0] | ||
| + |  | ||
| + |  | ||
| + | |||
| + | md0 : active raid1 hdd2[1] hda2[0] | ||
| + |  | ||
| + | |||
| + | unused devices: < | ||
| ===== Hardware RAID ===== | ===== Hardware RAID ===== | ||
| - | A 3ware 9500S SATA RAID card using the 3w-9xxx kernel module. | + | A 3ware 9500S-12 SATA RAID card using the 3w-9xxx kernel module. | 
| ==== Physical Disk Layout ==== | ==== Physical Disk Layout ==== | ||
| - | We have one RAID controller, ' | + | We have one RAID controller, ' | 
| | Port 8 | Port 9 | Port 10 | Port 11 | | | Port 8 | Port 9 | Port 10 | Port 11 | | ||
| Line 95: | Line 134: | ||
| ==== Repairing ' | ==== Repairing ' | ||
| - | There is a utility, tw_cli, which can be used to control/ | + | There is a utility, | 
| Study the output of '' | Study the output of '' | ||
| Line 122: | Line 161: | ||
| 3w-9xxx: scsi1: AEN: ERROR (0x04: | 3w-9xxx: scsi1: AEN: ERROR (0x04: | ||
| - | < | + | < | 
| Password: | Password: | ||
| // | // | ||