====== RAID ======
We have two RAIDs on the HPC:
  * Linux kernel software RAID
  * 3ware hardware RAID
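You can check on both from a shell. A minimal sketch, assuming the 3ware command-line utility ''tw_cli'' is installed and the controller is ''/c0'':
<code>
# cat /proc/mdstat
# tw_cli /c0 show
</code>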
==== Drive numbering ====
If you're looking at the front of the HPC you'll see four rows of drives.
  * Rows 0 - 2 are SATA, connected to the hardware 3ware RAID card
  * Row 3 is IDE
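Before pulling a drive it helps to match the physical disk to its Linux device name. A sketch for the IDE drives, using ''hdparm'' to read the model and serial number (''/dev/hda'' is just an example device):
<code>
# hdparm -i /dev/hda | grep -i serial
</code>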
A snapshot of the software RAID's health:
<code>
Personalities : [raid1] [raid0]
md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]

md1 : active raid1 hdd3[1] hda3[0]
      26627648 blocks [2/2] [UU]

md2 : active raid0 hdd5[1] hda5[0]
      36868608 blocks 256k chunks

md4 : active raid1 hdd6[1] hda6[0]
      2168640 blocks [2/2] [UU]

md0 : active raid1 hdd2[1] hda2[0]
      30716160 blocks [2/2] [UU]

unused devices: <none>
</code>
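For more detail on a single array than ''/proc/mdstat'' shows, you can query it directly with ''mdadm'' (any of the ''md'' devices above works):
<code>
# mdadm --detail /dev/md0
</code>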
==== Repair RAID ====
When a disk is failing you need to replace the drive.
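A failing disk usually announces itself as I/O errors in the kernel log and a failed SMART health check. A quick sketch for confirming a suspect drive (substitute the real device for ''/dev/hda''):
<code>
# dmesg | grep -i hda
# smartctl -H /dev/hda
</code>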
If ''/dev/hda'' is the failing drive, mark its partitions as failed and remove them from the RAID1 arrays:
<code>
# mdadm /dev/md1 --fail /dev/hda3 --remove /dev/hda3
# mdadm /dev/md3 --fail /dev/hda1 --remove /dev/hda1
# mdadm /dev/md4 --fail /dev/hda6 --remove /dev/hda6
</code>
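At this point the affected arrays should show as degraded in ''/proc/mdstat'', with ''[2/1] [U_]'' in place of ''[2/2] [UU]'':
<code>
# cat /proc/mdstat
</code>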
''/dev/md2'' is a RAID0 (striped) array with no redundancy, so you can't fail and remove a disk from it; just stop the whole array:
<code>
# mdadm --stop /dev/md2
</code>
<note warning>You must shut down the server before replacing the drive.</note>
Shut the server down and replace the faulty drive with a new one. After booting your drive letters may have shifted around, so be sure to verify which device is the new drive before you write to it.
Clone the partition table from the good drive to the new one. A sketch using ''sfdisk'', assuming ''/dev/hda'' is the good drive and ''/dev/hdd'' is the new one:
<code>
# sfdisk -d /dev/hda | sfdisk /dev/hdd
</code>
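To double-check the clone itself, compare the two partition tables:
<code>
# sfdisk -l /dev/hda
# sfdisk -l /dev/hdd
</code>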
Verify the new partitions can be seen, e.g. with ''partprobe'':
<code>
# partprobe -s
/dev/hda: msdos partitions 1 2 3 4 <5 6>
/dev/hdd: msdos partitions 1 2 3 4 <5 6>
/dev/sda: msdos partitions 1
/dev/sdb: msdos partitions 1
/dev/sdc: msdos partitions 1
</code>
Re-create the scratch partition (RAID0) with the same layout as before (two disks, 256k chunks), then make a filesystem and mount it:
<code>
# mdadm --create /dev/md2 --level=raid0 --chunk=256 --raid-devices=2 /dev/hda5 /dev/hdd5
# mkfs.ext3 /dev/md2
# mount /dev/md2 /
</code>
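A quick sanity check that the new scratch array is assembled and its filesystem is mounted:
<code>
# mdadm --detail /dev/md2
# df -h
</code>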
You can now add the new partitions back to the RAID1 arrays:
<code>
# mdadm /dev/md1 --add /dev/hdd3
# mdadm /dev/md3 --add /dev/hdd1
# mdadm /dev/md4 --add /dev/hdd6
</code>
After adding you can monitor the progress of the RAID rebuilds by looking in ''/proc/mdstat'':
<code>
md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]

md1 : active raid1 hdd3[2] hda3[0]
      26627648 blocks [2/1] [U_]
      [===================>

md2 : inactive hda5[0]
      18434304 blocks

md4 : active raid1 hdd6[2] hda6[0]
      2168640 blocks [2/1] [U_]
      resync=DELAYED

md0 : active raid1 hdd2[1] hda2[0]
      30716160 blocks [2/2] [UU]

unused devices: <none>
</code>
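Rebuilds can take a while; you can watch progress live, and if the rebuild is crawling you can raise the kernel's minimum per-device rebuild speed. A sketch; the 50000 KB/s value is just an example:
<code>
# watch -n5 cat /proc/mdstat
# echo 50000 > /proc/sys/dev/raid/speed_limit_min
</code>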
===== Hardware RAID =====