raid

Differences

This shows you the differences between two versions of the page.

raid [2009/11/16 13:59] 172.26.0.166raid [2009/11/20 14:13] 172.26.0.166
Line 50:

 unused devices: <none></code>
-
 ==== Repair RAID ====
 +When a disk is failing you need to replace the drive.  First, look at the RAID configuration to see which partitions are in use by which arrays.  For example:
 +<code># cat /proc/mdstat 
 +Personalities : [raid1] [raid0] 
 +md3 : active raid1 hdd1[1] hda1[0]
 +      200704 blocks [2/2] [UU]
 +      
 +md1 : active raid1 hdd3[1] hda3[0]
 +      26627648 blocks [2/2] [UU]
 +      
 +md2 : active raid0 hdd5[1] hda5[0]
 +      36868608 blocks 256k chunks
 +      
 +md4 : active raid1 hdd6[1] hda6[0]
 +      2168640 blocks [2/2] [UU]
 +      
 +md0 : active raid1 hdd2[1] hda2[0]
 +      30716160 blocks [2/2] [UU]
 +      
 +unused devices: <none></code>
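To see which arrays would be affected by pulling one disk, a listing like the one above can be filtered by disk name. This is a minimal sketch and not part of the original page: ''members_on_disk'' is a hypothetical helper name, and it assumes array lines follow the ''mdN : active raidX part[slot] ...'' layout shown above.

```shell
#!/bin/sh
# List each md array that has a member partition on the given disk.
# Usage: members_on_disk DISK [MDSTAT_FILE]
#   DISK        bare disk name such as "hda" (assumption: partitions are DISK + digits)
#   MDSTAT_FILE text in /proc/mdstat format; defaults to /proc/mdstat
members_on_disk() {
    disk=$1
    file=${2:-/proc/mdstat}
    awk -v disk="$disk" '
        # Array header lines look like: md3 : active raid1 hdd1[1] hda1[0]
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            level = "unknown"
            for (i = 3; i <= NF; i++)
                if ($i ~ /^(raid[0-9]+|linear)$/) level = $i
            for (i = 3; i <= NF; i++) {
                part = $i
                if (part !~ /\[[0-9]+\]/) continue   # not a member field
                sub(/\[.*/, "", part)                # strip the "[0]" slot suffix
                if (part ~ ("^" disk "[0-9]+$"))
                    print $1, level, part
            }
        }
    ' "$file"
}
```

Calling ''members_on_disk hda'' prints one line per array in the form ''array level partition'', which makes it easy to spot any RAID0 array that needs separate handling.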
  
-Setting a disk faulty/failed:
-
-# mdadm --fail /dev/md0 /dev/hdc1
-
-DO NOT run this ever on a raid0 or linear device or your data is toasted!
 +If ''/dev/hda'' is having problems, set all its RAID1 partitions as failed and remove them:
 +<code># mdadm /dev/md0 --fail /dev/hda2 --remove /dev/hda2
 +# mdadm /dev/md1 --fail /dev/hda3 --remove /dev/hda3
 +# mdadm /dev/md3 --fail /dev/hda1 --remove /dev/hda1
 +# mdadm /dev/md4 --fail /dev/hda6 --remove /dev/hda6</code>
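Typing one fail/remove command per array is error-prone, so the commands can be generated from the mdstat listing and reviewed before running them. The sketch below only prints commands and never executes ''mdadm''. ''plan_disk_removal'' is a hypothetical helper name, and the script assumes the ''/proc/mdstat'' layout shown earlier on this page. Arrays that are not RAID1 are reported as comments and left alone.

```shell
#!/bin/sh
# Print (do not run) the mdadm commands that fail and remove every RAID1
# member located on DISK. Non-RAID1 arrays are only reported as comments.
# Usage: plan_disk_removal DISK [MDSTAT_FILE]
plan_disk_removal() {
    disk=$1
    file=${2:-/proc/mdstat}
    awk -v disk="$disk" '
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            level = "unknown"
            for (i = 3; i <= NF; i++)
                if ($i ~ /^(raid[0-9]+|linear)$/) level = $i
            for (i = 3; i <= NF; i++) {
                part = $i
                if (part !~ /\[[0-9]+\]/) continue       # not a member field
                sub(/\[.*/, "", part)                    # strip "[0]" or "[0](F)"
                if (part !~ ("^" disk "[0-9]+$")) continue
                if (level == "raid1")
                    printf "mdadm /dev/%s --fail /dev/%s --remove /dev/%s\n", $1, part, part
                else
                    printf "# skip /dev/%s (%s): not RAID1, /dev/%s left untouched\n", $1, level, part
            }
        }
    ' "$file"
}
```

Review the output of ''plan_disk_removal hda'' first, then paste the ''mdadm'' lines into a root shell once they look right.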
-Removing a faulty disk from an array:
-
 +''/dev/md2'' is a RAID0 stripe mounted as ''/scratch'', so we have to unmount it and then stop it (you can't remove volumes from a stripe):
 +<code># umount /dev/md2
 +# mdadm --stop /dev/md2</code>
-# mdadm --remove /dev/md0 /dev/hdc1
-Clearing any previous raid info on a disk (e.g. reusing a disk from another decommissioned raid array)
-
-# mdadm --zero-superblock /dev/hdc1
 +<note warning> You must shut down the server before you physically remove the drive! </note>
 +Shut the server down and replace the faulty drive with a new one. After booting, your drive letters may have shifted around, so verify which is which before proceeding.
 +Clone the partition table from the good drive (''/dev/hda'' here) to the new one (''/dev/hdd'' here). Double-check the direction, because the target's partition table is overwritten:
 +<code># sfdisk -d /dev/hda | sfdisk --force /dev/hdd</code>
-Adding a disk to an array
-
-# mdadm --add /dev/md0 /dev/hdc1
-
-
-=== To Do list: ===
-
-
 +Verify the new partitions can be seen:
 +<code># partprobe -s
 +/dev/hda: msdos partitions 1 2 3 4 <5 6>
 +/dev/hdd: msdos partitions 1 2 3 4 <5 6>
 +/dev/sda: msdos partitions 1
 +/dev/sdb: msdos partitions 1
 +/dev/sdc: msdos partitions 1
 +</code>
-Prepare written instructions on how to repair disk arrays.
-
-What disks do we have?
-
 +Re-create the scratch partition (RAID0):
 +<code># mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/hda5 /dev/hdd5
 +# mkfs.ext3 /dev/md2
 +# mount /dev/md2 /scratch</code>
-Add extra spare disks?
-
-How do you know which physical disk is broken to replace it?
 +You can now add the new partitions back to the RAID1 arrays:
 +<code># mdadm /dev/md0 --add /dev/hdd2
 +# mdadm /dev/md1 --add /dev/hdd3
 +# mdadm /dev/md3 --add /dev/hdd1
 +# mdadm /dev/md4 --add /dev/hdd6</code>
 +After adding you can monitor the progress of the RAID rebuilds by looking in ''/proc/mdstat'': 
 +<file>Personalities : [raid1] [raid0]  
 +md3 : active raid1 hdd1[1] hda1[0] 
 +      200704 blocks [2/2] [UU] 
 +       
 +md1 : active raid1 hdd3[2] hda3[0] 
 +      26627648 blocks [2/1] [U_] 
 +      [===================>.]  recovery = 95.4% (25407552/26627648) finish=0.7min speed=28648K/sec 
 +       
 +md2 : inactive hda5[0] 
 +      18434304 blocks 
 +        
 +md4 : active raid1 hdd6[2] hda6[0] 
 +      2168640 blocks [2/1] [U_] 
 +        resync=DELAYED 
 +       
 +md0 : active raid1 hdd2[1] hda2[0] 
 +      30716160 blocks [2/2] [UU] 
 +       
 +unused devices: <none></file>
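The status brackets in the listing above show whether an array is healthy (''[UU]'') or still missing a member (''[U_]''). As a rough sketch, they can be summarised per array as follows. ''raid_health'' is a hypothetical helper name, and the parsing assumes the status bracket is the last field on the "blocks" line, as in the output shown here. Arrays without a status bracket, such as RAID0 or inactive arrays, are not reported.

```shell
#!/bin/sh
# Summarise RAID1-style array health from mdstat-format text.
# Prints "<array> ok" when every slot is up (e.g. [UU]) and
# "<array> degraded" when any slot shows "_" (e.g. [U_] during a rebuild).
# Usage: raid_health [MDSTAT_FILE]   (defaults to /proc/mdstat)
raid_health() {
    file=${1:-/proc/mdstat}
    awk '
        # Remember which array the following detail lines belong to.
        $1 ~ /^md[0-9]+$/ && $2 == ":" { array = $1; next }
        # The "blocks" line ends in a status bracket such as [UU] or [U_].
        array != "" && $NF ~ /^\[[U_]+\]$/ {
            print array, ($NF ~ /_/ ? "degraded" : "ok")
            array = ""
        }
    ' "$file"
}
```

Running ''raid_health | grep degraded'' gives a quick check of whether any rebuild is still pending.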
  
 ===== Hardware RAID =====
raid.txt · Last modified: 2010/09/19 23:58 by aorth