RAID
We have two kinds of RAID on the HPC:
- Linux kernel software RAID
- 3ware hardware RAID
| 0 | 1 | 2  | 3  |
| 8 | 9 | 10 | 11 |
| 4 | 5 | 6  | 7  |
Software RAID
It is currently reporting a degraded array (27 Aug 2009):
# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 hda1[0]
      129920 blocks [2/1] [U_]
md3 : active raid1 hdc3[1] hda3[0]
      2097024 blocks [2/2] [UU]
md2 : active raid1 hdc5[1]
      65437696 blocks [2/1] [_U]
md0 : active raid0 hdc2[1] hda2[0]
      20971008 blocks 256k chunks
unused devices: <none>
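The [U_] and [_U] strings above are how mdstat flags a degraded mirror: each U is a healthy member and each _ a missing one. A small sketch (a hypothetical helper, not a site script) that pulls the degraded array names out of mdstat-format text:

```shell
#!/bin/sh
# check_md: print the names of degraded md arrays from mdstat-format input.
# A '_' in the trailing [UU]-style status field means a member is missing.
check_md() {
  awk '/^md/ { name = $1 }
       $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }'
}

# Example using the status recorded above (27 Aug 2009):
check_md <<'EOF'
md1 : active raid1 hda1[0]
      129920 blocks [2/1] [U_]
md3 : active raid1 hdc3[1] hda3[0]
      2097024 blocks [2/2] [UU]
md2 : active raid1 hdc5[1]
      65437696 blocks [2/1] [_U]
md0 : active raid0 hdc2[1] hda2[0]
      20971008 blocks 256k chunks
EOF
```

On the machine itself you would run check_md < /proc/mdstat instead of the here-document.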
To Do list:
Prepare written instructions on how to repair disk arrays.
What disks do we have?
Add extra spare disks?
How do you know which physical disk is broken to replace it?
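As a starting point for the first to-do item, the usual mdadm sequence for replacing a failed mirror member is sketched below. The device names are hypothetical (hdc1 as the failed half of md1; confirm against /proc/mdstat first), and each command is wrapped in a dry-run echo so nothing here touches a live array:

```shell
#!/bin/sh
# Dry-run sketch: each mdadm command is echoed, not executed.
# Device names are examples only (md1 with failed member hdc1).
run() { echo "$@"; }

run mdadm --manage /dev/md1 --fail /dev/hdc1     # mark the member as failed
run mdadm --manage /dev/md1 --remove /dev/hdc1   # detach it from the array
# (swap the physical disk, partition it to match the surviving disk)
run mdadm --manage /dev/md1 --add /dev/hdc1      # re-add; the kernel resyncs
run cat /proc/mdstat                             # watch the rebuild progress
```

Drop the run wrapper to execute for real; after --add the kernel starts an automatic resync that can be followed in /proc/mdstat.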
Hardware RAID
There is a utility, tw_cli, which can be used to control the hardware RAID. The hardware RAID has three arrays, all RAID 5. Each "unit" (a row in the table above) is one array.
Study the output of "show" to know which controller to manage. Then you can use "/c1 show" to show the status of that particular controller. Things to look for:
- Which controller is active? (c0, c1, etc)
- Which unit is degraded? (u0, u1, u2, etc)
- Which port has the faulty disk? (p0, p1, etc)
Remove the faulty port:
maint remove c1 p5
Insert a new drive and rescan:
maint rescan
Rebuild the degraded array:
maint rebuild c1 u2 p5
Check the status of the rebuild by monitoring "/c1 show".