RAID
We have two kinds of RAID on the HPC:
- Linux kernel software RAID
- 3ware hardware RAID
| 0 | 1 | 2  | 3  |
| 8 | 9 | 10 | 11 |
| 4 | 5 | 6  | 7  |
Software RAID
It is currently reporting a degraded array (27 Aug 2009):
# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 hda1[0]
      129920 blocks [2/1] [U_]
md3 : active raid1 hdc3[1] hda3[0]
      2097024 blocks [2/2] [UU]
md2 : active raid1 hdc5[1]
      65437696 blocks [2/1] [_U]
md0 : active raid0 hdc2[1] hda2[0]
      20971008 blocks 256k chunks
unused devices: <none>
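The [U_] and [_U] strings above are how mdstat flags a degraded mirror: each U is a healthy member and each _ a missing one. A small sketch (a hypothetical helper, not a site script) that pulls the degraded array names out of mdstat-format text:

```shell
#!/bin/sh
# check_md: print the names of degraded md arrays from mdstat-format input.
# A '_' in the trailing [UU]-style status field means a member is missing.
check_md() {
  awk '/^md/ { name = $1 }
       $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }'
}

# Example using the status recorded above (27 Aug 2009):
check_md <<'EOF'
md1 : active raid1 hda1[0]
      129920 blocks [2/1] [U_]
md3 : active raid1 hdc3[1] hda3[0]
      2097024 blocks [2/2] [UU]
md2 : active raid1 hdc5[1]
      65437696 blocks [2/1] [_U]
md0 : active raid0 hdc2[1] hda2[0]
      20971008 blocks 256k chunks
EOF
```

On the machine itself you would run check_md < /proc/mdstat instead of the here-document.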
To Do list:
Prepare written instructions on how to repair disk arrays.
What disks do we have?
Add extra spare disks?
How do you know which physical disk is broken to replace it?
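As a starting point for the first to-do item, the usual mdadm sequence for replacing a failed mirror member is sketched below. The device names are hypothetical (hdc1 as the failed half of md1; confirm against /proc/mdstat first), and each command is wrapped in a dry-run echo so nothing here touches a live array:

```shell
#!/bin/sh
# Dry-run sketch: each mdadm command is echoed, not executed.
# Device names are examples only (md1 with failed member hdc1).
run() { echo "$@"; }

run mdadm --manage /dev/md1 --fail /dev/hdc1     # mark the member as failed
run mdadm --manage /dev/md1 --remove /dev/hdc1   # detach it from the array
# (swap the physical disk, partition it to match the surviving disk)
run mdadm --manage /dev/md1 --add /dev/hdc1      # re-add; the kernel resyncs
run cat /proc/mdstat                             # watch the rebuild progress
```

Drop the run wrapper to execute for real; after --add the kernel starts an automatic resync that can be followed in /proc/mdstat.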
Hardware RAID
There is a utility, tw_cli, which can be used to control the hardware RAID. The hardware RAID has three arrays, all RAID 5. Each "unit" (a row in the table above) is one array.
Study the output of "show" to know which controller to manage. Then you can use "/c1 show" to show the status of that particular controller. Things to look for:
- Which controller is active? (c0, c1, etc)
- Which unit is degraded? (u0, u1, u2, etc)
- Which port has the faulty disk? (p0, p1, etc)
Remove the faulty port:
maint remove c1 p5
Insert a new drive and rescan:
maint rescan
Rebuild the degraded array:
maint rebuild c1 u2 p5
Check the status of the rebuild by monitoring "/c1 show".