RAID

The HPC has two kinds of RAID:

  • Linux kernel software RAID
  • 3ware hardware RAID

Drive numbering

If you're looking at the front of the HPC you'll see four rows of drives. From the bottom:

  • Rows 0 - 2 are SATA, connected to the 3ware hardware RAID card
  • Row 3 is IDE

Software RAID

The Linux kernel has the md (multiple devices) driver for software RAID. There are currently two 80 GB IDE hard drives connected to the server, /dev/hda and /dev/hdd. These were partitioned and set up as five RAID devices during the install of Rocks/CentOS.

Here is information on their configuration:

# mount | grep md
/dev/md0 on / type ext3 (rw)
/dev/md3 on /boot type ext3 (rw)
/dev/md2 on /scratch type ext3 (rw)
/dev/md1 on /export type ext3 (rw)

Note that /dev/md4 is used as swap:

# swapon -s
Filename				Type		Size	Used	Priority
/dev/md4                                partition	2168632	0	-1

A snapshot of the software RAID's health:

# cat /proc/mdstat 
Personalities : [raid1] [raid0] 
md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]
      
md1 : active raid1 hdd3[1] hda3[0]
      26627648 blocks [2/2] [UU]
      
md2 : active raid0 hdd5[1] hda5[0]
      36868608 blocks 256k chunks
      
md4 : active raid1 hdd6[1] hda6[0]
      2168640 blocks [2/2] [UU]
      
md0 : active raid1 hdd2[1] hda2[0]
      30716160 blocks [2/2] [UU]
      
unused devices: <none>
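In the mdstat output above, each U in the [UU] flags marks an in-sync member and an underscore marks a failed one, so a degraded mirror shows [2/1] [U_]. The sketch below scans for that pattern; it runs against a pasted sample with a hypothetical failure on md1, rather than the live /proc/mdstat:

```shell
#!/bin/sh
# On the HPC itself, read the live status instead of this sample:
#   cat /proc/mdstat
# Sample mdstat output with one hypothetical degraded mirror (md1).
mdstat='md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]
md1 : active raid1 hda3[0]
      26627648 blocks [2/1] [U_]'

# Remember the last "mdN :" device line seen; report that device if its
# status flags ([UU], [U_], ...) contain an underscore (a failed member).
degraded=$(echo "$mdstat" | awk '/^md/ { name = $1 }
                                 /\[[U_]+\]/ && /_/ { print name }')
echo "degraded arrays: $degraded"
```

The same one-liner works against the real file by replacing the echo with cat /proc/mdstat.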

To Do list:

  • Prepare written instructions on how to repair disk arrays.
  • What disks do we have?
  • Add extra spare disks?
  • How do you know which physical disk is broken, so it can be replaced?

Hardware RAID

There is a utility, tw_cli, which can be used to control the hardware RAID. The hardware RAID has three arrays, all RAID 5. Each "unit" (one row of four drives) is one array. The ports are numbered as follows, viewed from the front:

8 9 10 11
4 5 6 7
0 1 2 3
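Since the ports run four per row starting at the bottom, a port number maps to a physical slot by simple arithmetic (assuming the grid above reads left to right as printed). A small sketch, using port 5 as an example value:

```shell
#!/bin/sh
# Convert a 3ware port number to its physical position on the chassis:
# four ports per row, row 0 at the bottom, slots counted from the left.
port=5   # example: the port reported as failed
row=$((port / 4))
slot=$((port % 4))
echo "port $port is in row $row, slot $slot"
```

So port 5 sits in the second row from the bottom, second drive from the left.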

Study the output of show to know which controller to manage. Then you can use /c1 show to show the status of that particular controller. Things to look for:

  • Which controller is active? (c0, c1, etc)
  • Which unit is degraded? (u0, u1, u2, etc)
  • Which port holds the failed drive? (p0, p1, etc)

Remove the faulty port:

maint remove c1 p5

Insert a new drive and rescan:

maint rescan

Rebuild the degraded array:

maint rebuild c1 u2 p5

Check the status of the rebuild by periodically running /c1 show.
