RAID

We have two RAIDs on the HPC:

  • Linux kernel software RAID
  • 3ware hardware RAID

Drive numbering

If you're looking at the front of the HPC you'll see four rows of drives. From the bottom:

  • Rows 0 - 2 are SATA, connected to the 3ware hardware RAID card
  • Row 3 is IDE

Software RAID

The Linux kernel has the md (multiple devices) driver for software RAID devices. There are currently two 80 GB IDE hard drives connected to the server, /dev/hda and /dev/hdd. These were set up as five RAID devices (md0 - md4) during the install of Rocks/CentOS.

Here is information on their configuration:

# mount | grep md
/dev/md0 on / type ext3 (rw)
/dev/md3 on /boot type ext3 (rw)
/dev/md2 on /scratch type ext3 (rw)
/dev/md1 on /export type ext3 (rw)
# df -h | grep md
/dev/md0               29G   11G   17G  39% /
/dev/md3              190M   60M  121M  34% /boot
/dev/md2               35G  177M   33G   1% /scratch
/dev/md1               25G  5.5G   18G  24% /export

Note that /dev/md4 is used as swap:

# swapon -s
Filename				Type		Size	Used	Priority
/dev/md4                                partition	2168632	0	-1

A snapshot of the software RAID's health:

# cat /proc/mdstat 
Personalities : [raid1] [raid0] 
md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]
      
md1 : active raid1 hdd3[1] hda3[0]
      26627648 blocks [2/2] [UU]
      
md2 : active raid0 hdd5[1] hda5[0]
      36868608 blocks 256k chunks
      
md4 : active raid1 hdd6[1] hda6[0]
      2168640 blocks [2/2] [UU]
      
md0 : active raid1 hdd2[1] hda2[0]
      30716160 blocks [2/2] [UU]
      
unused devices: <none>
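
To check the arrays from a script rather than by eye, something along these lines could work. It is only a sketch: the function name degraded_arrays is made up for this page, and it assumes the mdstat layout shown above, where a missing RAID1 member appears as an underscore inside the status brackets (for example [U_]).

```shell
# Hypothetical helper: print the name of every md array whose member
# status shows a missing device. Reads mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/    { print name }    # status such as [U_] or [_U]
    '
}

# Usage on the live system:
#   degraded_arrays < /proc/mdstat
```

RAID0 arrays such as md2 have no status brackets, so they are never reported here; a stripe with a dead member simply stops working.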

Repair RAID

When a disk is failing you need to replace the drive. First, look at the RAID configuration to see which partitions are in use by which arrays. For example:

# cat /proc/mdstat 
Personalities : [raid1] [raid0] 
md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]
      
md1 : active raid1 hdd3[1] hda3[0]
      26627648 blocks [2/2] [UU]
      
md2 : active raid0 hdd5[1] hda5[0]
      36868608 blocks 256k chunks
      
md4 : active raid1 hdd6[1] hda6[0]
      2168640 blocks [2/2] [UU]
      
md0 : active raid1 hdd2[1] hda2[0]
      30716160 blocks [2/2] [UU]
      
unused devices: <none>

If /dev/hda is having problems, set all its RAID1 partitions as failed and remove them:

# mdadm /dev/md0 --fail /dev/hda2 --remove /dev/hda2
# mdadm /dev/md1 --fail /dev/hda3 --remove /dev/hda3
# mdadm /dev/md3 --fail /dev/hda1 --remove /dev/hda1
# mdadm /dev/md4 --fail /dev/hda6 --remove /dev/hda6
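
Rather than typing these by hand, the commands could be generated from /proc/mdstat. The sketch below is a dry run: it only prints the mdadm lines so they can be reviewed first. The function name fail_remove_cmds is hypothetical, and it assumes the mdstat format shown above.

```shell
# Hypothetical helper: given a disk name (e.g. hda) and mdstat text on
# stdin, print one mdadm fail/remove command per RAID1 array that uses
# a partition of that disk. Nothing is executed.
fail_remove_cmds() {
    awk -v disk="$1" '
        $3 == "active" && $4 == "raid1" {
            for (i = 5; i <= NF; i++) {
                part = $i
                sub(/\[[0-9]+\].*$/, "", part)    # hda2[0] -> hda2
                if (part ~ ("^" disk "[0-9]+$"))
                    printf "mdadm /dev/%s --fail /dev/%s --remove /dev/%s\n", $1, part, part
            }
        }
    '
}

# Usage on the live system (review the output before running it):
#   fail_remove_cmds hda < /proc/mdstat
```

RAID0 arrays are skipped on purpose, since members can't be removed from a stripe.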

/dev/md2 is a RAID0 stripe mounted as /scratch, so we have to unmount it and then stop it (you can't remove member devices from a stripe, and a stripe has no redundancy, so its contents are lost):

# umount /dev/md2
# mdadm --stop /dev/md2

You must shut down the server before you physically remove the drive!

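Shut the server down and replace the faulty drive with a new one. After booting, your drive letters may have shifted around, so be sure to verify which is which before proceeding. In the commands below the good drive is /dev/hda and the new, blank drive is /dev/hdd; swap the names if yours differ, because getting this backwards overwrites the good drive's partition table. Clone the partition table from the good drive to the new one: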

# sfdisk -d /dev/hda | sfdisk --force /dev/hdd

Verify the new partitions can be seen:

# partprobe -s
/dev/hda: msdos partitions 1 2 3 4 <5 6>
/dev/hdd: msdos partitions 1 2 3 4 <5 6>
/dev/sda: msdos partitions 1
/dev/sdb: msdos partitions 1
/dev/sdc: msdos partitions 1
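
As a quick sanity check, the two lines for the IDE drives can be compared automatically. This is a rough sketch: same_partitions is a made-up name, and it assumes the partprobe -s output format shown above. It only compares the partition numbers listed, not their sizes.

```shell
# Hypothetical helper: read `partprobe -s` output on stdin and succeed
# (exit 0) only if both named disks list the same label and partitions.
same_partitions() {
    awk -v a="/dev/$1:" -v b="/dev/$2:" '
        $1 == a { sub(/^[^:]*: */, ""); la = $0 }   # keep text after "dev:"
        $1 == b { sub(/^[^:]*: */, ""); lb = $0 }
        END { exit !(la != "" && la == lb) }
    '
}

# Usage on the live system:
#   partprobe -s | same_partitions hda hdd && echo "layouts match"
```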

Re-create the scratch partition (RAID0):

# mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/hda5 /dev/hdd5
# mkfs.ext3 /dev/md2
# mount /dev/md2 /scratch

You can now add the new partitions back to the RAID1 arrays:

# mdadm /dev/md0 --add /dev/hdd2
# mdadm /dev/md1 --add /dev/hdd3
# mdadm /dev/md3 --add /dev/hdd1
# mdadm /dev/md4 --add /dev/hdd6

After adding you can monitor the progress of the RAID rebuilds by looking in /proc/mdstat:

# cat /proc/mdstat 
Personalities : [raid1] [raid0] 
md3 : active raid1 hdd1[1] hda1[0]
      200704 blocks [2/2] [UU]
      
md1 : active raid1 hdd3[2] hda3[0]
      26627648 blocks [2/1] [U_]
      [===================>.]  recovery = 95.4% (25407552/26627648) finish=0.7min speed=28648K/sec
      
md2 : inactive hda5[0]
      18434304 blocks
       
md4 : active raid1 hdd6[2] hda6[0]
      2168640 blocks [2/1] [U_]
        resync=DELAYED
      
md0 : active raid1 hdd2[1] hda2[0]
      30716160 blocks [2/2] [UU]
      
unused devices: <none>
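
To get a one-line-per-array summary of the rebuild instead of reading the whole file, a small filter like the following might help. It is a sketch under the assumption that the kernel reports progress in the format shown above; rebuild_progress is a hypothetical name.

```shell
# Hypothetical helper: summarise rebuild state from mdstat text on stdin.
# Prints "<array> <percent>" for arrays being rebuilt and
# "<array> delayed" for arrays waiting their turn.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *= *[0-9.]+%/ {
            match($0, /[0-9.]+%/)                 # first "NN.N%" on the line
            print name, substr($0, RSTART, RLENGTH)
        }
        /resync=DELAYED/ { print name, "delayed" }
    '
}

# Usage on the live system:
#   rebuild_progress < /proc/mdstat
```

For the snapshot above this would report md1 at 95.4% and md4 as delayed. Arrays that are fully in sync produce no output.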

Hardware RAID

The hardware RAID is a 3ware 9500S-12 SATA RAID card, driven by the 3w-9xxx kernel module. It has 12 channels. The HPC is configured to use RAID5 for all of the arrays on this card.

Physical Disk Layout

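We have one RAID controller, 'c1'. Disks are plugged into ports, 'p0' - 'p11'. The disks are then grouped into units of four (basically the rows), 'u0' - 'u2'. Unit numbers do not follow row order, though: in the tw_cli output further down this page, ports 4 - 7 belong to u2 and ports 8 - 11 belong to u1.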

Port 8    Port 9    Port 10   Port 11
Port 4    Port 5    Port 6    Port 7
Port 0    Port 1    Port 2    Port 3

Repairing 'degraded' arrays

There is a utility, tw_cli, which can be used to control and monitor the hardware RAID controller.

Run show inside tw_cli to see which controllers exist, then use /c1 show to see the status of that particular controller. Things to look for:

  • Which controller is active? (c0, c1, etc)
  • Which unit is degraded? (u0, u1, u2, etc)
  • Which port is inactive or missing? (p1, p5, etc)

The controller supports hot swapping, but you must remove a faulty drive through the tw_cli tool before you physically swap drives.

Remove the faulty port:

maint remove c1 p5

Insert a new drive and rescan:

maint rescan

Rebuild the degraded array:

maint rebuild c1 u2 p5

You can check the status of the rebuild by monitoring /c1 show, but I have a feeling this might disturb the rebuild process. In any case, you can also check the status by following the output of dmesg:

3w-9xxx: scsi1: AEN: INFO (0x04:0x000B): Rebuild started:unit=2.
3w-9xxx: scsi1: AEN: INFO (0x04:0x0005): Background rebuild done:unit=2.
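
Since dmesg contains a lot of unrelated noise, a filter like this could pull out just the controller events. It is only a sketch: aen_events is a made-up name, and the pattern assumes the 3w-9xxx message format shown above.

```shell
# Hypothetical helper: read kernel log text on stdin and print only the
# 3ware AEN events, reduced to "<severity> <message>".
aen_events() {
    sed -n 's/^.*3w-9xxx: scsi[0-9]*: AEN: \([A-Z]*\) ([^)]*): \(.*\)$/\1 \2/p'
}

# Usage on the live system:
#   dmesg | aen_events
```

Filtering the result further with grep ERROR would show only the events that need attention.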

This sucks:

3w-9xxx: scsi1: AEN: INFO (0x04:0x0029): Background verify started:unit=0.
3w-9xxx: scsi1: AEN: INFO (0x04:0x002B): Background verify done:unit=0.
3w-9xxx: scsi1: AEN: ERROR (0x04:0x0002): Degraded unit detected:unit=0, port=3

$ sudo tw_cli
Password: 
//hpc-ilri> /c1 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     698.461   ON     OFF    
u1    RAID-5    OK             -       -       64K     698.461   ON     OFF    
u2    RAID-5    OK             -       -       64K     698.461   ON     OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     232.88 GB   488397168     WD-WMAEP2714804     
p1     OK               u0     232.88 GB   488397168     WD-WMAEP1570106     
p2     OK               u0     232.88 GB   488397168     WD-WMAEP2712887     
p3     DEGRADED         u0     232.88 GB   488397168     WD-WMAEP2714418     
p4     OK               u2     232.88 GB   488397168     WD-WCAT1C715001     
p5     OK               u2     232.88 GB   488397168     WD-WMAEP2713449     
p6     OK               u2     232.88 GB   488397168     WD-WMAEP2715070     
p7     OK               u2     232.88 GB   488397168     WD-WMAEP2712590     
p8     OK               u1     232.88 GB   488397168     WD-WMAEP2712574     
p9     OK               u1     232.88 GB   488397168     WD-WMAEP2734142     
p10    OK               u1     232.88 GB   488397168     WD-WMAEP2702155     
p11    OK               u1     232.88 GB   488397168     WD-WMAEP2712472  

Looks like another drive failed: port p3 is DEGRADED, which leaves unit u0 degraded as well.
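
With twelve ports it is easy to miss a bad line, so the table could be filtered down to whatever is not OK. This is a sketch that assumes the column layout of the /c1 show output above (status in the third column for units, second column for ports); tw_not_ok is a hypothetical name.

```shell
# Hypothetical helper: read `/c1 show` output on stdin and print every
# unit (uN) or port (pN) whose status is anything other than OK.
tw_not_ok() {
    awk '
        $1 ~ /^u[0-9]+$/ && $3 != "OK" { print $1, $3 }   # unit rows
        $1 ~ /^p[0-9]+$/ && $2 != "OK" { print $1, $2 }   # port rows
    '
}

# Usage on the live system, assuming tw_cli accepts the command as an
# argument instead of at its interactive prompt:
#   sudo tw_cli /c1 show | tw_not_ok
```

For the output above this would print u0 and p3, both DEGRADED.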

raid.txt · Last modified: 2010/09/19 23:58 by aorth