DELL PowerEdge multiple drive failure

Our client’s PowerEdge 2900, fitted with a PERC 6/i controller and 8 x 300GB drives in a RAID 5 configuration, has suffered multiple drive failures.

Following a Microsoft update, the server reported a single drive failure and showed the array as ‘degraded’. With no replacement drives available from DELL, we took an ‘image’ of each healthy disk and then attempted a full rebuild as a RAID 6 with the 7 remaining hard disks. However, after we recreated the virtual disk and began initialization, two more disks failed.

How probable is simultaneous failure in a RAID volume?

Simultaneous failure is much more probable than commonly thought. A RAID 5 tolerates only one failed disk, so when two disks fail the array also fails and the data is no longer accessible. Hard disks in an array are often of the same age and model, so manufacturing defects and wear tend to be consistent across all disks. If the user is not monitoring the RAID volume, a server can run in ‘degraded’ mode for a considerable time, and the extra load on the remaining disks can be enough to force another failure.
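To put a rough number on that risk, here is a minimal sketch in Python. It assumes each remaining disk fails independently at a constant rate derived from an annualized failure rate (AFR). The function name, the 5% AFR and the 30-day window are illustrative assumptions, not measurements from this server. Because disks of the same batch and age tend to fail together, a figure from this simple model should be read as a lower bound.

```python
import math

HOURS_PER_YEAR = 24 * 365


def second_failure_probability(remaining_disks, afr, window_hours):
    """Chance that at least one remaining disk fails within the window.

    Assumes independent disks with a constant failure rate; 'afr' is the
    annualized failure rate of a single disk (e.g. 0.05 for 5%).
    """
    # Convert the yearly failure probability into an hourly rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # All remaining disks are exposed for the whole window.
    return 1.0 - math.exp(-remaining_disks * rate_per_hour * window_hours)


if __name__ == "__main__":
    # Hypothetical case: 7 surviving disks, 5% AFR, 30 days spent degraded.
    p = second_failure_probability(7, 0.05, 30 * 24)
    print(f"Chance of a further failure while degraded: {p:.1%}")
```

The longer the array is left degraded, the larger the window and the higher the probability, which is why monitoring the volume matters.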

Thankfully, our client didn’t suffer any data loss because we had already taken an ‘image’ of each healthy drive. By replacing the damaged disk with a scratch disk, the RAID could be restored and configured as a duplicate of the original. This method carries hazards, and there is no guarantee that everything will go smoothly. The safest and most efficient method is to recreate the array using professional tools: connect the disks in the array as independent local drives, mount the RAW images and complete a scan. The software will automatically calculate the array parameters (such as disk order and stripe size) and rebuild the data. The data can then be transferred from the bootable disk to the newly built RAID array.
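The reason a RAID 5 can be rebuilt from the surviving images is that each stripe stores a parity block equal to the XOR of its data blocks, so any one missing block is the XOR of the others. The Python sketch below illustrates only that relationship on tiny in-memory blocks. The function name and sample data are hypothetical, and it is not how any particular recovery tool or the PERC controller is implemented. Real arrays also rotate parity across disks and depend on the correct disk order and stripe size.

```python
from functools import reduce


def rebuild_missing_block(surviving_blocks):
    """Recover the one missing block of a RAID 5 stripe.

    'surviving_blocks' holds every other block of the stripe (data and
    parity) as equal-length bytes objects. XOR-ing them byte by byte
    yields the missing block.
    """
    if not surviving_blocks:
        raise ValueError("at least one surviving block is required")
    length = len(surviving_blocks[0])
    if any(len(block) != length for block in surviving_blocks):
        raise ValueError("all blocks in a stripe must be the same size")
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*surviving_blocks)
    )


if __name__ == "__main__":
    # Three illustrative data blocks and the parity block computed from them.
    d0, d1, d2 = b"CLIENT  ", b"PAYROLL ", b"RECORDS "
    parity = rebuild_missing_block([d0, d1, d2])

    # Pretend the disk holding d1 has failed; rebuild it from the rest.
    recovered = rebuild_missing_block([d0, d2, parity])
    print(recovered == d1)
```

Working from images rather than the original disks means a mistake in the assumed parameters costs only time, since the source data is never written to.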
