Forum Home
Press F1
 
Thread ID: 69761 2006-06-11 08:35:00 600GB RAID 5 Array dies - dubious circumstances - please help restore! Growly (6)
Post ID Timestamp Content User
462292 2006-06-12 08:29:00 The PSU in the computer (which is a crappy Hyena one, although it works fine... well, has been) - throw that POS out, no point risking good hardware with dodgy power. I've had hard drives do weird things when supplied poor power. tweak'e (69)
462293 2006-06-12 12:44:00 Don't know that you will want to read this Growly, but this is what Google said:


In RAID 5, where there is a single parity block per stripe, the failure of a second drive results in total data loss.

The maximum number of drives in a RAID 5 redundancy group is theoretically unlimited, but it is common practice to limit the number of drives. The tradeoffs of larger redundancy groups are greater probability of a simultaneous double disk failure, the increased time to rebuild a redundancy group, and the greater probability of encountering an unrecoverable sector during RAID reconstruction.

As the number of disks in a RAID 5 group increases, the MTBF can become lower than that of a single disk. This happens when the likelihood of a second disk failing out of (N-1) dependent disks, within the time it takes to detect, replace and recreate a first failed disk, becomes larger than the likelihood of a single disk failing.
Looks like that version of RAID offers false security and an MTBF problem. It may be that one disk had failed already, and it was the second disk failure that caused the freeze.
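That second-disk risk can be sketched in a few lines of Python - a rough model assuming independent, exponentially distributed failure times (the MTBF and rebuild-time figures below are made up purely for illustration, not taken from any drive spec):

```python
# Back-of-the-envelope sketch: chance that a second disk in a RAID 5
# group fails during the window it takes to detect, replace and rebuild
# the first failure. Assumes independent exponential failure times.
import math

def p_second_failure(n_disks, mtbf_hours, rebuild_hours):
    """Probability that at least one of the (n-1) surviving disks
    fails before the rebuild of the first failed disk completes."""
    rate = 1.0 / mtbf_hours                        # per-disk failure rate
    p_one_survives = math.exp(-rate * rebuild_hours)
    return 1.0 - p_one_survives ** (n_disks - 1)

# e.g. a 4-disk group, 500,000 h claimed MTBF, 24 h rebuild window
print(f"{p_second_failure(4, 500_000, 24):.6%}")
```

Small per-incident, but the point in the quote holds: the risk grows with every disk you add to the group, and real-world correlated failures (shared PSU, shared power surge) make it much worse than this independent-failure model suggests.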

Bad luck anyway, and you have my sympathy. :( That is a pretty devastating blow from both hardware and data perspectives.

Cheers

Billy 8-{)
Billy T (70)
462294 2006-06-13 06:19:00 Thanks again guys.

RAID 5 is, I guess, overrated - but there is no way I would ever expect two of four brand new Seagate drives to die. If I were to expect this to happen, I would need far more rigorous procedures to ensure redundancy than a simple RAID controller. However, most people don't even factor in the death of one drive, so I thought I was a step ahead with the file keeping :) That is partially why I am so distraught - two drives managed to drop from the arrays.

That's not to say they failed - no, I don't believe they've actually malfunctioned for one second. I believe that some very funny things happened.

In hindsight, the inclusion of that power supply may have been awfully naive. Having said that, I do not think I would've done it differently - the PSU still works, there is no abnormal behaviour, and I have hypothesised as to the cause...

That night, we had a smoke alarm go off for no reason. Father suspected a power surge had fried something and set it off. Four hours earlier, my computer had frozen and died, and upon reset I witnessed what is now the destruction of my array. Yesterday, amidst the bad weather and in time with Auckland's power problems, our house was going through a series of brownouts. In the first one (the voltage appeared to drop to 120V, or something ridiculous), all my hardware crapped out in some shape or form. I reset my server when power was restored, and holy crap, another drive had dropped from the array. This leads me to believe that the Hyena had not been able to cope with the surge, and had buzzed the hard drives or the controller into death.

However, unless it actually made the hard drives corrupt themselves, this should still have been recoverable. Given that Promise state that an array can be moved between controllers, it should've been easy enough to recreate the same array on the same controller and get my data back.

Why this didn't work is beyond me. When I first turned it on afterwards, Server 2k3 ran chkdsk, in which it decided to recover 90,000 odd files with "minor errors" (see above). I thought this was a good thing, but this could've actually been what killed the files. If this is the case, I will personally piss on Microsoft's building in town.

What worries me is how the data became corrupted - and why I can no longer run chkdsk without getting "unspecified errors". How come chkdsk knows the volume name, and it knows the file system type, but disk management knows squat? What gives?

Does anyone have any experience in this? Does anyone know what could've caused it?
Growly (6)
462295 2006-06-13 07:10:00 Hi, that is bad luck mate - to end up losing your data after giving it more consideration than most.
This is a good message to all who have considerable important data: have a clean power supply, or better yet a UPS, to allow data to be written out of cache to the drive/s and a graceful shutdown.
Full backups are always a must.

Please don't take offense, OP, but removable storage is the safest solution.

God, I sound like my father....i'm gone!!
SolMiester (139)
462296 2006-06-13 17:33:00 Hi, that is bad luck mate - to end up losing your data after giving it more consideration than most.
This is a good message to all who have considerable important data: have a clean power supply, or better yet a UPS, to allow data to be written out of cache to the drive/s and a graceful shutdown.
Full backups are always a must.

Please don't take offense, OP, but removable storage is the safest solution.

God, I sound like my father....i'm gone!!

No offense to your dad intended, but I have a question:

Would JBOD be any better? I quit RAID 0+1 on 4/200g's and had 4/200g's on JBOD. I have switched all of them to perform as JBODs now, and I feel that data losses would be minimal that way if one or two drives should fail. I just wonder about the associations, that's all.
SurferJoe46 (51)
462297 2006-06-14 02:28:00 Hi Joe, not that sure on JBOD as it is not really a network environment standard, but more a home desktop storage solution. I think, though I could be wrong, that JBOD is just striping across different sizes/types of disk to make one volume. RAID 5 is a fault tolerance solution in that it allows for array rebuild if a drive from the array fails: as 2-1=1, knowing the answer, it can recreate the missing data. SolMiester (139)
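The "knowing the answer" trick described above is just XOR parity. A toy sketch (illustrative only - the block names and sizes here are made up, and this is not Promise's actual implementation):

```python
# Toy RAID 5-style parity: XOR the data blocks in a stripe to make a
# parity block. Any ONE missing block can be recreated from the rest;
# lose two and the equation no longer has a unique answer.
from functools import reduce

def parity(blocks):
    """XOR the given blocks byte-by-byte into a single parity block."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"   # three data blocks in one stripe
p = parity([d1, d2, d3])                 # stored on the fourth disk

# Disk holding d2 dies: XOR the survivors with the parity to rebuild it.
rebuilt = parity([d1, d3, p])
print(rebuilt == d2)   # True - one failure is recoverable; two are not
```

This is also why the second drive dropping out was fatal for the 600GB array: with two unknowns in the stripe, the parity block can no longer single them out.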
462298 2006-06-14 07:47:00 Actually, JBOD is "Just a Bunch Of Disks" - they are treated like individual disks with their own BAM, and therefore do not depend on any of them actually interfacing with the others, except for the root drive, of course!

I find access time to be really fast, as the operation of one does not interfere with the others, so data can be spooled from several sources at the same time. Naturally, I prefer the 10,000 rpm ones, but with 8 200g hdds to feed plus 1 40g root drive, I cannot afford to upgrade to the 10k rpm ones yet.
SurferJoe46 (51)
462299 2006-06-14 11:28:00 Yeah, JBOD just leaves the disks operating as standalone drives. This isn't fault tolerant at all, because in the event of a failure you are definitely going to lose some data.

I had never counted on two drives failing, or anything like this happening, so I don't think my choice of array was really a factor. I may as well have assumed that all four drives would fail simultaneously...
Growly (6)