[mythtv-users] Slightly OT - How many People have Video libraries over 8TB?
Raymond Wagner
raymond at wagnerrp.com
Sat Jul 7 21:32:20 UTC 2012
On 7/7/2012 13:34, digid myth wrote:
> I would never build a raid with 2 TB drives. Takes too long to rebuild
> the raid and 60% of the time you will have a 2nd drive fail during the
> rebuild.
Huh? There was a whole lot of FUD several years ago about UBE rates
making it probable that you would experience a second failure before you
finish a recovery. So, a brief lesson in statistics and hard drive
design: what is actually going on here?
A UBE (unrecoverable bit error) or URE (unrecoverable read error) is
simply a read in which the hard drive fails to decode the data in a
sector. You don't actually read the data directly; that would require
some really expensive gear, and would be too slow. Instead, you read a
whole bunch of redundant information and decode your data back out of it
using error-correcting codes (ECC). Many hard drives are rated for a UBE
rate of less than one in 10^14 bits. That means that within 100Tbits, or
12.5TB (11.37TiB), you expect fewer than one event in which you don't
read enough information off the platter to decode the data in a sector.
Treating errors as independent, the chance of seeing at least one such
event over that span is at most 1 - e^-1, about 63%, and the rating is
an upper bound, not a measured rate.
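A minimal sketch of that arithmetic, assuming errors are independent so the number of UREs over a read follows a Poisson distribution at the spec-sheet ceiling of 10^-14 per bit. The function name and the 4 x 2TB rebuild scenario are hypothetical, chosen only for illustration.

```python
import math

# Spec-sheet ceiling quoted above: "less than one in 10^14 bits".
URE_RATE_PER_BIT = 1e-14
TB = 1e12  # decimal terabyte, as drive vendors count capacity


def p_at_least_one_ure(bytes_read: float, rate_per_bit: float = URE_RATE_PER_BIT) -> float:
    """Chance of at least one URE while reading bytes_read, under a Poisson model."""
    expected_errors = bytes_read * 8 * rate_per_bit
    return 1.0 - math.exp(-expected_errors)


# Reading 10^14 bits: one expected event, so 1 - e^-1.
print(f"{p_at_least_one_ure(12.5 * TB):.3f}")  # 0.632
# Hypothetical 4 x 2TB RAID5 rebuild: the three surviving drives are read in full.
print(f"{p_at_least_one_ure(3 * 2 * TB):.3f}")  # 0.381
```

Under the same model, a 60% chance of hitting a URE would take reading roughly 11.5TB during the rebuild.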
So what happens when you experience a UBE on a RAID array? These events
are expected given the design of hard drives. A UBE doesn't indicate a
failed drive, merely a single failed read. The RAID array detects the
fault, reconstructs the data from the remaining data and parity blocks in
that stripe, writes it back to the drive the fault occurred on, and life
goes on. Some companies sell "RAID Edition" drives, and the key
difference between these and the standard consumer fare is that the
standard drive will go to Herculean lengths trying to re-read that
sector to recover the data, while the RAID Edition drive will give up
early, allowing the array to recover the data on its own rather than
stall operations for an extended period.
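The reconstruction step can be sketched with single (XOR) parity, as in RAID5: the parity chunk is the XOR of the data chunks in a stripe, so any one unreadable chunk equals the XOR of everything else in that stripe. This is a simplified model; real controllers and Linux md rotate parity across drives and use fixed chunk sizes, and the helper names here are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def make_parity(data_chunks: list[bytes]) -> bytes:
    """Parity chunk for one stripe: XOR of all data chunks."""
    return xor_chunks(data_chunks)


def reconstruct_chunk(readable_chunks: list[bytes], parity: bytes) -> bytes:
    """Recover the single chunk that hit a URE from the rest of the stripe."""
    return xor_chunks(readable_chunks + [parity])


# One stripe across three data drives plus parity (toy 4-byte chunks).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = make_parity(stripe)

# Suppose the read of chunk 1 fails with a URE; rebuild it from the others.
recovered = reconstruct_chunk([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])  # True
# A controller would then write `recovered` back to the faulting drive,
# so the drive can rewrite or remap the bad sector.
```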
What if it's a RAID0 array, or a critical (degraded) RAID5 array? As
mentioned, this isn't a drive failure, only a sector fault. The RAID
controller isn't going to expel the drive and render the array
incomplete. That would be stupid. It's going to warn the administrator,
and may wait for confirmation, before going ahead and completing the
recovery of the rest of the array. With a single drive, you would have
lost just that sector, 512B or 4KB; on the array, the worst case is that
you lose that whole stripe, maybe 1MB or so. Of course, you could always
run RAID6, where after the loss of that first drive you still have a
second parity set, and the chance of two UBEs occurring within the same
stripe on two drives is infinitesimally small.
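A rough estimate of that RAID6 case, assuming independent errors at the 10^-14 ceiling. With one drive already gone, data is lost only if two surviving drives both hit a URE in the same stripe; summing over every pair of drives gives an upper bound. The 6 x 2TB array, the 512KiB chunk size, and the function names are hypothetical.

```python
import math

URE_RATE_PER_BIT = 1e-14  # spec-sheet ceiling


def p_chunk_ure(chunk_bytes: int, rate: float = URE_RATE_PER_BIT) -> float:
    """Chance that reading one chunk hits at least one URE."""
    return -math.expm1(-chunk_bytes * 8 * rate)


def expected_double_ure_stripes(drive_bytes: float, chunk_bytes: int,
                                surviving_drives: int,
                                rate: float = URE_RATE_PER_BIT) -> float:
    """Upper bound on the expected number of stripes where two or more
    surviving drives both fail a read (union bound over drive pairs)."""
    stripes = drive_bytes / chunk_bytes
    p = p_chunk_ure(chunk_bytes, rate)
    return stripes * math.comb(surviving_drives, 2) * p * p


# Hypothetical 6 x 2TB RAID6 with one failed drive: five survivors, 512KiB chunks.
estimate = expected_double_ure_stripes(2e12, 512 * 1024, surviving_drives=5)
print(f"{estimate:.1e}")  # 6.7e-08
```

Because the expected count bounds the probability of at least one such stripe, the chance of losing anything in this scenario is below one in ten million.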
There is no RAIDocalypse. We are not all going to lose our data. We will
just get to the point where reading an array with no redundancy left all
but guarantees some small loss of data.
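To put "some loss" in perspective, a sketch of the expected amount of data lost when an array with no redundancy left is read in full, assuming independent errors at the 10^-14 ceiling and the worst case described above of one whole stripe lost per URE. The 1MiB stripe, the read sizes, and the function name are hypothetical.

```python
import math

URE_RATE_PER_BIT = 1e-14  # spec-sheet ceiling
MiB = 1024 * 1024


def rebuild_loss(bytes_read: float, stripe_bytes: int,
                 rate: float = URE_RATE_PER_BIT) -> tuple[float, float]:
    """Return (chance of losing anything, expected bytes lost) for one full
    read of an array with no redundancy left, costing one stripe per URE."""
    expected_ures = bytes_read * 8 * rate
    p_any_loss = 1.0 - math.exp(-expected_ures)
    return p_any_loss, expected_ures * stripe_bytes


for terabytes in (6, 12.5, 36):
    p, lost = rebuild_loss(terabytes * 1e12, stripe_bytes=1 * MiB)
    print(f"{terabytes:>5} TB read: P(any loss) = {p:.2f}, expected loss = {lost / MiB:.2f} MiB")
```

Even as the chance of losing something climbs toward certainty, the expected loss stays on the order of megabytes out of terabytes.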
> In the corporate world no drives larger than 500G or so are typically put
> in a raid group for just this reason.
In the corporate world, no drives larger than 500G or so are typically
put in a RAID group, for the sole reason that they simply don't exist.
They just don't make 10K and 15K SAS drives larger than 600GB.
> Every time I have lost something on a raid it was during a raid rebuild
> of a failed disk.
In all likelihood, you bought a bunch of hard drives at the same time,
from the same vendor and the same manufacturer, so they were all built
in the same production batch. You then operated them in the same
temperature conditions, under the same use profile. It is not at all
unreasonable that two of them would fail at nearly the same time.
That's why you're supposed to buy drives from separate lots, and
periodically replace them on a staggered schedule, so you don't end up
with a bunch ready to fail simultaneously.
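For contrast with the quoted 60% figure, a sketch of how unlikely a second failure during the rebuild window is when drives fail independently, assuming a constant failure rate derived from an annualized failure rate (AFR). The 3% AFR, 100MB/s rebuild throughput, 4-drive array, and function names are hypothetical, and real rebuilds under load take longer. Drives from the same batch violate the independence assumption, which is exactly why the model below understates their risk.

```python
import math

HOURS_PER_YEAR = 8760


def rebuild_hours(drive_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to rewrite one replacement drive at a sustained throughput."""
    return drive_bytes / throughput_bytes_per_s / 3600


def p_second_failure(surviving_drives: int, afr: float, hours: float) -> float:
    """Chance that any surviving drive fails during the rebuild window,
    assuming independent failures with a constant (exponential) hazard."""
    hourly_hazard = -math.log(1 - afr) / HOURS_PER_YEAR
    return 1 - math.exp(-surviving_drives * hourly_hazard * hours)


# Hypothetical 4 x 2TB RAID5 rebuilt at 100MB/s, 3% AFR per drive.
window = rebuild_hours(2e12, 100e6)
print(f"rebuild window: {window:.1f} h")
print(f"P(second failure): {p_second_failure(3, 0.03, window):.1e}")
```

Under independence the result is well under a tenth of a percent; the gap between that and real-world experience is what points to correlated, same-lot failures.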