[mythtv-users] Hard Drive reliability esp. RAID issues

Travis Tabbal travis at tabbal.net
Mon Mar 8 21:28:19 UTC 2010


On Mon, Mar 8, 2010 at 1:56 PM, Ronald Frazier <ron at ronfrazier.net> wrote:

> Regarding this whole concept, I've never seen a good explanation that
> doesn't gloss over the specifics. When they say unrecoverable read
> error, I'm assuming they don't just mean a temporary read error (where
> the next time we read it we'll get the correct value), as that would
> be easy to deal with. So if we are talking about a permanently bad
> sector, then I also assume this isn't just a case where we can scan
> the drive ahead of time to find the problems, as again that would be
> easy to deal with.
>
> So I can only assume that what they mean is that, on average, after
> every 12TB read 1 sector randomly turns up as permanently bad (1 bit
> ruins the whole sector). But is this REALLY happening out there? Even
> without raid, we should be seeing issues from this on high usage
> devices. For example, a TiVo is actually recording live TV 24x7, so
> it's reading/writing 50+ TB/year. Thus it should be getting bad
> sectors several times per year. Has anyone ever pulled a drive from a
> several year old tivo and found a dozen or more bad sectors on it
> (I've still got one that ran for 5 years or so, so maybe I should
> check it just for kicks)? I wouldn't be surprised if I, myself, have
> transferred enough data to have encountered such an error if this is
> really happening, but in all the years I've yet to encounter such a
> thing as randomly bad sectors (when a sector goes bad, usually the
> rest of the drive isn't far behind, and that's a totally different
> issue from the one being discussed here).
>
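For what it's worth, the numbers in the quote do work out to a few expected errors a year. A quick back-of-the-envelope check (assuming the commonly advertised consumer-drive URE rate of 1 per 10^14 bits, which is where the "roughly 12TB" figure comes from, and the ~50 TB/year TiVo estimate from the post):

```python
# Back-of-the-envelope check of the quoted figures:
# a 1-in-10^14-bit URE rate is one error per ~12.5 TB read.
URE_BITS = 1e14                      # advertised rate: 1 error per 10^14 bits
bytes_per_error = URE_BITS / 8       # = 1.25e13 bytes, about 12.5 TB

tivo_bytes_per_year = 50e12          # ~50 TB/year of reads/writes, per the post
expected_errors = tivo_bytes_per_year / bytes_per_error

print(f"One URE per {bytes_per_error / 1e12:.1f} TB read")
print(f"Expected errors per year: {expected_errors:.1f}")   # about 4
```

So if the advertised rate were literally true, a busy DVR really should be tripping over it a handful of times per year.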


How would you know? Particularly with video content: you get a frame with
some pixelation and it's fine again on the next. Most people would probably
chalk that up to a signal problem rather than a disk problem, if they
noticed it at all. If there is any error correction in the stream, it would
probably get fixed transparently without you having any idea. As very few
filesystems have the ability to detect corruption at this level, it's hard
to say one way or the other based on user experience. I suppose we could set
up a test writing known patterns to the disk and test our ability to read
them back later. The sector might also go bad randomly before the user has
ever written data to it. The drive firmware will re-map that sector without
telling you about it unless it runs out of scratch space. That's probably
why you say that when you see a bad sector, the drive isn't far behind. It's
run out of space to map the bad sectors to and has no choice but to inform
the user.
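The "write known patterns, read them back later" test is easy to sketch. Here's a minimal version in Python; the path, block size, and file size are made up for illustration (point it at a scratch file, or very carefully at a raw device, that you can afford to overwrite):

```python
# Minimal sketch of the pattern-verification idea: write blocks whose
# contents are derived from their offset, then re-read and compare.
import hashlib
import os

PATH = "/tmp/pattern_test.bin"   # hypothetical scratch file
BLOCK = 4096
BLOCKS = 1024                    # 4 MiB total, just for the demo

def block_pattern(i: int) -> bytes:
    # Deterministic, offset-dependent pattern so every block is unique
    # and a misdirected read/write shows up as a mismatch.
    seed = hashlib.sha256(i.to_bytes(8, "little")).digest()
    return (seed * (BLOCK // len(seed) + 1))[:BLOCK]

def write_patterns(path: str) -> None:
    with open(path, "wb") as f:
        for i in range(BLOCKS):
            f.write(block_pattern(i))
        f.flush()
        os.fsync(f.fileno())     # push the data to stable storage

def verify_patterns(path: str) -> list[int]:
    bad = []
    with open(path, "rb") as f:
        for i in range(BLOCKS):
            if f.read(BLOCK) != block_pattern(i):
                bad.append(i)
    return bad

write_patterns(PATH)
print("corrupt blocks:", verify_patterns(PATH))
```

On healthy media (and with the page cache in the way, unless you drop it) you'd expect an empty list; the interesting case is re-running the verify pass months later.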

This sort of silent corruption is one of the big reasons I use ZFS. I can
set it to scrub the array and verify checksums to make sure the data is
good. If it's not, it can repair the data from the redundant copies, since
it knows WHICH disk is returning bad data. Normal RAID can tell that
something is broken, but not which copy is the good one, so it can't fix it.
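That last distinction is easy to show with a toy example (this is not ZFS code, just an illustration of the idea): with a checksum stored separately from the data, you can tell which side of a mirror is lying and rewrite it from the good copy.

```python
# Toy illustration of checksum-aided repair on a two-way mirror.
import hashlib

def sha(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

data = b"recorded show"
mirror = [data, data]            # two copies, as on a two-disk mirror
checksum = sha(data)             # stored out-of-band, like a ZFS block pointer

mirror[1] = b"recorded shoX"     # silent corruption on the second "disk"

# A checksum-less RAID1 only sees that the copies disagree; with the
# checksum we know copy 0 is good and can rewrite copy 1 from it.
good = next(c for c in mirror if sha(c) == checksum)
mirror = [good if sha(c) != checksum else c for c in mirror]

assert all(sha(c) == checksum for c in mirror)   # mirror healed
```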
