[mythtv-users] Hard Drive reliability esp. RAID issues

Mon Mar 8 21:35:07 UTC 2010

On Mon, Mar 8, 2010 at 1:28 PM, Travis Tabbal <travis at tabbal.net> wrote:
>
>
> On Mon, Mar 8, 2010 at 1:56 PM, Ronald Frazier <ron at ronfrazier.net> wrote:
>>
>> Regarding this whole concept, I've never seen a good explanation that
>> doesn't gloss over the specifics. When they say unrecoverable read
>> error, I'm assuming they don't just mean a temporary read error (where
>> the next time we read it we'll get the correct value), as that would
>> be easy to deal with. So if we are talking about a permanently bad
>> sector, then I also assume this isn't just a case where we can scan
>> the drive ahead of time to find the problems, as again that would be
>> easy to deal with.
>>
>> So I can only assume that what they mean is that, on average, after
>> every 12TB read 1 sector randomly turns up as permanently bad (1 bit
>> ruins the whole sector). But is this REALLY happening out there? Even
>> without RAID, we should be seeing issues from this on high-usage
>> devices. For example, a TiVo is recording live TV 24x7, so
>> it's reading/writing 50+ TB/year. Thus it should be getting bad
>> sectors several times per year. Has anyone ever pulled a drive from a
>> several-year-old TiVo and found a dozen or more bad sectors on it
>> (I've still got one that ran for 5 years or so, so maybe I should
>> check it just for kicks)? I wouldn't be surprised if I, myself, have
>> transferred enough data to have encountered such an error if this is
>> really happening, but in all the years I've yet to encounter such a
>> thing as randomly bad sectors (when a sector goes bad, usually the
>> rest of the drive isn't far behind, and that's a totally different
>> issue from the one being discussed here).
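
To put rough numbers on the estimate above, here is a minimal Python sketch. It takes the 12 TB-per-error and 50 TB/year figures from the message as given. The function names and the Poisson assumption (independent errors at a constant rate) are mine, for illustration only.

```python
import math

def expected_errors(tb_read: float, tb_per_error: float = 12.0) -> float:
    """Expected number of unrecoverable read errors after reading tb_read terabytes."""
    return tb_read / tb_per_error

def prob_no_errors(tb_read: float, tb_per_error: float = 12.0) -> float:
    """Poisson probability of seeing zero errors.

    Assumption: errors are independent and arrive at a constant rate.
    """
    return math.exp(-expected_errors(tb_read, tb_per_error))

per_year = expected_errors(50)        # 50 TB/year: roughly 4 errors per year
five_years = expected_errors(50 * 5)  # roughly 21 errors over 5 years
print(f"{per_year:.1f} errors/year, {five_years:.1f} over 5 years")
print(f"chance of an error-free year: {prob_no_errors(50):.3f}")
```

If the quoted rate held literally, an error-free year on such a device would be unlikely, which is what makes the lack of visible bad sectors puzzling.
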
>
>
> How would you know, particularly with video content? You get a frame with
> some pixelation and it's fine again on the next. Most people would probably
> chalk that up to a signal problem rather than a disk problem, if they
> noticed it at all. If there is any error correction in the stream, it would
> probably get fixed transparently without you having any idea. As very few
> filesystems have the ability to detect corruption at this level, it's hard
> to say one way or the other based on user experience. I suppose we could set
> up a test writing known patterns to the disk and test our ability to read
> them back later. The sector might also go bad randomly before the user has
> ever written data to it. The drive firmware will remap that sector without
> telling you about it unless it runs out of spare sectors. That's probably
> why you say that when you see a bad sector, the rest of the drive isn't far
> behind: it has run out of room to remap bad sectors and has no choice but
> to inform the user.
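
The known-pattern test suggested above could be sketched roughly as follows in Python. This is only an illustration: `BLOCK_SIZE`, the function names, and the SHA-256-derived pattern are hypothetical choices. A read-back shortly after the write may also be served from the OS page cache rather than the disk, so a real test would need to read the data back much later or bypass the cache.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size chosen for this sketch

def pattern_block(index: int) -> bytes:
    """Deterministic pattern for block `index`, regenerable without storing it."""
    out = b""
    counter = 0
    while len(out) < BLOCK_SIZE:
        out += hashlib.sha256(f"{index}:{counter}".encode()).digest()
        counter += 1
    return out[:BLOCK_SIZE]

def write_pattern(path: str, num_blocks: int) -> None:
    """Write num_blocks pattern blocks and ask the OS to flush them to disk."""
    with open(path, "wb") as f:
        for i in range(num_blocks):
            f.write(pattern_block(i))
        f.flush()
        os.fsync(f.fileno())

def verify_pattern(path: str, num_blocks: int) -> list:
    """Return the indices of blocks that no longer match the expected pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(num_blocks):
            if f.read(BLOCK_SIZE) != pattern_block(i):
                bad.append(i)
    return bad

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "pattern.bin")
    write_pattern(path, 1024)
    print("mismatched blocks:", verify_pattern(path, 1024))
```

Because each block's pattern is derived from its index, the verifier needs no stored copy of the data, and a mismatch points at a specific block rather than just "the file changed".
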
>
> This sort of silent corruption is one of the big reasons I use ZFS. I can
> set it to scan (scrub) the array and compare the checksums to ensure the
> data is good. If it's not, it can recover the data from the redundant
> copies, as it knows WHICH disk is returning bad data. Normal RAID, by
> contrast, can tell that something is broken but can't tell which copy is
> right, so it can't fix it.
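
The idea of using a checksum to decide which redundant copy is bad can be shown with a toy Python model. This is not how ZFS is implemented: the function names are hypothetical, a plain list stands in for the mirrored disks, and SHA-256 is just a stand-in checksum.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Checksum stored separately from the data it protects."""
    return hashlib.sha256(data).hexdigest()

def read_with_repair(copies: list, expected: str) -> bytes:
    """Return a copy that matches the checksum and overwrite any copy that does not."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies fail the checksum; data is unrecoverable")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # repair the bad copy from the good one
    return good

original = b"one frame of video"
stored_sum = checksum(original)
mirror = [b"one frame of vide\x00", original]  # first "disk" returns bad data
data = read_with_repair(mirror, stored_sum)
print(data == original, mirror[0] == original)
```

Without the independent checksum, a two-way mirror whose copies disagree has no way to tell which copy to trust.
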
>

Do you run ZFS on FUSE in user space, or has there been a change to the
whole licensing question for ZFS?

- Mark