[mythtv-users] Hard Drive Problems

Tue May 1 13:16:26 UTC 2012

On Tue, May 1, 2012 at 8:55 AM, David Brieck Jr. <dbrieck at gmail.com> wrote:
> This past weekend my master backend froze up hard and wouldn't boot
> back up due to hard drive problems.
>
> I was able to boot from a USB stick and my two 1 TB WD Green drives
> eventually showed up. However unlikely it seems, they both seem to
> have problems.
>
> The drive that's got me confused the most is currently on /dev/sda.
> It's got 38625 hours on it (4.5 yrs). I was able to mount it once
> without a problem, but the next time I had to run fsck on it and it
> came up with a ton of errors. After the errors were fixed it mounted,
> however, here's where it gets even more confusing.
>
> Here's the output from fdisk:
>
> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00047280
>
>   Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1      121601   976760001   83  Linux
>
> Based on this, you would see that the drive has one partition and you
> would assume it was 1 TB. However, that's not what the OS is seeing.
>
> Here's the output of df -h:
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             147G  6.8G  133G   5% /mnt/data2
>
> So where did the rest of the drive go?
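
The size of the gap can be checked from the numbers you posted. This is
only a sketch: it assumes the fdisk "Blocks" column counts 1 KiB blocks
(the usual convention for that output) and that the "G" in df -h means
GiB.

```python
# Figures copied from the fdisk and df output quoted above.
FDISK_BLOCKS = 976760001   # "Blocks" column for /dev/sda1
BLOCK_BYTES = 1024         # assumption: fdisk counts 1 KiB blocks
DF_SIZE_GIB = 147          # "Size" column from df -h (assumed GiB)

GIB = 1024 ** 3

# Size of the partition as recorded in the partition table.
partition_gib = FDISK_BLOCKS * BLOCK_BYTES / GIB
# Share of that partition the mounted filesystem claims to cover.
visible_fraction = DF_SIZE_GIB / partition_gib

print(f"partition:  {partition_gib:.1f} GiB")
print(f"filesystem: {DF_SIZE_GIB} GiB ({visible_fraction:.0%} of the partition)")
```

So the partition table still describes roughly 931.5 GiB, while the
mounted filesystem accounts for only about 16% of it.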

Filesystem corruption. You may be able to recover from this by making
a duplicate with ddrescue and then running fsck on the duplicate.
ddrescue tries to copy as many of the good sectors as it can, skipping
over the bad ones. It keeps a log file, so if you have to reboot in the
middle of the copy (say the drive went totally offline) you can
continue where you left off.
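
As a rough sketch of that workflow, the snippet below only builds the
command lines and prints them; nothing is executed. It assumes GNU
ddrescue, which takes the input, the output and the log file as
positional arguments, with -n to skip the slow scraping phase, -d for
direct disc access and -r3 for three retry passes. The image and log
file paths are hypothetical, and the destination has to be on a
different, healthy disk with enough free space. Depending on the
filesystem you may need the type-specific checker instead of plain fsck.

```python
import shlex

def rescue_plan(source, image, logfile, retries=3):
    """Return the commands for a two-pass ddrescue copy followed by fsck.

    Only builds the argument lists; running them is left to the reader.
    """
    return [
        # Pass 1: copy everything that reads cleanly, skip the scraping phase.
        ["ddrescue", "-n", source, image, logfile],
        # Pass 2: go back for the bad areas with direct access and retries.
        # Reusing the same log file is what lets an interrupted copy resume.
        ["ddrescue", "-d", f"-r{retries}", source, image, logfile],
        # Repair the copy, never the failing original.
        ["fsck", image],
    ]

if __name__ == "__main__":
    # Hypothetical paths; adjust to your own setup.
    for cmd in rescue_plan("/dev/sda1", "/mnt/new/sda1.img", "/mnt/new/sda1.log"):
        print(shlex.join(cmd))
```

Running fsck only against the image means a bad repair attempt costs
you nothing: you can copy the image again and retry.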

> Oddly enough, the SMART data
> says there's nothing wrong with the drive, but it is giving me errors
> on the other drive. The other drive, currently at /dev/sdb,
> originally didn't want to mount, but after a few reboots it mounted
> without any errors and I was able to pull data off without a problem.
>
> However, here's the SMART data for that drive:
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   142   124   021    Pre-fail  Always       -       5858
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       27
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16072
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       19
> 193 Load_Cycle_Count        0x0032   017   017   000    Old_age   Always       -       550422
> 194 Temperature_Celsius     0x0022   110   102   000    Old_age   Always       -       37
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1425
> 198 Offline_Uncorrectable   0x0030   200   194   000    Old_age   Offline      -       33
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   148   148   000    Old_age   Offline      -       10543
>
> SMART Error Log Version: 1
> Warning: ATA error count 834 inconsistent with error log pointer 1
>
> ATA Error Count: 834 (device log contains only the most recent five errors)
>        CR = Command Register [HEX]
>        FR = Features Register [HEX]
>        SC = Sector Count Register [HEX]
>        SN = Sector Number Register [HEX]
>        CL = Cylinder Low Register [HEX]
>        CH = Cylinder High Register [HEX]
>        DH = Device/Head Register [HEX]
>        DC = Device Command Register [HEX]
>        ER = Error register [HEX]
>        ST = Status register [HEX]
>
> This drive is actually younger than the other drive at 16072 hours
> (1.8 yrs), but there are a bunch of errors on the drive.
>
> Was it a pure coincidence that both drives went bad at the same time
> or is one of them still good??
>
Have you looked at the SMART data before? At work I monitor SMART for
the lifetime of all drives in my servers.
>
> I have some replacement drives on the way, but I'm not sure if any
> recordings could be recovered from the first drive that seems to be
> missing or if I should bother even trying to keep one of these drives
> around?
>
> I'm going to try to get a warranty replacement for the newer drive
> since it's still in warranty, but the older drive by all rights still
> seems like it might be good.
>
> Thoughts?

The drive whose SMART data you posted could be dying, or this could
be an isolated media defect. The raw values of SMART attributes 197
and 198 should be at or close to 0. Both of them reflect what some
call drive amnesia: the drive wrote data that it can no longer read.
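
If you want to watch for this over time, a check along these lines
would do. It is a minimal sketch, not a monitoring tool: it assumes one
attribute per line in the column order that smartctl -A prints, and the
function name and sample rows (copied from your output) are only for
illustration.

```python
# Attribute IDs singled out above: pending and offline-uncorrectable sectors.
WATCHED_IDS = {197, 198}

def nonzero_watched(smartctl_lines):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0.

    Expects one attribute per line, in this column order:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    flagged = {}
    for line in smartctl_lines:
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or unrelated line
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged

# Rows taken from the SMART output quoted earlier in this thread.
sample = [
    "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0",
    "197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1425",
    "198 Offline_Uncorrectable   0x0030   200   194   000    Old_age   Offline      -       33",
]
print(nonzero_watched(sample))
```

On your numbers this flags both attributes: 1425 pending sectors and
33 offline-uncorrectable ones.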

John