[mythtv-users] Hard Drive Problems
John Drescher
drescherjm at gmail.com
Tue May 1 13:16:26 UTC 2012
On Tue, May 1, 2012 at 8:55 AM, David Brieck Jr. <dbrieck at gmail.com> wrote:
> This past weekend my master backend froze up hard and wouldn't boot
> back up due to hard drive problems.
>
> I was able to boot from a USB stick and my two 1 TB WD Green drives
> eventually showed up. However unlikely it seems, they both seem to
> have problems.
>
> The drive that's got me confused the most is currently on /dev/sda.
> It's got 38625 hours on it (4.5 yrs). I was able to mount it once
> without a problem, but the next time I had to run fsck on it and it
> came up with a ton of errors. After the errors were fixed it mounted,
> however, here's where it gets even more confusing.
>
> Here's the output from fdisk:
>
> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00047280
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1      121601   976760001   83  Linux
>
> Based on this, you would see that the drive has one partition and you
> would assume it was 1 TB. However, that's not what the OS is seeing.
>
> Here's the output of df -h:
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1       147G  6.8G  133G   5% /mnt/data2
>
> So where did the rest of the drive go?
Filesystem corruption. You may be able to recover from this by making
a duplicate of the drive with ddrescue and then running fsck on the
duplicate. ddrescue copies as many of the good sectors as it can,
skipping over the bad ones. It keeps a log file, so if you have to
reboot partway through the copy (say the drive went totally offline)
you can continue where you left off.
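As a rough sketch of the two-pass approach I have in mind (untested
against your setup), the snippet below only builds and prints the
ddrescue command lines; it does not run anything. The names sda1.img
and sda1.map are placeholders I made up, and the image needs about
1 TB of free space on a healthy disk. -n, -d and -r are GNU ddrescue
options, but check the man page for your version before relying on
them.

```python
import shlex


def ddrescue_passes(source, image, mapfile, retries=3):
    """Return two ddrescue command lines (as argument lists).

    Pass 1 grabs the easy sectors quickly (-n skips the slowest phase).
    Pass 2 goes back over the bad areas with direct disc access (-d)
    and a few retries (-r). Both passes share the same map (log) file,
    which is what lets an interrupted copy resume where it stopped.
    """
    first = ["ddrescue", "-n", source, image, mapfile]
    second = ["ddrescue", "-d", f"-r{retries}", source, image, mapfile]
    return [first, second]


# /dev/sda1 is the partition from the fdisk output; the image and map
# file names are hypothetical.
for cmd in ddrescue_passes("/dev/sda1", "sda1.img", "sda1.map"):
    print(shlex.join(cmd))
```

Once the copy is done, run fsck against the copy rather than the
original, so a bad repair cannot make things worse on the failing
disk.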
> Oddly enough, the SMART data
> says there's nothing wrong with the drive, but it is giving me errors
> on the other drive. The other drive, currently at /dev/sdb,
> originally didn't want to mount, but after a few reboots it mounted
> without any errors and I was able to pull data off without a problem.
>
> However, here's the SMART data for that drive:
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   142   124   021    Pre-fail  Always       -       5858
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       27
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16072
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       19
> 193 Load_Cycle_Count        0x0032   017   017   000    Old_age   Always       -       550422
> 194 Temperature_Celsius     0x0022   110   102   000    Old_age   Always       -       37
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1425
> 198 Offline_Uncorrectable   0x0030   200   194   000    Old_age   Offline      -       33
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   148   148   000    Old_age   Offline      -       10543
>
> SMART Error Log Version: 1
> Warning: ATA error count 834 inconsistent with error log pointer 1
>
> ATA Error Count: 834 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
>
> This drive is actually younger than the other drive at 16072 hours
> (1.8 yrs) but there are a bunch of errors on the drive.
>
> Was it a pure coincidence that both drives went bad at the same time
> or is one of them still good??
>
Have you looked at the SMART data before? At work I monitor SMART for
the lifetime of every drive in my servers.
>
> I have some replacement drives on the way, but I'm not sure if any
> recordings could be recovered from the first drive that seems to be
> missing or if I should bother even trying to keep one of these drives
> around?
>
> I'm going to try to get a warranty replacement for the newer drive
> since it's still in warranty, but the older drive by all rights still
> seems like it might be good.
>
> Thoughts?
The drive whose SMART data you posted could be dying, or this could
be an isolated media defect. SMART attributes 197 and 198 should be
at or close to 0; yours are at 1425 and 33. Both of these represent
what some call drive amnesia: the drive wrote data that it can no
longer read.
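
If you want to check those two attributes automatically, here is a
minimal sketch that parses attribute rows in the layout smartctl -A
prints (one row per attribute, ten columns) and flags 197 and 198
when the raw value is above zero. The function names are my own, the
sample rows are copied from your output, and the parser assumes
unquoted lines with a purely numeric RAW_VALUE.

```python
WATCH = (197, 198)  # Current_Pending_Sector, Offline_Uncorrectable

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1425
198 Offline_Uncorrectable   0x0030   200   194   000    Old_age   Offline      -       33
"""


def raw_values(smartctl_output):
    """Map attribute ID -> (name, raw value) for each attribute row."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns;
        # the header and any other lines are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def nonzero_watch(attrs):
    """Return one message per watched attribute whose raw value is > 0."""
    return [
        f"{attr_id} {attrs[attr_id][0]} = {attrs[attr_id][1]}"
        for attr_id in WATCH
        if attr_id in attrs and attrs[attr_id][1] > 0
    ]


print(nonzero_watch(raw_values(SAMPLE)))
# ['197 Current_Pending_Sector = 1425', '198 Offline_Uncorrectable = 33']
```

Run something like this on a schedule and keep the results, and you
can see whether the counts are holding steady or climbing.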
John