[mythtv-users] Is this a failing drive?

Don Brett dlbrett at zoominternet.net
Tue Nov 22 03:32:33 UTC 2011


On 11/21/2011 10:29 PM, Don Brett wrote:
> On 11/21/2011 11:12 AM, Keith Pyle wrote:
>> On 11/21/11 06:00, Manuel McLure wrote:
>>> On Sun, Nov 20, 2011 at 7:40 PM, Don Brett <dlbrett at zoominternet.net> wrote:
>>>>> This is a little off-topic, but it's a new drive on a new Mythbuntu
>>>>> installation.  Symptoms are:
>>>>>
>>>>> -partition table got corrupted (after about 50 hours on the drive); an
>>>>> 8-hour low level format got it back
>>>>> -the box occasionally freezes-up for a few seconds
>>>>> -I see multiple instances of these errors in /var/log/syslog:
>>>>>
>>>>> Nov 20 09:54:02 zedo kernel: [    6.292384] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
>>>>> Nov 20 09:55:12 zedo kernel: [   81.573425] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro,commit=0
>>>>> Nov 20 09:58:51 zedo kernel: [  300.004041] [Hardware Error]: Machine check events logged
>>>>>
>>>>>
>>>>> From /var/log/mcelog, I see multiple entries of this:
>>>>>
>>>>> mcelog: failed to prefill DIMM database from DMI data
>>>>> Kernel does not support page offline interface
>>>>> mcelog: mcelog read: No such device
>>>>> Hardware event. This is not a software error.
>>>>> MCE 0
>>>>> CPU 0 4 northbridge
>>>>> MISC c008000001000000 ADDR 1844184
>>>>> TIME 1321843678 Sun Nov 20 21:47:58 2011
>>>>>   Northbridge NB Array Error
>>>>>        bit42 = L3 subcache in error bit 0
>>>>>        bit43 = L3 subcache in error bit 1
>>>>>        bit46 = corrected ecc error
>>>>>        bit59 = misc error valid
>>>>>        bit62 = error overflow (multiple errors)
>>>>>   memory/cache error 'evict mem transaction, generic transaction, level generic'
>>>>> STATUS dc074c60001c017b MCGSTATUS 0
>>>>> MCGCAP 106 APICID 0 SOCKETID 0
>>>>> CPUID Vendor AMD Family 16 Model 5
>>>>> Hardware event. This is not a software error.
>>>>>
>>>>>
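Inline note: the STATUS value in that dump can be cross-checked bit by bit. Below is a minimal Python sketch, assuming the architectural MCA status layout (flag bits 57-63, error code in bits 0-15, memory-hierarchy codes of the form 0000 0001 RRRR TTLL). The labels for bits 42, 43 and 46 are copied from the mcelog output above, not decoded independently, and the helper names (set_bits, decode_mem_code) are made up for illustration.

```python
"""Cross-check the MCE STATUS value from the mcelog output.

Assumes the architectural MCA status layout: flag bits 57-63 and an
error code in bits 0-15. Bits 42, 43 and 46 are model-specific; their
labels are copied from what mcelog printed.
"""

STATUS = 0xDC074C60001C017B  # "STATUS dc074c60001c017b" from the log

# Architectural flag bits in the bank's STATUS register.
ARCH_FLAGS = {
    63: "VAL: register holds a valid error",
    62: "OVER: overflow, more than one error was logged",
    61: "UC: error was not corrected",
    60: "EN: error reporting enabled",
    59: "MISCV: MISC register valid",
    58: "ADDRV: ADDR register valid",
    57: "PCC: processor context corrupt",
}

# Model-specific bits, labelled the way mcelog labelled them for this bank.
NB_FLAGS = {
    42: "L3 subcache in error bit 0",
    43: "L3 subcache in error bit 1",
    46: "corrected ECC error",
}

# Field names for a memory-hierarchy error code (0000 0001 RRRR TTLL).
REQUEST = ["generic", "read", "write", "data read", "data write",
           "instruction fetch", "prefetch", "evict", "snoop"]
TXN_TYPE = ["instruction", "data", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "generic"]


def set_bits(value, table):
    """Return the bit numbers listed in `table` that are set in `value`, highest first."""
    return [bit for bit in sorted(table, reverse=True) if (value >> bit) & 1]


def decode_mem_code(status):
    """Decode bits 0-15 as a memory-hierarchy error code, or return None."""
    code = status & 0xFFFF
    if (code & 0xFF00) != 0x0100:
        return None  # some other class of error code
    rrrr, tt, ll = (code >> 4) & 0xF, (code >> 2) & 0x3, code & 0x3

    def name(table, index):
        # Encodings beyond the end of each table are reserved.
        return table[index] if index < len(table) else "reserved"

    return name(REQUEST, rrrr), name(TXN_TYPE, tt), name(LEVEL, ll)


if __name__ == "__main__":
    for bit in set_bits(STATUS, ARCH_FLAGS):
        print(f"bit{bit}: {ARCH_FLAGS[bit]}")
    for bit in set_bits(STATUS, NB_FLAGS):
        print(f"bit{bit}: {NB_FLAGS[bit]}")
    print("error code:", decode_mem_code(STATUS))
```

Run against dc074c60001c017b, it reports bits 63, 62, 60, 59 and 58 set and bit 61 (UC) clear, so the logged event was a corrected error, with the overflow bit indicating that more than one was seen. The low 16 bits (0x017b) decode to evict / generic / generic, matching mcelog's "evict mem transaction, generic transaction, level generic".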
>>>>> I replaced the SATA drive cables, disconnected the DVD drive, and tried a
>>>>> different power supply.  I also tried another drive on the same box (but
>>>>> it was an IDE drive) and had no errors.  With a different motherboard...still
>>>>> got the "re-mount" errors but none of the "[Hardware Error]" entries.
>>>>> Anyone have a suggestion?
>>>>>
>>>>>
>>>>> PS - the hardware is: (everything but the power supply and case is new)
>>>>> -ASUS M4A78LT-M AM3 AMD 760G HDMI Micro ATX AMD Motherboard
>>>>> -SAMSUNG EcoGreen F4 HD204UI 2TB SATA 3.0Gb/s 3.5" Internal Hard Drive
>>>>> -ADATA Gaming Series 2GB 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800)
>>>>> Desktop Memory
>>>>> -ZOTAC ZT-20203-10L GeForce GT 220 1GB 128-bit DDR2 PCI Express 2.0 x16
>>>>> HDCP Ready Video Card
>>>>> -ThermalTake 430 power supply
>>> It's not a disk problem, that's a CPU or motherboard problem. The disk
>>> corruption is caused by your memory contents getting corrupted and
>>> being written to disk.
>>>
>>> See http://halobates.de/mce.pdf for details on exactly what a "machine
>>> check exception" is.
>> With the caveat that I'm not an expert on this...
>>
>> I suspect your cache memory or perhaps the Northbridge (handles
>> communication among cores, possibly RAM, video) has a problem.
>> Depending on your specific CPU, the Northbridge may be on the CPU, i.e.,
>> CPU cores and Northbridge are all in one package.  This is the case for
>> many recent, mainstream processors from both Intel and AMD.
>>
>> The errors you included suggest a multi-bit error in the L3 cache.  L3
>> is special memory where (a limited amount of) recently used instructions
>> and data are stored for faster access by the CPU than going to main
>> RAM.  As Manuel wrote, if the cache is corrupted, it could lead to all
>> manner of intermittent and seemingly random problems, including those
>> you mentioned.
>>
>> If this is a Northbridge/cache problem and the Northbridge is on the CPU
>> die, then your only fix will be to replace the CPU.  If the Northbridge
>> is a separate chip on the motherboard, then you'll have to replace the
>> motherboard but could keep the CPU.
>>
>> It may be worth checking whether Asus will help you if this is a new
>> motherboard.  (I have no personal experience with Asus support and don't
>> know if/how they will help.)
>>
>> Keith
>> _______________________________________________
>> mythtv-users mailing list
>> mythtv-users at mythtv.org
>> http://www.mythtv.org/mailman/listinfo/mythtv-users
>>
>>
>
> I just noticed that I hadn't included the CPU in the list of hardware;
> it's an AMD Athlon II X3 445 Rana (3.1GHz Socket AM3 95W Triple-Core
> Desktop Processor ADX445WFGMBOX).  Apparently this processor doesn't
> have an L3 cache (from Tom's Hardware - *Rana*, triple-core, no L3
> cache (2.7+ GHz)).  Does that matter, or is the cache on the motherboard?
>
> I looked up the features of the motherboard chipset; it has a North
> Bridge (AMD 760G).  I assume that means the CPU does not have an
> integrated northbridge...right?  So it looks like my problem might be
> with the motherboard.
>
> Side note - some other threads implied it might be a memory problem, so
> I played with it a little.  The box started with two 2 GB DDR3 sticks.  I
> removed one of the sticks...similar behavior.  Replaced that with the
> other stick (still running with a single stick)...errors increased a
> lot, to 2-3 errors a minute.  Does that mean anything?
>
I forgot to mention that I ran memtest86.  It showed no errors after 5 passes
(it ran for about 4 hours).

Don
