[mythtv-users] SOLVED Random lockups on Mythbackend

f-myth-users at media.mit.edu f-myth-users at media.mit.edu
Sun Apr 10 21:54:52 UTC 2011


    > Date: Sun, 10 Apr 2011 16:37:31 -0500
    > From: Larry Finger <Larry.Finger at lwfinger.net>

    > I was certainly aware that memtest86+ could never prove that memory had no 
    > faults, but your stories are scary. That is nearly enough to suggest a new line 
    > of work.

Tell me about it.  I just got bitten a few days ago by discovering
that the onboard Realtek NIC in an identical pair of new motherboards
I got worked fine -unless- there was also heavy PCI traffic (e.g., to
an AOC-SAT2-MV8), whereupon the machine locked up in minutes.  Turning
off throttling and SS turned the lockups into veerrry slooow wedges
that eventually recovered but then led to "CPU #n stuck for 61s!"
warnings forever afterwards at intervals, even when idle, until a reboot.

Happened across a range of kernels; didn't happen if not doing heavy
gigabit, or if no PCI traffic (e.g., to native SATA ports).  Didn't
happen even if I used a PCI (!) based gigabit NIC in addition to the
PCI disk controller; happened less often if I slowed down the transfer
but still happened.  Identical behavior on each machine.

Fixed by dropping an Intel gigabit NIC into a free PCIe slot and
deciding I would never use a Realtek NIC again.  (Tons of instability
in various kernel bug reports, as I discovered doing the research.)
Fortunately, it's a $30 NIC and I had the slot, but I'm still annoyed.

These one-minute fixes look simple in retrospect, but can take days of
running tests to be sure of really nailing down the culprit (to avoid
a cargo-cult superstition of things that aren't the real problem).
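
For what it's worth, the elimination logic above can be written down as a
small Python sketch.  The field names are made up for illustration, and the
rows are only my transcription of the four cases described in this message
(not logged data).  It flags a factor when a passing run differs from a
failing run in that one factor alone.

```python
# Each row is one configuration described in this message.  Field names are
# hypothetical; "lockup" records only what was reported above.
observations = [
    {"nic": "onboard-realtek", "heavy_gigabit": True,  "pci_disk_traffic": True,  "lockup": True},
    {"nic": "onboard-realtek", "heavy_gigabit": False, "pci_disk_traffic": True,  "lockup": False},
    {"nic": "onboard-realtek", "heavy_gigabit": True,  "pci_disk_traffic": False, "lockup": False},
    {"nic": "pci-gigabit",     "heavy_gigabit": True,  "pci_disk_traffic": True,  "lockup": False},
]


def implicated_factors(rows):
    """Return the factors that, changed one at a time, made a failure go away."""
    failing = [r for r in rows if r["lockup"]]
    passing = [r for r in rows if not r["lockup"]]
    factors = [k for k in rows[0] if k != "lockup"]

    implicated = {}
    for fail in failing:
        for ok in passing:
            # Which factors differ between this failing run and this passing run?
            differing = [k for k in factors if ok[k] != fail[k]]
            # Only a single-factor difference is evidence about that factor.
            if len(differing) == 1:
                implicated[differing[0]] = fail[differing[0]]
    return implicated


print(implicated_factors(observations))
```

With these four rows, all three factors come out as implicated, which matches
the story: it took the onboard Realtek, heavy gigabit, and PCI disk traffic
together.  A run that changes two things at once tells you nothing under this
rule, which is why the testing takes days.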

I'm getting -very- tired of discovering critical bugs in every piece
of hardware I touch.

