[mythtv-users] SOLVED Random lockups on Mythbackend
f-myth-users at media.mit.edu
Sun Apr 10 21:54:52 UTC 2011
> Date: Sun, 10 Apr 2011 16:37:31 -0500
> From: Larry Finger <Larry.Finger at lwfinger.net>
> I was certainly aware that memtest86+ could never prove that memory had no
> faults, but your stories are scary. That is nearly enough to suggest a new line
> of work.
Tell me about it. I just got bitten a few days ago by discovering
that the onboard Realtek NIC in an identical pair of new motherboards
I got worked fine -unless- there was also heavy PCI traffic (e.g., to
an AOC-SAT2-MV8), whereupon the machine locked up in minutes. Turning
off throttling and SS turned the lockups into veerrry slooow wedges
that eventually recovered but then led to "CPU #n stuck for 61s!"
warnings forever afterwards at intervals, even when idle, until a reboot.
Happened across a range of kernels; didn't happen if not doing heavy
gigabit, or if no PCI traffic (e.g., to native SATA ports). Didn't
happen even if I used a PCI (!) based gigabit NIC in addition to the
PCI disk controller; happened less often if I slowed down the transfer
but still happened. Identical behavior on each machine.
Fixed by dropping an Intel gigabit NIC into a free PCIe slot and
deciding I would never use a Realtek NIC again. (Tons of instability
in various kernel bug reports, as I discovered doing the research.)
Fortunately, it's a $30 NIC and I had the slot, but I'm still annoyed.
These one-minute fixes look simple in retrospect, but can take days of
running tests to be sure of really nailing down the culprit (to avoid
a cargo-cult superstition about things that aren't the real problem).
I'm getting -very- tired of discovering critical bugs in every piece
of hardware I touch.