[mythtv] [mythtv-commits] Ticket #9704: backend deadlocks up with Protocol version check failure.

Wed Apr 20 13:31:52 UTC 2011

On Wed, Apr 20, 2011 at 12:44 AM,  <noreply at mythtv.org> wrote:
> #9704: backend deadlocks up with Protocol version check failure.
> -----------------------------------+----------------------------
>  Reporter:  brian@…                |          Owner:
>     Type:  Bug Report - General   |         Status:  new
>  Priority:  critical               |      Milestone:  unknown
> Component:  MythTV - General       |        Version:  0.24-fixes
>  Severity:  high                   |     Resolution:
>  Keywords:                         |  Ticket locked:  0
> -----------------------------------+----------------------------
>
> Comment (by markk):
>
>  This is not an area of the code I'm familiar with, but here's my not so
>  quick but still quite dirty analysis of the backtrace (using latest
>  0.24-fixes code):-
>
>  Thread 1 (the main thread)
>
>  - has received a MASTER_UPDATE_PROG_INFO message (mainserver.cpp @ 1001).
>  I presume this is the result of a recording finishing, though I'm not too
>  sure. In the process of handling that message, it asks for the recording
>  status of the program referenced in the message. It gets this from the
>  scheduler - scheduler.cpp, GetRecStatus @ 1534 and immediately tries to
>  obtain a lock on access to the scheduler which it presumably never
>  obtains. The main loop is now deadlocked.
>
>
>  Thread 20 - the scheduler thread
>
>  - is in the middle of the Scheduler::RunScheduler loop (scheduler.cpp @
>  1688). This obtains the scheduler lock at line 1730 and is still holding
>  it when it calls AutoExpire::Update at line 2152. AutoExpire::Update tries
>  to get a lock on the global AutoExpire object - and again presumably
>  fails.
>
>  Thread 18 - the AutoExpire thread
>
>  - is in the middle of AutoExpire::RunExpirer (line 310) and is holding the
>  global autoexpire lock. RunExpirer then proceeds down to the ProgramInfo
>  class where it is accessing the database. It's not clear whether the
>  database access is stalled for some reason or whether this is just where
>  the interrupt happened to land.
>
>  So the main thread (1) is waiting on the scheduler lock, the scheduler
>  thread (20) holds that lock but is waiting for the autoexpire lock, and
>  the autoexpire thread (18) holds the autoexpire lock - but I have no
>  idea whether it ever releases it.
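
For anyone trying to follow the chain described above, here is a minimal
sketch that models it as a wait-for graph and traces it from the main
thread. It is not MythTV code: the thread and lock labels are just
hypothetical names taken from the analysis, and WaitGraph, trace_chain
and ticket_9704_graph are made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the backtrace analysis, not MythTV code.
struct WaitGraph {
    std::map<std::string, std::string> waits_for; // thread -> lock it is blocked on
    std::map<std::string, std::string> held_by;   // lock -> thread holding it
};

// Follow the chain of "blocked on a lock held by ..." starting at `start`.
// Returns the threads visited in order. If the chain loops back onto a
// thread already seen, that thread is appended again and `cycle` is set.
std::vector<std::string> trace_chain(const WaitGraph& g,
                                     const std::string& start,
                                     bool& cycle)
{
    std::vector<std::string> chain;
    std::set<std::string> seen;
    std::string thread = start;
    cycle = false;
    while (true) {
        chain.push_back(thread);
        if (!seen.insert(thread).second) {
            cycle = true;                     // classic lock-order deadlock
            break;
        }
        auto wait = g.waits_for.find(thread);
        if (wait == g.waits_for.end())
            break;                            // not blocked on any modelled lock
        auto holder = g.held_by.find(wait->second);
        if (holder == g.held_by.end())
            break;                            // lock has no recorded holder
        thread = holder->second;
    }
    return chain;
}

// The three threads and two locks as described in the ticket comment.
WaitGraph ticket_9704_graph()
{
    WaitGraph g;
    g.waits_for["main (1)"]       = "scheduler lock";
    g.waits_for["scheduler (20)"] = "autoexpire lock";
    g.held_by["scheduler lock"]   = "scheduler (20)";
    g.held_by["autoexpire lock"]  = "autoexpire (18)";
    return g;
}
```

Tracing from "main (1)" ends at "autoexpire (18)" with no cycle, which
matches the analysis: on this evidence it is a chain rather than a
circular wait, so everything hinges on whether thread 18 ever gets out
of its database access and releases the autoexpire lock.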

I don't want to pile on, but I am having the exact same issue, can
reproduce it at will, and have a debug version compiled and ready to
test, if a developer needs any more info/backtrace/logs/etc.

I am using current trunk as of yesterday, with 2xHDHR, 2xPVR-x50 (MBE)
and 1 HD-PVR (SBE), and it's *barely* usable. The only way to keep it
from deadlocking is to run with log verbosity set to at least
'socket', but then it spends so much time logging that only one
frontend can view recordings at a time; with more, the server is so
bogged down that it just stutters. I have turned off the SBE for now,
just to reduce the load a bit.

Looking at /var/log/messages, I also see that mythcommflag dumped core
4 times last night, if that helps.

Thanks
Tom