[mythtv] [Patch] Re: [BUG?] Threading / Sockets problem with master and slave backends

Geoffrey Kruse gkruse at gmail.com
Sat Sep 10 20:39:39 UTC 2005


On Sep 10, 2005, at 1:26 PM, Malc wrote:

> Malcolm Smith wrote:
>
>
>> Hi all,
>>
>>
>> Threading problems
>>
>> I think there may be problems with Sockets or threads. (See  
>> attached bt).
>>
>> I can reliably reproduce this problem.
>>
>> Setup
>> Master Backend / Slave backend both in idle state. This only fails  
>> when a live slave backend in idle state is present
>>
>> Method:
>> Open 2 browsers and request status simultaneously. The thread on
>> the master backend handling status / web requests crashes. Recording
>> continues on both backends, and they remain in this state until
>> killed. There is never any further response from the status port
>> 6544/6543. (Sometimes it takes a few tries, so it's something to
>> do with collision timing.)
>>
>> It can also be reproduced when requesting multiple activities via  
>> mythweb that take time to process (rescheds, status, deletes), but  
>> only ever when slave backend is present.
>>
>> For background master has DVB, slave has DVB and PVR250 card.  
>> Master server also has mysql running on it.
>> Both backends are built from SVN from Weds 31 Aug, identical  
>> distributions.
>>
>> back trace attached
>>
>> Because of this the WAF is dropping, as she's impatient on the  
>> web.... help please.
>>
>>
>> Thread 12 (Thread 31009712 (LWP 4318)):
>> #0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
>> #2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
>> #3  0x007dbcaa in EITScanner::RunEventLoop (this=0x8109d90)
>>    at eitscanner.cpp:62
>> #4  0x007dbc6f in EITScanner::SpawnEventLoop (param=0x8109d90)
>>    at eitscanner.cpp:50
>> #5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
>> #6  0x02bd17da in clone () from /lib/tls/libc.so.6
>>
>> Thread 11 (Thread 129358768 (LWP 4439)):
>> #0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
>> #2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
>> #3  0x006c9364 in TVRec::RunTV (this=0x8101af0) at tv_rec.cpp:1612
>> #4  0x006c8dd9 in TVRec::EventThread (param=0x8101af0) at tv_rec.cpp:1534
>> #5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
>> #6  0x02bd17da in clone () from /lib/tls/libc.so.6
>>
>> Thread 10 (Thread 98745264 (LWP 4441)):
>> #0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1  0x00193eee in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
>> #2  0x00190df4 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
>> #3  0xf6e00010 in ?? ()
>> #4  0xf6f48cf8 in ?? ()
>> #5  0x070d5214 in ?? () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
>> #6  0x00a76ee0 in ?? ()
>> #7  0x080ef278 in ?? ()
>> #8  0x05e2b5f8 in ?? ()
>> #9  0x06f972f0 in QRecursiveMutexPrivate::lock ()
>>   from /usr/lib/qt-3.3/lib/libqt-mt.so.3
>> Previous frame identical to this frame (corrupt stack?)
>>
>>
>>
>>
>>
> I've spent some time tracking through the code.
>
> I think I'm getting somewhere with this. The problem seems to be
> common only in the following circumstances (but can occur in
> others).
>
> Master backend is acting as a middleware (i.e. a backend to a  
> client and client to slave backends/mysql). Examples of this are:
> 1. Requesting status (localhost:6545) with slave backends available  
> from a web browser
> 2. Requesting sql and file activity from a frontend or browser, esp  
> when slave backends are present
>
> The critical bit of code seems to be:
> - programs/mythbackend/playbacksock.cpp
>
> bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
> {
>   sock->Lock();
>
>  ......
>
> Thread will hang at sock->Lock and never return.
>
> I've attached a patch which doesn't fix the underlying problem, but
> does make the code more stable, by not getting stuck on the lock.
> What the patch does is use tryLock to see if the lock can be
> obtained. If not, it retries once every 0.1s, up to 20 times. If
> there is still no lock after that, it aborts the SendReceive.
>
> This means whatever the calling code was trying to get done (e.g.
> schedule, delete etc) doesn't get done. It wouldn't have been done
> anyway, and would have required a restart of the master backend! I
> can't think of any activity in myth that critically requires
> confirmation and execution exactly once, e.g. if a delete didn't
> work then just retry: frustrating, but less so than a restart.
>
> Can people try this patch and see whether it increases stability for
> them? I've had no lockups using this code.
>
I can't get this patch to apply; I get the following error:

geoff@itchy:~/mythtv$ patch -p0 < socket.patch
patching file programs/mythbackend/playbacksock.cpp
patch: **** malformed patch at line 40:      sockLock.lock();

I see this bug all the time so I can't wait to test this patch.
Geoff