[mythtv] [Patch] Re: [BUG?] Threading / Sockets problem with
master and slave backends
Geoffrey Kruse
gkruse at gmail.com
Sat Sep 10 20:39:39 UTC 2005
On Sep 10, 2005, at 1:26 PM, Malc wrote:
> Malcolm Smith wrote:
>
>
>> Hi all,
>>
>>
>> Threading problems
>>
>> I think there may be problems with Sockets or threads. (See
>> attached bt).
>>
>> I can reliably reproduce this problem.
>>
>> Setup
>> Master Backend / Slave backend both in idle state. This only fails
>> when a live slave backend in idle state is present
>>
>> Method:
>> Open 2 browsers and request status simultaneously. The thread on
>> the master backend handling status / web crashes. Recording
>> continues on both backends, which remain in this state until killed.
>> There is never any further response from the status port
>> 6544/6543. (Sometimes it takes a few tries, so it's something to
>> do with collision timing).
>>
>> It can also be reproduced when requesting multiple activities via
>> mythweb that take time to process (rescheds, status, deletes), but
>> only ever when slave backend is present.
>>
>> For background master has DVB, slave has DVB and PVR250 card.
>> Master server also has mysql running on it.
>> Both backends are built from SVN from Weds 31 Aug, identical
>> distributions.
>>
>> back trace attached
>>
>> Because of this the WAF is dropping, as she's impatient on the
>> web.... help please.
>>
>>
>> Thread 12 (Thread 31009712 (LWP 4318)):
>> #0 0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1 0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
>> #2 0x02bcad6a in usleep () from /lib/tls/libc.so.6
>> #3 0x007dbcaa in EITScanner::RunEventLoop (this=0x8109d90)
>> at eitscanner.cpp:62
>> #4 0x007dbc6f in EITScanner::SpawnEventLoop (param=0x8109d90)
>> at eitscanner.cpp:50
>> #5 0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
>> #6 0x02bd17da in clone () from /lib/tls/libc.so.6
>>
>> Thread 11 (Thread 129358768 (LWP 4439)):
>> #0 0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1 0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
>> #2 0x02bcad6a in usleep () from /lib/tls/libc.so.6
>> #3 0x006c9364 in TVRec::RunTV (this=0x8101af0) at tv_rec.cpp:1612
>> #4 0x006c8dd9 in TVRec::EventThread (param=0x8101af0) at
>> tv_rec.cpp:1534
>> #5 0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
>> #6 0x02bd17da in clone () from /lib/tls/libc.so.6
>>
>> Thread 10 (Thread 98745264 (LWP 4441)):
>> #0 0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1 0x00193eee in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
>> #2 0x00190df4 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
>> #3 0xf6e00010 in ?? ()
>> #4 0xf6f48cf8 in ?? ()
>> #5 0x070d5214 in ?? () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
>> #6 0x00a76ee0 in ?? ()
>> #7 0x080ef278 in ?? ()
>> #8 0x05e2b5f8 in ?? ()
>> #9 0x06f972f0 in QRecursiveMutexPrivate::lock ()
>> from /usr/lib/qt-3.3/lib/libqt-mt.so.3
>> Previous frame identical to this frame (corrupt stack?)
>>
>>
>>
>>
>>
> I've spent some time tracking through the code.
>
> I think I'm getting somewhere with this. This problem only seems to
> be common in the following circumstances (but will occur in other
> circumstances).
>
> Master backend is acting as a middleware (i.e. a backend to a
> client and client to slave backends/mysql). Examples of this are:
> 1. Requesting status (localhost:6545) with slave backends available
> from a web browser
> 2. Requesting sql and file activity from a frontend or browser, esp
> when slave backends are present
>
> The critical bit of code seems to be:
> - programs/mythbackend/playbacksock.cpp
>
> bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
> {
> sock->Lock();
>
> ......
>
> Thread will hang at sock->Lock and never return.
>
> I've attached a patch which doesn't fix the underlying problem, but
> does make the code more stable, by not getting stuck on the lock.
> What the patch does is to use tryLock to see if the lock can be
> obtained. If not, it retries once every 0.1s, up to 20 times. If
> there is still no lock after that, it aborts the SendReceive.
>
> This means whatever the calling code was trying to get done (e.g.
> schedule, delete etc) doesn't get done.... It wouldn't have
> anyway.. and would have required a restart of the masterbackend! I
> can't think of any critical activity on myth that requires critical
> confirmation and execution only once. e.g. if delete didn't work
> then just retry.. frustrating but less so than a restart.
>
> Can people try this patch and see whether it increases stability for
> them? I've had no lockups using this code.
>
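For readers following along without the attachment, the retry scheme described in the quoted message can be sketched roughly as below. This is not the patch itself: it uses the standard library's std::mutex::try_lock in place of the Qt tryLock call the message mentions, and the names lockWithRetry and sendReceive are hypothetical, chosen only for illustration. The 20 attempts at 0.1s intervals are the figures given above.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper (not from the patch): poll the mutex with try_lock,
// sleeping `interval` between attempts, and give up after `attempts` tries.
// Returns true if the caller now owns the lock.
bool lockWithRetry(std::mutex &m, int attempts = 20,
                   std::chrono::milliseconds interval =
                       std::chrono::milliseconds(100))
{
    assert(attempts > 0);
    for (int i = 0; i < attempts; ++i)
    {
        if (m.try_lock())
            return true;
        std::this_thread::sleep_for(interval);
    }
    return false;
}

// Hypothetical stand-in for a SendReceive-style call: abort the exchange
// instead of blocking forever when the lock cannot be obtained.
bool sendReceive(std::mutex &sockLock)
{
    if (!lockWithRetry(sockLock))
        return false; // caller has to retry the whole operation later

    // ... the request/response exchange would happen here ...

    sockLock.unlock();
    return true;
}
```

With the defaults, a caller waits about two seconds before giving up, so whatever was requested (a status page, a delete) fails and has to be retried, as the message notes. The thread that holds the lock is not affected by this; the sketch only stops other threads from piling up behind it.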
I can't get this patch to apply, I get the following error:
geoff at itchy:~/mythtv$ patch -p0 < socket.patch
patching file programs/mythbackend/playbacksock.cpp
patch: **** malformed patch at line 40: sockLock.lock();
I see this bug all the time so I can't wait to test this patch.
Geoff