[mythtv] [Patch] Re: [BUG?] Threading / Sockets problem with master and slave backends

Malc malc at porsche.demon.co.uk
Sat Sep 10 20:26:34 UTC 2005


Malcolm Smith wrote:

> Hi all,
>
>
> Threading problems
>
> I think there may be problems with Sockets or threads. (See attached bt).
>
> I can reliably reproduce this problem.
>
> Setup
> Master backend and slave backend are both in the idle state. This only
> fails when a live slave backend in the idle state is present.
>
> Method:
> Open 2 browsers and request status simultaneously. The thread on the
> master backend handling status / web crashes; recording continues on
> both backends, and it remains in this state until killed. There is
> never any further response from the status port 6544/6543. (Sometimes
> it takes a few tries, so it's something to do with collision timing.)
>
> It can also be reproduced when requesting multiple activities via
> mythweb that take time to process (reschedules, status, deletes), but
> only ever when a slave backend is present.
>
> For background, the master has DVB; the slave has DVB and a PVR250
> card. The master server also has mysql running on it.
> Both backends are built from SVN from Wed 31 Aug, identical
> distributions.
>
> back trace attached
>
> Because of this the WAF is dropping, as she's impatient on the web.... 
> help please.
>
>
> Thread 12 (Thread 31009712 (LWP 4318)):
> #0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
> #2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
> #3  0x007dbcaa in EITScanner::RunEventLoop (this=0x8109d90)
>    at eitscanner.cpp:62
> #4  0x007dbc6f in EITScanner::SpawnEventLoop (param=0x8109d90)
>    at eitscanner.cpp:50
> #5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
> #6  0x02bd17da in clone () from /lib/tls/libc.so.6
>
> Thread 11 (Thread 129358768 (LWP 4439)):
> #0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
> #2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
> #3  0x006c9364 in TVRec::RunTV (this=0x8101af0) at tv_rec.cpp:1612
> #4  0x006c8dd9 in TVRec::EventThread (param=0x8101af0) at tv_rec.cpp:1534
> #5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
> #6  0x02bd17da in clone () from /lib/tls/libc.so.6
>
> Thread 10 (Thread 98745264 (LWP 4441)):
> #0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x00193eee in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
> #2  0x00190df4 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
> #3  0xf6e00010 in ?? ()
> #4  0xf6f48cf8 in ?? ()
> #5  0x070d5214 in ?? () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
> #6  0x00a76ee0 in ?? ()
> #7  0x080ef278 in ?? ()
> #8  0x05e2b5f8 in ?? ()
> #9  0x06f972f0 in QRecursiveMutexPrivate::lock ()
>    from /usr/lib/qt-3.3/lib/libqt-mt.so.3
> Previous frame identical to this frame (corrupt stack?)
>
>
>
>
I've spent some time tracking through the code.

I think I'm getting somewhere with this. The problem seems to be most
common in the following circumstances (though it can occur in others).

The master backend is acting as middleware (i.e. it is a backend to a
client and a client to the slave backends/mysql). Examples of this are:
 1. Requesting status (localhost:6545) from a web browser with slave
backends available
 2. Requesting sql and file activity from a frontend or browser,
especially when slave backends are present

The critical bit of code seems to be:
 - programs/mythbackend/playbacksock.cpp

 bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
 {
   sock->Lock();

  ......

The thread hangs at sock->Lock() and never returns.

I've attached a patch which doesn't fix the underlying problem, but does
make the code more stable by not getting stuck on the lock.
What the patch does is use tryLock() to see whether the lock can be
obtained. If not, it retries once every 0.1s, up to 20 times. If the
lock still hasn't been obtained after that, it aborts the SendReceive.
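
For readers skimming the thread, the same idea pulled out into a
stand-alone helper looks roughly like this (a minimal sketch only,
assuming Qt 3's QMutex::tryLock() and usleep(); the real change is the
attached patch, and lockWithTimeout is not an existing MythTV function):

 // Sketch of the try/poll/give-up pattern.  Returns true if the mutex
 // was obtained (the caller must unlock() it), false if we gave up
 // after roughly tries * delay_us microseconds.
 #include <unistd.h>   // usleep()
 #include <qmutex.h>

 static bool lockWithTimeout(QMutex &m, int tries = 20, long delay_us = 100000)
 {
     for (int i = 0; i < tries; i++)
     {
         if (m.tryLock())
             return true;
         usleep(delay_us);
     }
     return m.tryLock();   // one final attempt after the last sleep
 }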

This means whatever the calling code was trying to get done (e.g.
schedule, delete, etc.) doesn't get done... but it wouldn't have anyway,
and it would have required a restart of the master backend! I can't
think of any critical activity in myth that requires confirmation and
exactly-once execution. E.g. if a delete didn't work, just retry;
frustrating, but less so than a restart.
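
With the patch applied, SendReceiveStringList() can return false without
having sent anything, so callers should treat that as "slave busy, try
again later". A hypothetical caller-side wrapper (a sketch only;
sendToSlaveOrDefer is not existing MythTV code):

 #include <qstringlist.h>
 #include "playbacksock.h"

 // Treat a false return from the patched SendReceiveStringList() as
 // "slave busy, retry later" rather than assuming the request went out.
 static bool sendToSlaveOrDefer(PlaybackSock *slave, QStringList &strlist)
 {
     if (!slave->SendReceiveStringList(strlist))
     {
         // Nothing was sent, so there is nothing to undo; the caller
         // can simply re-issue the request (reschedule, delete, ...).
         return false;
     }
     return true;   // strlist now holds the slave's reply
 }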

Can people try this patch and see whether it increases stability for
them? I've had no lockups using this code.

Obviously the long-term solution is to fix the underlying problem; I'm
happy to help discuss it.


More on the problem... looking at the activity

Case 1 - This is the standard activity

 Client        MasterBackend           Slave backend

 1 Req ->
               1 Process
               1 SendReceive Lock
               1 Req ->
                                       -> 1 Receive req
                                          1 SlaveProcess
                                       <- 1 Response
               1 Response <-
               1 SendReceive Unlock
 <- 1 Response

When two requests are made quickly in a row (e.g. requesting status from
two different browsers within a second or so of each other), the
following seems to happen:
Case 2 - failure

 Client        MasterBackend           Slave backend

 1 Req ->
               1 Process
               1 SendReceive Lock
               1 Req ->
                                       -> 1 Receive req
                                          1 SlaveProcess
                                            (in this case the slave process
                                             takes time... maybe swapping or
                                             processing)
 2 Req ->                                   (new req arrives)
               2 Process
               2 SendReceive Lock
                 (in some cases this seems to fail,
                  rather than waiting for the lock
                  to come free)
                                       <- 1 Response
                                            (if this response is never sent,
                                             or gets lost, then the thread is
                                             locked forever)
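
To make the collision concrete, here is a tiny stand-alone program (a
sketch only, assuming Qt 3 plus Linux sleep()/usleep(); none of it is
MythTV code) that reproduces the shape of Case 2: one thread holds the
lock while "waiting" for a slow reply, and the second thread, instead of
blocking forever in lock(), polls with tryLock() and gives up after
about 2 seconds, just as the patch does:

 #include <unistd.h>     // sleep(), usleep()
 #include <iostream>
 #include <qmutex.h>
 #include <qthread.h>

 QMutex sockLock;

 // Plays request 1: takes the lock and sits on it while the slave's
 // reply is delayed.
 class Holder : public QThread
 {
   protected:
     virtual void run()
     {
         sockLock.lock();
         sleep(10);          // the slave's reply takes 10 seconds
         sockLock.unlock();
     }
 };

 // Plays request 2: the original code would call sockLock.lock() here
 // and hang for 10 seconds (or forever, if the reply never arrives);
 // the tryLock pattern bails out instead.
 class Waiter : public QThread
 {
   protected:
     virtual void run()
     {
         for (int i = 0; i < 20; i++)
         {
             if (sockLock.tryLock())
             {
                 std::cout << "request 2: got the lock" << std::endl;
                 sockLock.unlock();
                 return;
             }
             usleep(100000); // 0.1s between attempts
         }
         std::cout << "request 2: gave up instead of hanging" << std::endl;
     }
 };

 int main()
 {
     Holder req1;
     Waiter req2;
     req1.start();
     sleep(1);       // let request 1 grab the lock first
     req2.start();
     req2.wait();    // returns after ~2 seconds with the tryLock pattern
     req1.wait();
     return 0;
 }

The trade-off is the same one the patch makes: a request that hits the
busy window is dropped (and can simply be retried) instead of wedging
the master backend's status thread.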


-------------- next part --------------
Index: programs/mythbackend/playbacksock.cpp
===================================================================
--- programs/mythbackend/playbacksock.cpp	(revision 7191)
+++ programs/mythbackend/playbacksock.cpp	(working copy)
@@ -2,6 +2,9 @@
 
 #include <iostream>
 
+// C headers
+#include <unistd.h>
+
 using namespace std;
 
 #include "playbacksock.h"
@@ -59,7 +62,23 @@
 
 bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
 {
-    sock->Lock();
+    // Poll for the socket lock instead of blocking forever: retry every
+    // 0.1s, up to 20 times (~2 seconds), then give up.
+    int itertry = 0;
+    bool gotLock = sock->tryLock();
+    while (!gotLock && itertry++ < 20)
+    {
+        usleep(100000);
+        gotLock = sock->tryLock();
+    }
+
+    if (!gotLock)
+    {
+        VERBOSE(VB_IMPORTANT, "PlaybackSock::SendReceiveStringList - "
+                "could not obtain socket lock, aborting send/receive");
+        return false;
+    }
+
     sock->UpRef();
 
     sockLock.lock();
Index: programs/mythbackend/server.h
===================================================================
--- programs/mythbackend/server.h	(revision 7191)
+++ programs/mythbackend/server.h	(working copy)
@@ -20,6 +20,7 @@
     bool IsInProcess(void) { return inUse; }
 
     void Lock() { lock.lock(); }
+    bool tryLock() { return lock.tryLock(); }
     void Unlock() { lock.unlock(); }
 
   protected:
 

