[mythtv] ringbuffer.cpp

Jonas Arndt jonas.arndt at thearndtfamily.com
Fri Feb 14 04:20:19 UTC 2014


JY,

Thanks for your clarifications and pointers. I will need some time to 
process your information and to look further into the code. Your point 
that my changes only removed the log entries but did not fix the problem 
is well understood, and I totally agree. I don't have the long lockups 
you mention, just freezes of a few seconds, almost always when one 
show is ending and another starting.

Cheers,

// Jonas

On 02/13/2014 08:58 PM, Jean-Yves Avenard wrote:
> Hi
>
> As a matter of fact, I'm plagued with the same issues as you are and
> I'm currently working on it...
>
>
> On 14 February 2014 14:32, Jonas Arndt <jonas.arndt at thearndtfamily.com> wrote:
>> It started with me being tortured by the never-ending
>>
>> =========================================================================
>> Decoder ringbuffer.cpp:1086 (WaitForAvail)
>> RingBuf(/second_drive/recordings/3130_20120103210000.mpg): Waited 2.0
>> seconds for data to become available...
>> =========================================================================
> Yes, once it enters that condition, it never comes out and will fail
> after 16 seconds.
>
>
>> This seems straightforward. Then the big while loop is started, which seems
>> to be waiting for avail to be at least as big as count (which seems to be
>> the amount of data we want). It tests other things as well, but they all seem
>> external to the WaitForAvail function.
> You have two threads going (three if you take into account the writer
> used by the recorder)
>
> The ring buffer has a read-ahead thread going, which fills the ring
> buffer in advance for data you're going to need later.
>
> This wait in the code is triggered when there's not enough data
> left in the ringbuffer, and it waits for the read-ahead thread to
> retrieve enough data to continue.
>
> If you analysed further, you would find that the issue occurs in the
> safe_read function; it may take several seconds to return. By that
> time, the wait in the ringbuffer class would have timed out and the
> ringbuffer been killed, by which time the mythfrontend client would
> have been kicked out (and at this point it usually hangs).
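
The two-thread handshake described above can be sketched in standard C++; this is an illustrative miniature only, with std::condition_variable and std::mutex standing in for Qt's QWaitCondition and QReadWriteLock, and the struct and names made up rather than MythTV's:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative miniature of the ringbuffer wait: a reader blocks in 250 ms
// slices until a writer (standing in for the read-ahead thread) has
// buffered enough bytes. Names are hypothetical, not MythTV's.
struct MiniRingBuf
{
    std::mutex m;
    std::condition_variable generalWait;
    long long avail = 0;  // bytes buffered so far

    // Writer side: the read-ahead thread appends data and wakes any waiter.
    void Produce(long long bytes)
    {
        { std::lock_guard<std::mutex> lk(m); avail += bytes; }
        generalWait.notify_all();
    }

    // Reader side: block in 250 ms slices until `count` bytes are buffered,
    // giving up after `abortMs` (mirroring the 16 s abort in WaitForAvail).
    bool WaitForAvail(long long count, int abortMs)
    {
        std::unique_lock<std::mutex> lk(m);
        const auto start = std::chrono::steady_clock::now();
        while (avail < count)
        {
            generalWait.wait_for(lk, std::chrono::milliseconds(250));
            const auto elapsed =
                std::chrono::duration_cast<std::chrono::milliseconds>(
                    std::chrono::steady_clock::now() - start).count();
            if (avail < count && elapsed > abortMs)
                return false;  // timed out waiting on the producer
        }
        return true;
    }
};
```

If the producer stalls (as safe_read apparently does), the reader just spins through its 250 ms slices doing nothing useful until the abort fires, which matches the behaviour described above.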
>
>> In the while loop we call QWaitCondition's wait (generalWait.wait(&rwlock,
>> 250)), with a timeout of at most 250 milliseconds. Once we are out of that
>> (worst case 250 ms later) we go on to checking whether avail is now at least
>> as big as count. If not, we do something I don't understand, and this is
>> where I have questions. First, here is the if statement:
>>
>> ==================================================
>>          if (avail < count)
>>          {
>>              int elapsed = t.elapsed();
>>              if (elapsed > 500 && low_buffers && avail >= fill_min)
>>                  count = avail;
>>              else if  (((elapsed > 250) && (elapsed < 500))  ||
>>                       ((elapsed >  500) && (elapsed < 750))  ||
>>                       ((elapsed > 1000) && (elapsed < 1250)) ||
>>                       ((elapsed > 2000) && (elapsed < 2250)) ||
>>                       ((elapsed > 4000) && (elapsed < 4250)) ||
>>                       ((elapsed > 8000) && (elapsed < 8250)) ||
>>                       ((elapsed > 9000)))
>>              {
>>                  LOG(VB_GENERAL, LOG_INFO, LOC + "Waited " +
>>                      QString("%1").arg((elapsed / 250) * 0.25f, 3, 'f', 1) +
>>                      " seconds for data \n\t\t\tto become available..." +
>>                      QString(" %2 < %3") .arg(avail).arg(count));
>>              }
>>
>>              if (elapsed > 16000)
>>              {
>>                  LOG(VB_GENERAL, LOG_ERR, LOC + "Waited " +
>>                      QString("%1").arg(elapsed/1000) +
>>                      " seconds for data, aborting.");
>>                  return false;
>>              }
>>          }
>> ==================================================
>>
>> So why do we expect elapsed to be over 500 ms here? In the worst case it
>> should be only 250 ms, right? At least the first time through the while
>> loop. We would therefore automatically enter the "else if" statement the
>> first time through the while loop, and therefore risk outputting the log message.
> That area of the code does two things:
> 1- print the message after 0.25s, 0.5s, 1s, 2s, 4s, 8s, 9s ...
> 2- abort after 16s.
>
> There's a special case after 500ms if we have not retrieved all the
> data we were looking for, but still retrieved more than the minimum
> threshold...
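
That window test can be restated on its own; the values below are copied directly from the quoted code, wrapped in a standalone function (the name is made up) so the gaps are easy to probe:

```cpp
#include <cassert>

// The windowing test from the quoted WaitForAvail code, restated as a
// standalone predicate. The surrounding loop wakes at most every 250 ms,
// so each open window admits roughly one log line: at ~0.25-0.75 s, then
// once near 1 s, 2 s, 4 s, 8 s, and on every wake past 9 s.
bool ShouldLogWait(int elapsedMs)
{
    return ((elapsedMs >  250) && (elapsedMs <  500)) ||
           ((elapsedMs >  500) && (elapsedMs <  750)) ||
           ((elapsedMs > 1000) && (elapsedMs < 1250)) ||
           ((elapsedMs > 2000) && (elapsedMs < 2250)) ||
           ((elapsedMs > 4000) && (elapsedMs < 4250)) ||
           ((elapsedMs > 8000) && (elapsedMs < 8250)) ||
            (elapsedMs > 9000);
}
```

The ranges listed below as "OK" (750-1000 ms, 1250-2000 ms, ...) are exactly the silent gaps between windows: nothing is okay there, the loop is still waiting, it just doesn't log.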
>
>> So what is the logic?
>> if elapsed below 500 ms => OK (can't happen the first pass through the while
>> loop)
>> if elapsed between 750 and 1000 => OK
>> if elapsed between 1250 and 2000 => OK
>> and so on....
> it's not okay; it just displays the log.
>
> this is to prevent the log from being displayed continuously until the wait
> is complete.
>
>> To me it seems that one test for a max value would be enough. That max value
>> would be something that would actually cause a problem for the client. Right
>> now it seems we are just polluting the log file with messages about something
>> that seems to be perfectly okay for the client. I also notice that the log
>> level is LOG_INFO, so I am wondering why it is visible in the log. I run
>> Ubuntu's package, and it seems it comes with this log level enabled.
> by the time the log starts to show, you have a problem.
>
> the log just acknowledges that.
> Arguably, we could up the time threshold so it doesn't display
> anything until we've waited for, say, 1s...
>
> But you'll find that the typical size being read or waited on at a
> time is usually < 100kB. So even 0.25s is *very* long to manage to read
> that little amount of data.
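
Back-of-envelope, the throughput that wait implies (illustrative numbers only):

```cpp
#include <cassert>

// Back-of-envelope check of the claim above, with illustrative numbers: a
// ~100 kB read that takes the full 250 ms before the first log window
// implies an effective throughput of only ~400 KiB/s.
constexpr double kReadBytes   = 100.0 * 1024.0;            // ~100 kB request
constexpr double kWaitSeconds = 0.25;                      // first log window
constexpr double kImpliedBps  = kReadBytes / kWaitSeconds; // 409600 B/s
```

Any healthy local disk or network sustains tens of MB/s, so hitting even the first window points at a stall, not at bandwidth.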
>
> From what I've seen, there's an issue occurring with LiveTV where you
> have a thread writing the data and a thread reading that same file.
>
> Even though the writer has long since written all the required data, the
> reader doesn't see it for several seconds (for me, safe_read sometimes
> locks up for 20-25s!).
>
> This occurs even more now that my backend is using a NFS share to
> store the recordings.
>
> But it was occurring before when I was using a local RAID5 array, with
> one faulty disk
>
>> I changed generalWait.wait(&rwlock, 250) to generalWait.wait(&rwlock, 500).
>> This change was made mostly because I felt there might be another thread that
>> needed some more time (perhaps something that is writing to the buffer?).
> well, there is !
>
>> I then removed everything in the else-if statement and just tested whether it
>> was over 1000.
>> With these changes I very rarely get anything in the log. I still get the
>> 4-5 second freeze between shows, though (even though that doesn't always
>> happen).
> you'll get less in the log, but that doesn't solve the issue causing the log.
>
> so you will get just as many freezes as you did earlier. Just less log.
>
> I would much prefer to fix the problem, than just hide the logs :)
>
>> Any clarifications here would be awesome. I have also seen (searching the
>> Internet) multiple other folks getting these error messages.
> now if you want to have a go at the problem, feel free.
>
> It's an area of the code I know very little about, and like you I'm puzzled
> by the logic used...
>
> JY
> _______________________________________________
> mythtv-dev mailing list
> mythtv-dev at mythtv.org
> http://www.mythtv.org/mailman/listinfo/mythtv-dev
> http://wiki.mythtv.org/Mailing_List_etiquette
> MythTV Forums: https://forum.mythtv.org


