[mythtv] ringbuffer.cpp

Jean-Yves Avenard jyavenard at gmail.com
Fri Feb 14 03:58:30 UTC 2014


As a matter of fact, I'm plagued with the same issues as you are and
I'm currently working on it...

On 14 February 2014 14:32, Jonas Arndt <jonas.arndt at thearndtfamily.com> wrote:
> It started out by me being tortured with the never ending
> =========================================================================
> Decoder ringbuffer.cpp:1086 (WaitForAvail)
> RingBuf(/second_drive/recordings/3130_20120103210000.mpg): Waited 2.0
> seconds for data #012#011#011#011t
> =========================================================================

Yes, once it enters that condition, it never comes out and will fail
after 16 seconds.

> This seems straightforward. Then the big while loop is started, which seems
> to be waiting for avail to be at least as big as count (which seems to be
> the amount of data we want). It tests other things as well, but they all seem
> external to the WaitForAvail function.

You have two threads going (three if you take into account the writer
used by the recorder).

The ring buffer has a read-ahead thread, which fills the buffer in
advance with data you're going to need later.

The wait in this area of the code is triggered when there's not enough
data left in the ring buffer; it waits for the read-ahead thread to
retrieve enough data to continue.
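That interaction can be sketched roughly like this (a minimal,
hypothetical illustration using std::condition_variable rather than
MythTV's QWaitCondition; RingWait and its members are made-up names,
not the actual classes):

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Sketch of the reader-side wait: the read-ahead thread appends data
// and signals; the reader waits until enough bytes are buffered or a
// deadline passes.
struct RingWait {
    std::mutex lock;
    std::condition_variable avail_changed;
    std::size_t avail = 0;

    // Producer side (the read-ahead thread): add bytes and wake waiters.
    void add(std::size_t n) {
        std::lock_guard<std::mutex> g(lock);
        avail += n;
        avail_changed.notify_all();
    }

    // Consumer side: wait until `count` bytes are available or
    // `timeout_ms` elapses; returns whether enough data arrived.
    bool wait_for_avail(std::size_t count, int timeout_ms) {
        std::unique_lock<std::mutex> g(lock);
        auto deadline = std::chrono::steady_clock::now()
                      + std::chrono::milliseconds(timeout_ms);
        while (avail < count) {
            if (avail_changed.wait_until(g, deadline)
                    == std::cv_status::timeout)
                return avail >= count;
        }
        return true;
    }
};
```

The real WaitForAvail wakes every 250 ms instead of waiting on one
long deadline, presumably so it can re-check the external stop/seek
conditions you mention on each pass.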

If you analyse further, you will find that the issue occurs in the
safe_read function: it may take several seconds to return. By then,
the wait in the ringbuffer class will have timed out and the
ringbuffer been killed, at which point the mythfrontend client gets
kicked out (and it usually hangs there).

> In the while loop we call QWaitCondition's wait (generalWait.wait(&rwlock,
> 250)), with a timeout of max 250 milliseconds. Once we are out of that (worst
> case 250 ms later) we go on to checking whether avail is now at least count
> big. If not, we do something I don't understand, and this is where I have
> questions. First off, here is the if statement:
> ==================================================
>         if (avail < count)
>         {
>             int elapsed = t.elapsed();
>             if (elapsed > 500 && low_buffers && avail >= fill_min)
>                 count = avail;
>             else if  (((elapsed > 250) && (elapsed < 500))  ||
>                      ((elapsed >  500) && (elapsed < 750))  ||
>                      ((elapsed > 1000) && (elapsed < 1250)) ||
>                      ((elapsed > 2000) && (elapsed < 2250)) ||
>                      ((elapsed > 4000) && (elapsed < 4250)) ||
>                      ((elapsed > 8000) && (elapsed < 8250)) ||
>                      ((elapsed > 9000)))
>             {
>                 LOG(VB_GENERAL, LOG_INFO, LOC + "Waited " +
>                     QString("%1").arg((elapsed / 250) * 0.25f, 3, 'f', 1) +
>                     " seconds for data \n\t\t\tto become available..." +
>                     QString(" %2 < %3") .arg(avail).arg(count));
>             }
>             if (elapsed > 16000)
>             {
>                 LOG(VB_GENERAL, LOG_ERR, LOC + "Waited " +
>                     QString("%1").arg(elapsed/1000) +
>                     " seconds for data, aborting.");
>                 return false;
>             }
>         }
> ==================================================
> So why do we expect elapsed to be over 500 ms here? In worst case it should
> be only 250 ms, right? At least the first time we are in the while loop. We
> would therefore automatically enter the "else if" statement the first time
> we are in the while loop, and therefore risk outputting the Error.

That area of the code does two things:
1- print the log message after 0.25s, 0.5s, 1s, 2s, 4s, 8s and then
continuously past 9s;
2- abort after 16s.

There's also a special case after 500ms: if we have not retrieved all
the data we were looking for, but have still retrieved more than the
minimum (fill_min), we settle for what we have (count = avail).
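Restated as a standalone predicate (hypothetical helper name; the
ranges are copied from the quoted code above), the test lets the
message through only in a 250 ms-wide window after each milestone, so
the 250 ms wake-ups print it roughly once per milestone:

```cpp
// Returns whether the "Waited N seconds for data" message should be
// printed for this wake-up. The loop wakes about every 250 ms, so
// each open interval below matches at most one wake-up, until the
// unconditional (elapsed > 9000) catch-all.
bool should_log(int elapsed_ms)
{
    return ((elapsed_ms >  250) && (elapsed_ms <  500)) ||
           ((elapsed_ms >  500) && (elapsed_ms <  750)) ||
           ((elapsed_ms > 1000) && (elapsed_ms < 1250)) ||
           ((elapsed_ms > 2000) && (elapsed_ms < 2250)) ||
           ((elapsed_ms > 4000) && (elapsed_ms < 4250)) ||
           ((elapsed_ms > 8000) && (elapsed_ms < 8250)) ||
            (elapsed_ms > 9000);
}
```

Note the gaps: between 750 and 1000 ms, for example, nothing is
printed at all, and past 9000 ms the message is printed on every
wake-up until the 16-second abort.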

> So what is the logic?
> if elapsed  below  500 ms => OK (can't happen the first lapse in the while
> loop)
> if elapsed between 750 and 1000 => OK
> if elapsed between 1250 and 2000 => OK
> and so on....

It's not a matter of those ranges being okay; in those ranges the code
just doesn't display the log message.

This is to prevent the message from being printed on every 250 ms
wake-up until the wait is complete.

> To me it seems that one test for a max value would be enough. That max value
> would be something that would actually cause a problem for the client. Now
> it seems we are just polluting the log file with messages for something that
> seems to be perfectly okay with the client. I also notice that the log level
> is LOG_INFO so I am wondering why it is visible in the LOG. I run Ubuntu's
> package and it seems that thing comes with this log level enabled.

By the time the log starts to show, you already have a problem; the
log just acknowledges that.

Arguably, we could raise the time threshold so it doesn't display
anything until we've waited for, say, 1s...

But you'll find that the typical size being read or waited on at a
time is usually < 100kB. So even 0.2s is *very* long to read such a
small amount of data.
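To put the figure in perspective: 100 kB arriving only after a full
200 ms wait corresponds to roughly 0.5 MB/s, well below what even a
slow local disk or NFS mount should sustain. A trivial helper for the
arithmetic (hypothetical, just to make the numbers concrete):

```cpp
// Implied throughput in bytes per second, given how many bytes
// arrived over how many milliseconds of waiting.
double implied_bytes_per_s(double bytes, double elapsed_ms)
{
    return bytes * 1000.0 / elapsed_ms;
}
```

100 kB over 200 ms gives 512000 bytes/s, i.e. 500 kB/s.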

From what I've seen, there's an issue occurring with LiveTV where you
have a thread writing the data, and a thread reading that same file.

Even though the writer has long written all the required data, the
reader doesn't see it for several seconds (for me, safe_read sometimes
locks up for 20-25s!).

This occurs even more now that my backend is using an NFS share to
store the recordings.

But it was also occurring before, when I was using a local RAID5 array
with one faulty disk.

> generalWait.wait(&rwlock, 250) to generalWait.wait(&rwlock, 500)
> This change was made mostly as I felt there might be another thread that
> needed some more time (perhaps something that is writing to the buffer?)

Well, there is!

> I then removed everything in the else-if statement and just tested if it was
> over 1000.
> With these changes I very rarely get anything in the log. I still get the
> 4-5 seconds freeze between shows though (even though that doesn't always
> happen).

You'll get less in the log, but that doesn't solve the issue causing
the messages.

So you will get just as many freezes as you did earlier, just less log.

I would much prefer to fix the problem than just hide the logs :)

> Any clarifications here would be awesome. I have also seen multiple other
> folks having these error messages (searching the Internet).

Now if you want to have a go at the problem, feel free.

It's an area of the code I know very little about, and like you I'm
puzzled by the logic used...


More information about the mythtv-dev mailing list