[mythtv] [mythtv-commits] Ticket #2782: automatic simultaneous jobs scaling
Daniel Kristjansson
danielk at cuymedia.net
Sat Dec 9 17:35:59 UTC 2006
On Sat, 2006-12-09 at 13:57 +0200, osma ahvenlampi wrote:
> On 12/9/06, Daniel Kristjansson <danielk at cuymedia.net> wrote:
> > For most recorders people use the nice value of the backend and
> > mysql is the higher of the two since we need to write to both
> > the db and the filesystem when writing a file. I made this
> I'm not sure I understood this. Which is less nice of the two? On my
> backend, which also doubles as my primary frontend, mythbackend and
> mysqld are both nice 0 processes. I haven't tuned that since my two
> DVB tuners and generally not more than one simultaneous playback has
> never become an I/O problem.
When recording an MPEG stream MythTV needs to write the
keyframe locations to the DB. This allows for quick seeking.
If the DB blocks, MythTV waits until the DB has finished
before writing any more A/V data to the disk. The upshot
of this is that changing the niceness of mysqld has little
effect on recorder performance for most of the recorders.
> But if mysql's buffer flushes are delayed a bit due to recording
> buffer flushes, nothing is lost - and mysql writes much less. That's
> why I suggested the recorder might benefit from being less nice. Also,
> btw, might be a good reason to recommend (or even try to create)
> innodb tables instead of myisam -- innodb is better at writing through
> buffers.
Changing the mysql buffering does have an effect. I believe
we have some recommendations for this in the docs somewhere.
However, when a reschedule happens (about every 5 minutes +/-
when EIT is being collected), there is enough activity that
mysql has to read/write from/to disk. And it has to do this
before our puny keyframe write is processed.
> > We also don't want playback to run at greater niceness than
> > recording. Playback has much higher real-time requirements.
> Well - is the priority to ensure correctly recorded programs, or to
> avoid playback glitches? A recording error will cause a playback error
> anyway - again, that's why I felt it might be best if recording was the
> least nice process.
Well, I run all three at priority 0. But some people can't get
playback to work at all unless it is running at high priority,
so for those users playback needs to run at high priority.
The decoder thread and the disk reading thread are not running
at high priority, they are running at the same priority as
the recorders. Having the commercial flagging and transcode
jobs run their disk reading and writing threads at low priority
would be (is?) fine. But disk access is notorious for creating
priority inversions. Mucking with niceness works well for
allocating CPU, but not so well for allocating other resources.
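To make the niceness point concrete, here is a rough sketch of how a job could lower its own CPU priority with POSIX setpriority(). The function name is mine, not MythTV's; and as noted above, this only governs CPU scheduling and does nothing for disk contention:

```cpp
#include <sys/resource.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Illustrative only: a job (commercial flagging, transcoding) lowering
// its own CPU priority. Raising one's own niceness needs no privileges;
// lowering it back down generally does.
bool lowerOwnPriority(int niceness = 19)
{
    errno = 0;
    if (setpriority(PRIO_PROCESS, 0 /* this process */, niceness) != 0)
    {
        std::fprintf(stderr, "setpriority: %s\n", std::strerror(errno));
        return false;
    }
    return true;
}
```

This is exactly the kind of knob that helps with CPU allocation but, per the priority-inversion point above, does not stop a niced job from starving a recorder at the disk.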
> I would venture to guess that people running their backends on Linux
> 2.6 kernels are in the overwhelming majority amongst the user base.
> I'd be happy to try to work this out to support BSD, OSX and even
> Windows if someone could point me to examples of how to read system
> stats out of these systems and volunteer to test patches.
Most of us are using Linux, but I'm wary of adding Linux
dependencies. If the only possible fix on some other operating
system also fixes it for Linux, then we probably want to go
with the more portable fix.
> Does the backend even run on Windows?
Almost; Jerry Rubinow is working on it (see Ticket #1590).
He hasn't submitted a patch in a while because he ran into
some toolchain problems, and so far the new toolchain
appears to require more changes to MythTV.
> I did say "in theory" :)
:)
> The only thing this really requires is a way to read "how busy is the
> CPU", and information on whether that busyness includes I/O wait or
> not. Like I wrote in my previous message, under these circumstances
> the patch will *reduce* competition for resources.
I hadn't considered I/O wait; that might be the simplest way to
retrofit this metric. The I/O wait stat includes both disk and
network, right?
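For what it's worth, on Linux the aggregate iowait figure lives in the first line of /proc/stat, and it counts time tasks spend blocked on block-device I/O; waits on network sockets are generally accounted as idle, not iowait. A rough sketch of reading it (struct and function names are mine, not MythTV's):

```cpp
#include <sstream>
#include <string>

// Sketch: parse the aggregate "cpu" line from Linux /proc/stat and
// compute the fraction of an interval spent in I/O wait. Field order
// on Linux 2.6+: user nice system idle iowait irq softirq ...
struct CpuSample
{
    long long user = 0, nice = 0, system = 0, idle = 0, iowait = 0;
    long long total() const { return user + nice + system + idle + iowait; }
};

CpuSample parseCpuLine(const std::string &line)
{
    CpuSample s;
    std::istringstream in(line);
    std::string label;  // the leading "cpu" tag
    in >> label >> s.user >> s.nice >> s.system >> s.idle >> s.iowait;
    return s;
}

// Fraction of the interval between two samples spent waiting on I/O.
double iowaitFraction(const CpuSample &prev, const CpuSample &cur)
{
    long long dt = cur.total() - prev.total();
    if (dt <= 0)
        return 0.0;
    return double(cur.iowait - prev.iowait) / double(dt);
}
```

In practice you would read /proc/stat once, sleep, read it again, and feed both lines through iowaitFraction(); the jiffy units cancel out, so no HZ conversion is needed.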
> Unfortunately,
> getting even that much info is entirely OS-specific. I was thinking of
> digging into libgtop to see how it's done on most systems. Which would
> be more preferable, linking against that or just using it as an
> example for a re-implementation?
Generally: if it is small, re-implement it; if it is optional
and the library is generally available, link against it; if it
is big and not optional, import the library. But Isaac is the
final arbiter on this issue, since we can all easily disagree
on where a library falls on this continuum.
> > > Or you could solve it the way Internet is usually solved -- brute
> > > force and more capacity than is going to be needed for the job at hand
> >
> > But transcoding and commercial flagging are not real-time processes,
> > we should be able to run them when recording/playback is not
> > happening, or better yet use only the disk/cpu/network resources
> > that are currently going unused.
>
> That's exactly what I started out to do, until I heard people want to
> use resources that are NOT going unused, except sometimes they don't,
> and I must prove I'm not going to trash a recording on a system
> already 99% committed. That's a couple of objectives too many for my
> original rather simple patch :)
It's impossible to keep all resources 100% utilized in an
application like MythTV that has soft real-time requirements.
It is better to err on the side of under-committing the
resources we have.
> > I would think that monitoring the buffer fill on recording and
> > playback processes would be a good enough metric to control
> > the throttling of transcode & commercial flagging processes.
>
> Not a bad idea, that. How would one go about monitoring it? Although
> the buffers are worth what - a few seconds of I/O? The jobqueue
> scheduling decisions are made over minutes, so it's not really in a
> position to react to (possibly very temporary) buffer fill scenario.
Together the buffers for recording are somewhere in the 30-60
second range, but they are generally far from full. I'm thinking
that we could just keep running average, min, and max stats over
a 5 minute window. If the max comes close to 75% or the average
approaches 50%, we freeze jobs; for some values below that we
throttle the jobs.
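The recording-side heuristic could look something like this sketch. The class and the throttle band (60% max / 40% average) are my assumptions; the freeze thresholds (75% max, 50% average) come from the text:

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

enum class JobAction { kRun, kThrottle, kFreeze };

// Sliding window of buffer-fill samples (e.g. one per second for
// 5 minutes => capacity 300), tracking average and maximum.
class FillWindow
{
  public:
    explicit FillWindow(size_t capacity) : m_capacity(capacity) {}

    void addSample(double fillPct)
    {
        m_samples.push_back(fillPct);
        if (m_samples.size() > m_capacity)
            m_samples.pop_front();
    }

    double average() const
    {
        double sum = 0.0;
        for (double s : m_samples)
            sum += s;
        return m_samples.empty() ? 0.0 : sum / m_samples.size();
    }

    double maximum() const
    {
        return m_samples.empty() ? 0.0
            : *std::max_element(m_samples.begin(), m_samples.end());
    }

  private:
    size_t m_capacity;
    std::deque<double> m_samples;
};

// Recording side: a HIGH fill means the writer is falling behind.
JobAction recordingDecision(const FillWindow &w)
{
    if (w.maximum() >= 75.0 || w.average() >= 50.0)
        return JobAction::kFreeze;
    if (w.maximum() >= 60.0 || w.average() >= 40.0)  // assumed throttle band
        return JobAction::kThrottle;
    return JobAction::kRun;
}
```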
For playback the fill of the RingBuffers could be monitored as
well, but there, if the min comes close to 25% or the average
falls below 50%, we would freeze or throttle the jobs,
respectively. (You would need to throw out the outliers when
seeking.)
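The playback side inverts the recording rule: a LOW fill is the danger sign. A sketch of the decision plus a crude outlier filter for seeks (all names and the 5-second grace period are my assumptions):

```cpp
enum class JobAction { kRun, kThrottle, kFreeze };

// Windowed stats over the playback RingBuffer fill, in percent.
struct PlaybackStats
{
    double minFillPct;
    double avgFillPct;
};

// Freeze when the windowed minimum nears 25% full; throttle when the
// average falls below 50%.
JobAction playbackDecision(const PlaybackStats &s)
{
    if (s.minFillPct <= 25.0)
        return JobAction::kFreeze;
    if (s.avgFillPct < 50.0)
        return JobAction::kThrottle;
    return JobAction::kRun;
}

// Outlier filter: discard fill samples taken shortly after a seek,
// since the buffer legitimately empties then.
bool keepSample(double secondsSinceSeek, double gracePeriod = 5.0)
{
    return secondsSinceSeek >= gracePeriod;
}
```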
But before adding this instrumentation, it might make sense to
look at I/O wait for this metric and throttle the jobs based on
that. If that works, I'm sure we can find a way to collect that
stat on other operating systems.
-- Daniel