[mythtv-users] REPAIR/OPTIMIZE in HouseKeeper (was Re: Running optimize_mythdb.pl before mythfilldatabase)

f-myth-users at media.mit.edu
Fri Feb 23 03:25:53 UTC 2007


    > Date: Thu, 22 Feb 2007 18:34:59 -0500
    > From: "Michael T. Dean" <mtdean at thirdcontact.com>

    > > Unless you can guarantee that the housekeeper will -never- be running
    > > while a recording is in progress (or several minutes beforehand!),
    > > please do NOT add things that lock tables for extended periods of time
    > > and run when the user can't control them.

    > I'm working with Chris Pinkham on that right now.  We have an initial 
    > plan, and I'm considering making some adjustments.  (Probably should 
    > have mentioned that to Chris before saying so here, though...)  It'll 
    > probably be a couple of weeks before I have the patch (travel for work 
    > is getting in the way of my Myth time).

Good news.

    > But housekeeping isn't just running mythfilldatabase.  DailyCleanup is 
    > much more consistent about its execution time.

Okay.  Still has to avoid stepping on recordings, though.

    > > (*) E.g., because pretty much every time somebody says anything at all
    > > DB-related, you (mtdean) come back with "have you repaired & optimized?"

    > Yes.  Which is why I was considering putting it in daily cleanup.  But, 
    > I no longer plan to do so.

Right.

    > However, regarding the not-running during recordings, the approach we're 
    > taking is to allow a job to specify whether to run anyway (i.e., whether 
    > to run even when recordings during the job window would otherwise keep 
    > the job from running that day due to a requested lead-time constraint).  The 
    > mythfilldatabase execution will be set to run anyway (because missing it 
    > could cause missed/incorrect recordings, which is probably more annoying 
    > to users than recording glitches).  However, daily cleanup will not be 
    > run anyway (we can always "catch up" on it tomorrow).

I'm presuming that just mfdb doesn't cause glitches 'cause it won't be
locking tables long enough (especially after the seek/buffer/lock
issue gets fixed).  If we get operational experience that mfdb -does-
glitch, then it should get run as soon as practical after DD's
suggested time, such that it doesn't also run during a recording.
That's complicated (and will tend to quantize runtimes to start on
half-hours, etc, without additional cleverness/randomization), so I'm
hoping that level of complexity is unnecessary.  (I don't have data
'cause I've arranged things such that mfdb never runs when I'm recording.)
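
If it ever does come to that, the randomization part, at least, is
trivial.  Something like the sketch below would keep a cron-driven mfdb
from always starting on the same minute; the 04:10 cron time in the
comment and the 25-minute window are placeholders I made up (not
anything DD actually suggests), and so is the wrapper name:

  #!/bin/bash
  # run-mfdb.sh -- hypothetical wrapper: sleep a random 0-25 minutes so
  # start times don't all quantize onto the same minute, then run mfdb.
  # Fire it from cron at a placeholder time, e.g.:
  #   10 4 * * *   /usr/local/bin/run-mfdb.sh
  sleep $(( RANDOM % 1500 ))
  mythfilldatabase

Of course that still knows nothing about recordings in progress, which
is the hard part.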

    > So, this is the one benefit to adding another "daily" script to run.  It 
    > could be set up to run as part of daily cleanup (and /not/ ignore the 
    > specified "lead time"--as doing a daily repair/optimize isn't critical), 
    > while mfdb can be executed during a recording if absolutely necessary.

Assuming it won't glitch the recording.

    > I don't have any personal reason to choose one approach or the other, so 
    > I'll leave it up to the devs whether an extra script to configure is 
    > worth the benefit (to possibly only a small number of users)--especially 
    > since that benefit can be achieved through careful selection of cron job 
    > execution time or even through some fancy scriptwork.
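
For what it's worth, the "fancy scriptwork" can be pretty small.  Here's
a rough, untested sketch of a guard around optimize_mythdb.pl; it
assumes that active recorders show up in inuseprograms with recusage =
'recorder', that the mysql client can log into mythconverg without
prompting (e.g., via ~/.my.cnf), and that optimize_mythdb.pl is on the
PATH; all of those are worth verifying on your own setup first:

  #!/bin/bash
  # optimize-if-idle.sh -- hypothetical guard: only run the daily
  # repair/optimize when no tuner appears to be recording.
  # ASSUMPTION: recorders register themselves in inuseprograms with
  # recusage = 'recorder'; verify this against your schema.
  busy=$(mysql -N -e \
    "SELECT COUNT(*) FROM inuseprograms WHERE recusage = 'recorder'" \
    mythconverg)
  if [ "$busy" -eq 0 ]; then
      optimize_mythdb.pl
  else
      echo "recording in progress; skipping repair/optimize" >&2
  fi

That's no substitute for the housekeeper knowing about lead times, but
it would at least keep a cron-driven repair/optimize from landing in
the middle of a recording.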

    > Mike

    > [OT]:

    > /me wonders if you would still have the "trashed recordings" issue with 
    > a more current version of Myth (like 0.20-fixes or even 0.19-fixes)...

Yes.  Many others have reported exactly the same problems in .19, .20,
and SVN, which is why Chris has been looking into it.  The problem is
that the scheduler runs when a recording ends, its queries hold a lock
on the entire recordedmarkup/recordedseek table, and that hangs the
process that's emptying video buffers; see all the prior discussion
from a few weeks ago.  As long as that lock is held, nothing can help
besides huge ivtv buffers (which -still- can't be made big enough,
since this lock can be held for 30 seconds and those buffers chew up
RAM and have max size limits---and non-ivtv sources may not even allow
that level of configurability) or a more-efficient scheduler query,
and apparently that query hasn't gotten much more efficient.  (And
then you're still playing a game of catchup; the solution is to avoid
the lock completely, not just try to shave a few seconds off a marginal
situation.)
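
(Anyone who wants to see this on their own box can catch it in the act:
with MyISAM tables, the threads stuck behind the table lock show up in
the processlist with a State of "Locked" while the scheduler query
runs.  Assuming the mysql client can connect without prompting,
something like

  # watch for threads waiting on a table lock (MyISAM "Locked" state)
  watch -n 1 'mysql -e "SHOW FULL PROCESSLIST" | grep -i locked'

run while a recording is ending makes the stall pretty obvious.)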

    > Only wondering because I don't have these issues with 4x pcHDTV 
    > HD-3000's (often all four are recording simultaneously), 

You're not understanding something here.  It's not an I/O load issue.

Do HD3000's not write data into recordedmarkup/recordedseek?  Are you
using InnoDB?  Do you have particularly simple scheduling rules with
very few "all channels" situations?  And---very important in the other
direction---do you append hard padding to the end of every single
recording?  [After all, if you rarely postroll,
that means that one show ends -just- as one begins, which puts the
scheduler query right at the start of the new recording, where it's
likely that any corruption will be overlooked 'cause it's not actually
part of the program you're watching, or is perhaps written off as "the
tuner is weird for the first few seconds and writes bad data", or
whatever---except in the common case of a program that ends on the
half-hour in the middle of another that goes the full hour, etc.  But
if you typically postroll -and- typically have a recording still in
progress on another tuner when that postroll ends, that puts the
scheduler query smack in the middle of something you're trying to
watch, where it's really obvious.  Yet not doing pre/postrolls
guarantees (in my situation) losing the beginnings and endings of
recordings, so that's no solution either, and it wouldn't help the
half-hour/full-hour program case anyway.]
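
(On the InnoDB question above: checking which engine those tables are
actually using is a one-liner, assuming a MySQL 5.x server where SHOW
TABLE STATUS reports an Engine column:

  # Name and Engine for the recorded* tables; MyISAM means whole-table
  # locks, InnoDB means row-level locks.
  mysql -e "SHOW TABLE STATUS LIKE 'recorded%'" mythconverg | cut -f1,2

MyISAM is what a stock install gives you, as far as I know.)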

Also note that it's not just "end of recording"---deletions also
trigger scheduler runs, and those runs glitch my recordings.  I'm having to be
careful never to delete anything while a tuner is in operation,
unless I feel like waiting until only -one- tuner is running and
doing the deletion during a commercial.  This is hardly what a "PVR"
is all about... :)

So it's not a "data rate" situation---it doesn't -matter- how many
instances of the same type of tuner you have, because (contrary to what
everyone told me for MONTHS, until I tried every possible solution and
started asking some obnoxious questions) what's stalling the buffers
is NOT contention for the disk---it's the database lock.  You can lose
data from even a single running tuner if that lock is held.  If you've
got 4-6 running in parallel, you'll just lose data from ALL of them.

And since I never even -considered- that (a) a scheduler query had
anything to do with recordedmarkup, and (b) MySQL would lock an
entire -table- so commonly, it never even occurred to me that it
was a database lock and not I/O performance.

    > 			     MySQL server 
    > running on the master backend, non-RAID'ed PATA disks; and 
    > mythfilldatabase--and even optimize_mythdb.pl--run during recordings 
    > sometimes.  Note that I'm not saying you should upgrade as I don't know 
    > whether there is a difference that would help you.  Another system I 
    > manage has 4x PVR-250's, combined frontend/backend, with MySQL server 
    > (and Apache httpd and ProFTPd and TeamSpeak server and a bunch of other 
    > junk), non-RAID'ed PATA disks, and haven't had any recording issues with 
    > it, either.  (I used to use LVM on the SDTV system, but don't, anymore.)

Right, but again, you're saying, "Look at all this load, and yet it's
working!"  But it's not load.  I'm not losing on the disk head
thrashing, nor am I losing on total bandwidth to disk.  I'm losing
because MySQL is holding a lock for 30 seconds while the scheduler
runs, and that held lock stops the process that's reading ivtv buffers
dead in its tracks.  It LOOKED like a disk head/thrashing/contention
issue for a long time, of course, because whenever enough of the DB
happened to be in memory (or the query otherwise didn't hit the disk
too hard), the scheduler ran just fast enough that the lock was held
for only 10 seconds instead of 20 (etc.), and ivtv's buffers didn't
overflow.  So I spent a few dozen hours trying (and testing,
and wondering at the inconclusiveness of the tests) all the canonical
solutions, e.g., multiple spindles, different filesystems, different
kernel I/O scheduling algorithms, optimizing the DB every four hours,
and a pile of other things that you can find documented in my previous
posts on this problem, some of which -appeared- to help but didn't
actually fix the problem for much longer than the duration of the
test.
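
In hindsight, the one test that would have settled it immediately was
logging how long anything sat in the "Locked" state and lining that up
against the glitch times, instead of running yet more disk benchmarks.
A crude poller along these lines (the script name and log path are
made up) would have done it:

  #!/bin/bash
  # lock-logger.sh -- hypothetical poller: once a second, record any
  # MySQL thread waiting on a table lock, with a timestamp, so lock
  # stalls can be lined up against glitches in the recordings.
  while sleep 1; do
      mysql -N -e "SHOW FULL PROCESSLIST" \
        | awk -v ts="$(date '+%F %T')" \
              'tolower($0) ~ /locked/ { print ts "\t" $0 }'
  done >> /tmp/db-lock.log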

