[mythtv-users] Slow MySQL query after delete

f-myth-users at media.mit.edu
Thu Sep 6 21:36:38 UTC 2007


    > Date: Thu, 06 Sep 2007 17:04:32 -0400
    > From: "Michael T. Dean" <mtdean at thirdcontact.com>

    > On 09/06/2007 04:14 PM, f-myth-users at media.mit.edu wrote:
    > >     > Date: Wed, 05 Sep 2007 13:00:03 -0400
    > >     > From: "Michael T. Dean" <mtdean at thirdcontact.com>
    > >
    > >     > On 09/05/2007 12:46 PM, David Brodbeck wrote:
    > >     > > So it's still there, it's just hidden.  I guess if MySQL (or MythTV)  
    > >     > > quietly ran an OPTIMIZE TABLE once a week, instead of my scheduling  
    > >     > > it in cron,
    > >
    > >     > MythTV should be doing the OPTIMIZE, ANALYZE, and REPAIR--as well as 
    > >
    > > Doesn't OPTIMIZE subsume ANALYZE? 

    > Quite possible.  Someone else mentioned the same.  However, the patch
    > will be taking into account the different storage types.

Oh, whoops.  Hadn't considered that.  (Does Myth have a "suggested" or
(gulp) "supported" set of storage engines, or is it claimed that any
storage engine MySQL supports will "work"?  I'm assuming 99% of
everybody leaves theirs at the default, but of course we don't know.)

    > So, for BDB tables, OPTIMIZE won't be run, but ANALYZE will be.  For
    > MyISAM, OPTIMIZE will be run (and, it seems ANALYZE won't), for InnoDB,
    > I may do OPTIMIZE and let the DB deal with version differences (though
    > I'm still a bit undecided on that).  If the storage is neither MyISAM,
    > BDB, nor InnoDB, OPTIMIZE won't be run. ...  (Probably easier, though,
    > to just wait for the patch and see. :)

Yeah.
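(If I'm reading that right, the decision table is something like the
sketch below---just my guess at the shape it implies, not the actual
patch:)

```shell
#!/bin/sh
# Rough sketch of the per-engine choice described above: not the real
# patch, just the decision table it implies.
maintenance_stmt() {
    engine=$1; table=$2
    case "$engine" in
        MyISAM) echo "OPTIMIZE TABLE $table;" ;;  # defragments, re-sorts indexes
        BDB)    echo "ANALYZE TABLE $table;"  ;;  # no OPTIMIZE; refresh key stats
        InnoDB) echo "OPTIMIZE TABLE $table;" ;;  # still undecided, per the quote
        *)      ;;                                # unknown engine: do nothing
    esac
}
```

(The engine for each table would presumably come from SHOW TABLE STATUS
or information_schema.TABLES.)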

    > >  (And presumably the output from your
    > > patch gets logged somewhere---if I have a damaged table that needed
    > > repair, I wanna know about it pronto, because otherwise I might miss
    > > whatever's damaging it---or have some "weird bug" and always wonder,
    > > "Could it have been a damaged table that was repaired behind my back?")

    > With appropriate verbosity, yes.  :)  As of now, I'm planning for
    > damaged tables or repair errors to cause a notification at the
    > "important" verbosity level. 
    > REPAIR will only be run if there are, in fact, damaged tables and will
    > only be run on the damaged tables, themselves.

Sounds good.

    > >     > doing automated backups--by 0.21.  (If I can just get some time to
    > >     > finish up the patches.)
    > >
    > > I'm not sure if this ever got resolved some months ago when it was
    > > initially discussed, and I can't find anything in the archives, so just
    > > to confirm---it will take care to never do these while any streams are
    > > recording on -any- backend, or due to record "soon", right? 

    > Yep.  That's why the patch is taking so long.  That's a pretty major
    > change that I want to do "right", so it's not a quick change (and
    > requires my finding more time (when I'm around home) to work on it).

Ok.

    > >  (Where
    > > "soon" is configurable in some way by the user, who is the one in the
    > > best position to know how long these actions should take with the
    > > local configuration.)

    > That I don't promise.  It's likely that the "set-aside" time will be
    > significantly greater than (multiples of) the time required on even slow
    > systems.  The repair/optimize isn't critical for a daily run, so it will
    > prefer to miss it rather than affect recordings.

Yes, certainly.

(My idea of "configurable" was "read something from the settings
table", which allows people who really, really care to modify it,
without having to gum up the UI with it.  E.g., just put the number of
minutes it must run in advance in the DB as a setting and not as some
figure compiled into the binary.  Yes, another of the "let users touch
the DB directly" applications... :)
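(Concretely, the kind of thing I mean---the setting name here is
entirely made up:)

```shell
# Hypothetical setting name, just for illustration.  mythconverg's
# settings table has value/data/hostname columns; a NULL hostname
# makes the setting global rather than per-host.
mysql mythconverg -e "INSERT INTO settings (value, data, hostname) \
    VALUES ('HousekeepingSetAsideMinutes', '25', NULL);"
```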

    > > Otherwise it will certainly corrupt streams, at least in my setup,
    > > even with the DB on a separate spindle.  This is of course due to the
    > > locking involved and the disk I/O.  (The locking will also tend to
    > > hang the UI, though that's somewhat easier to tolerate, but ideally
    > > of course it'd also be configurable to run at times when the UI is
    > > unlikely to be in use.)
    > >
    > > I think I also mentioned that it'd be very nice for scriptable hooks
    > > in this both -before- and -after- the run, e.g., so the freshly-made
    > > backup could be put somewhere else as well,

    > Handled by storage groups.  Create a "backup" storage group (name not
    > set in stone) to specify the location for backups.  If there is no
    > backup storage group, the default storage group will be used.

Clever!

(And if there are two?  When I back up my DB, I do a mysqldump through
gzip --best into a file, and then copy that file to a different disk.
I keep backups on the two disks for very different amounts of time
because they have very different sizes.  Other users might well want
to rsync or scp the backup elsewhere, immediately [rather than poll
for it, etc].  And the backups have standard yyyymmddhhmmss filenames
(rather than .1, .2) so they won't rotate and bloat the process that
backs up the whole machine...  and also to make it trivial to find the
"right" backup if I break something and need to reload the DB.)
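(For what it's worth, that scheme boils down to something like the
following---paths and credentials are site-specific examples only:)

```shell
#!/bin/sh
# Sketch of the two-disk backup described above; paths are examples.
backup_mythconverg() {
    stamp=$(date +%Y%m%d%H%M%S)        # sortable, never rotates like .1/.2
    file="/mnt/disk1/backups/mythconverg-$stamp.sql.gz"
    mysqldump mythconverg | gzip --best > "$file"
    cp "$file" /mnt/disk2/backups/     # second spindle, its own retention
}
```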

    > >  etc, especially if users
    > > disagree about how often they're made, how many get preserved, and
    > > where they go.  I think this is likely to be a large point of
    > > variability and you have to account for people's own local knowledge
    > > about what's where and how much redundancy and history they want.

    > Much of it may not be very much user-configurable (at least at first). 
    > Having a bunch of SVN trunk users testing the same configuration will
    > make it much easier to identify where my assumptions were overly
    > optimistic.  Eventually, user-specifiable settings may be added (though
    > I have to admit I'm firmly in the "we have too many settings" camp).

Hence my comment about at least having a hook that says, "If -this-
value in settings is filled in, we'll exec it (presumably a script)
before the optimize/backup happens, and if -this- value is set, we'll
do it afterwards."  These sorts of hooks exist in all kinds of things
and make it easy to add functionality without having to embed it in
the app itself---and presumably won't make it any harder to test the
app itself.  Again, I'm thinking just fields in the DB settings table
for now---no UI.  (Basically reuse the jobqueue or channel-changer code.)
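(In other words, something shaped like this---the setting names are
invented, and the settings lookup is stubbed so the flow stands alone:)

```shell
#!/bin/sh
# Sketch of the pre/post hook idea; the setting names are invented.
# A real fetch_setting would read mythconverg, something like:
#   mysql -N mythconverg -e "SELECT data FROM settings WHERE value='$1'"
fetch_setting() { :; }                 # stub: returns empty (no hook set)

run_hook() {
    hook=$(fetch_setting "$1")
    [ -n "$hook" ] && "$hook"          # exec the user's script if configured
    return 0
}

run_hook HousekeepingPreRunHook        # before the optimize/backup
# ... optimize and backup happen here ...
run_hook HousekeepingPostRunHook       # after, e.g. rsync the fresh dump away
```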

    > > Lacking noninterference & hooks, then I assume there'll be a way to
    > > turn this off so users who don't like the default can run things the
    > > way they'd like to instead.

    > Again, don't know about to start with, but...

Just an "off" switch in the DB?

    > > P.S.  Ditto for when mfdb runs, and there the "soon" must be
    > > configurable to a much larger value (potentially 10-15 minutes
    > > or more, including time for the download),

    > I'm thinking greater than that for even the repair/optimize (more like
    > 25min for repair/optimize).  mfdb may be more like 55 minutes (though
    > mfdb will be allowed to run when the set aside would prevent a run for a
    > "period"--i.e. if the set aside would mean missing the window to run
    > mfdb that day).

By "set aside", do you mean "'cause we're recording" or "'cause we're
not in our preferred runtime window"?

(I'd be a little concerned if, for example, the preferred times kept
landing on the afternoon and evening---my prime recording times---and
would thus damage streams as Myth made a last-ditch effort to run mfdb
at all lest it miss a day.  Whereas I typically force runs in the
morning (but not the wee hours, 'cause we knew that to be bad in the
DD days) since I know it won't trash recordings then (and I know
exactly when it'll run and hope to spot the rare recording during
that time period in advance).)

    > >  since mfdb will also
    > > acquire table locks and lead to general disk-thrashing that will
    > > corrupt recordings.  This might include ignoring the preferred
    > > download time if it's the only way to avoid stepping on recordings
    > > without potentially waiting forever.

    > I don't plan to allow mfdb to ignore preferred download time as some
    > users may have to pay extra tariffs or whatever for downloading outside
    > the specified times.  Instead, if they're having recording issues due to
    > the times they've specified, and they want to modify the allowed run
    > times, they'll need to modify those times.

Ah, we're using different terminology.  You're talking about "the
times they set using mythtv-setup" and I'm talking about the times
that DD used to (and SD still does?) send during a retrieval that say
"do your next run no earlier than -this- time".

So by "preferred times" in my paragraph above, etc, I mean the times
supplied by the data source, not the windows specified by the user
(who presumably knows what he's doing and whose preferences should be
respected).

    > > [Obviously, the "noninterference" logic would be useful for a large
    > > variety of background tasks and it would be nice if there was an easy
    > > way to hook into it

    > Right.  I'm hoping the same infrastructure can be used for the job queue.

Yes.

    > > ---especially from shell scripts and the like, so
    > > quick maintenance operations could be thrown together without
    > > requiring recompiling Myth itself, etc.  Right now I kluge this
    > > by doing what MythWeb would do to get status and then looking for
    > > any tuners marked "in use" or recording "soon" by my own definition,
    > > but that's clumsy.]

    > I'll probably work on getting it in place first, then look at adding
    > "generic" housekeeping scripts or "non-recording" user jobs or whatever.

Right, makes sense.
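(For the record, my kluge is roughly the following---the hostname,
port, and grep pattern are all guesses that would need checking
against the local backend's status page, whose markup varies by
version:)

```shell
#!/bin/sh
# Hypothetical version of the status kluge: poll the backend's built-in
# status server and look for any sign of an active recording.  The
# pattern is only illustrative; inspect your own backend's output.
backend_busy() {
    wget -q -O - "http://${1:-localhost}:6544/xml" 2>/dev/null \
        | grep -q 'Recording'
}

# Usage (not run here):  backend_busy mybackend && exit 0
```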

Tnx!

