[mythtv-users] Slow MySQL query after delete

Mon Dec 3 08:42:57 UTC 2007

    > Date: Sat, 01 Dec 2007 19:04:43 -0500
    > From: "Michael T. Dean" <mtdean at thirdcontact.com>

    > Fine.  Look.  I'm not saying that it can't be improved--there's not a
    > single place in Myth that can't be improved, as nothing is ever
    > perfect.  I'm simply saying that the "recent OP"--the guy who has a
    > problem and restarted this thread--can make his system perform much
    > better.  It seems there are far too many people trying to convince him
    > that there's nothing he can do because Myth is broken and the scheduler
    > was improperly designed.

That's not what I'm trying to do.  I -am- trying to keep him from
wasting too much time following cargo-cult advice.  If he finds that
turning on slow-deletes works, great!  (But he sent mail that it
didn't help.)  I got burned by that sort of advice twice (once before
I even joined this list, as my very first experience with Myth ever,
when I tried [and abandoned] KnoppMyth when it had a serious bug that
"reinstall a zillion times---clearly -you- must be screwing up and
doing it wrong" sorts of advice (from its maintainer, even) never
fixed---while I wasted an entire week at it---until another user found
the critical DB bug in that version of KnoppMyth that absolutely
prevented any separate-machine FE/BE install from working; and once
here, when I was trying to reduce the number of serious glitches in my
recordings and was given a whole lot of things to do that chewed up a
lot of time and were, in the end, unrelated to actually fixing it
(that was Chris' fix to the locking problem w/writing streams vs DB
locks).

I'm not saying necessarily that your advice here is cargo-cultish,
but OTOH, I -was- giving a pretty obvious bit of advice to pay
attention to how the scheduler may be screwing the OP, and in fact
that seems so far to be getting borne out for at least some posters.

    > Daniel and Janne have actually made some changes that make the
    > scheduling faster (maybe only committed to multirec so far, though--I
    > don't remember).  Chris Pinkham promised you that he would make (and he
    > has already made some) changes which make the DB locking less of an
    > issue.  Others may have also made some changes.

Yes, and they're all good.  I appreciate them, and I'm sure others do, too.

    > Though I can't guarantee it--because I'm a non-dev and because I don't
    > make decisions (so arguing with me is really a waste of both our time),
    > I'm pretty certain no major rewrite of the scheduler will be done in
    > 0.20-fixes (stable) branch.

Of course not.  But I do assume that it's helpful that those who might
be in the position to make patches continue to be aware that scheduler
performance is awful in some cases, and that either it needs to get
fixed, or those things that tend to hang on the awful performance need
to get disentangled somehow so the bad performance isn't noticeable.
That's all.

    >                              I'm even more certain that no major rewrite
    > of the scheduler will be done in the 0.18-fixes (archive) branch.

Strawman argument.  All the other users reporting either (a) slow
deletes or (b) poor scheduler performance have been running recent
revisions, so the fact that I'm not doesn't invalidate -their- reports.

    > >  for some users.

    > And /there/ is the important point.  Find out why it's only /some/ users
    > and--if, in fact, the configuration difference is a Myth bug, I will fix
    > it for you.

That's a very generous offer; thank you.

Unfortunately, figuring this out may require some cooperation from
users who never see slow deletes or poor scheduler performance.

    > > Now, as for your own performance, can you please run a couple of tests
    > > and let us know how -your- system performs?

    > That will have to wait until I can find the time.

Okay, but then you should be cautious attributing all of the reported
problems to "configuration errors" until you -do- have the time; otherwise,
it looks like you may be trying to dodge perhaps being shown that even
-your- system may suffer from the same problems.

A really quick suggestion would be to simply -not- delete anything for
the next few hours or days until you have a stack of 3-5 things that
you'd really like to delete.  Then try to delete them, one at a time,
as fast as possible from the frontend UI and see if it ever seems to
pause.  That test shouldn't take substantially longer than it would
have taken you to do each delete singly; it just requires remembering
what you meant to delete (and being able to navigate to each one fast
enough to show up the problem; if it takes you 30 seconds of scrolling
around in the listing to find each thing you want to delete, you may
not see the bug).

Actually, you also won't see it if your scheduler runs really fast;
I assume you have verbose logging enabled on your backend, so what's
a typical time for your

  Scheduled 1247 items in 30.0 = 1.96 match + 28.08 place

lines?  Mine averages 43.2 seconds across a sample of the last 35
items scheduled; smallest 27.52, largest 57.14.  Of that, match is
almost always 0.00, but half a dozen were around 0.1 seconds, and
three were large:  4.75, 2.19, 1.96.  Scheduled 1247 +- 1 items each
run until mfdb intervened; then scheduled 1317 until a few hours ago,
then 1312.  So that gives you the order of magnitude of things.

(Of course, those 1247/1312 numbers are before duplicate & deactivated
elimination!  I have 169 items scheduled to my current horizon 13 days
from now, or 13/day.)

    > > If you cannot ever get the UI to pause while doing this, I'm sure many
    > > of us would be interested to know the size of your oldrecorded tables

    > 7072 in oldrecorded, 2903907 in recordedseek, 4577 in recordedmarkup

10797 in oldrecorded, 2624120 in recordedmarkup (no recordedseek
'cause that's not in 0.18.1)

    > > and the number and makeup of your recording rules,

    > 78 in record (all are currently always/any channel rules--though I often
    > add up to 20 find once/any channel rules for movies), 413 in
    > recordmatch, 437 in recorded.

411 in record (of which maybe a dozen are always/any channel and most
of the rest are always/one channel), 2090 in recordmatch, 372 in recorded.

Btw, 125 in channel, and 6 tuners.  Everything in SD, FWIW.

Note that I seem to have 5 times as many entries in record, and 5
times as many in recordmatch, as you seem to; I'd be surprised if
scheduler performance was linearly related to those ratios, but it
could be 5 to 25 times worse for me than for you if so.  And that's
assuming no O(n^2) steps anywhere.  (I'd rather hope that things are
more like linear or sublinear, but I haven't analyzed the computational
complexity of either the BUSQ or how the scheduler uses it, so I dunno.)

    > Every table in the DB is MyISAM with the exception of the new
    > MythWeather tables, which are InnoDB (see
    > http://www.gossamer-threads.com/lists/mythtv/dev/302941#302941 ).

All of my tables are MyISAM.

    > Oh, and, I've deleted at least 10 shows today, so those numbers were higher.

Ach!  You could have run the simple test! :)

    > >  as mentioned in
    > > that old thread about scheduler performance from a few months ago.
    > > Also interesting would be the match/place lines from your sched logs,
    > > and the contents of your MySQL conf (e.g., memory allocations, etc).

    > It's the default one that's installed from a source install--with the
    > one exception that I disabled (commented) log-bin.

Mine is nonstandard; making the changes helped a great deal and
speeded up everything having to do with scheduling, but not enough,
and making the numbers even larger didn't seem to help much or at all
(though I don't recall if it was because the tables weren't getting
fully used, or because then I ran into swapping instead):

# ++ The "doubled" values, plus the tmp_table_size.
myisam_sort_buffer_size = 128M
join_buffer_size = 2M
read_buffer_size = 2M
sort_buffer_size = 4M
table_cache = 2048
thread_cache_size = 128
max_allowed_packet = 32M
query_cache_limit = 2M
query_cache_size = 64M
# Increased from 16M (above, now commented-out) to 64M on 1/17/07:
[and then doubled again]
key_buffer              = 128M
# New.
tmp_table_size = 128M
# --

[The original values were as follows:
    # ++ The "undoubled" values.
    # myisam_sort_buffer_size = 64M
    # join_buffer_size = 1M
    # read_buffer_size = 1M
    # sort_buffer_size = 2M
    # table_cache = 1024
    # thread_cache_size = 64
    # max_allowed_packet = 16M
    # query_cache_limit = 1M
    # query_cache_size = 32M
    # # Increased from 16M (above, now commented-out) to 64M on 1/17/07:
    # key_buffer            = 64M
    ## --
]

The backend has 1G RAM and doesn't ever use the swap it has available;
perhaps half of that is taken up by large ivtv 0.4.1 buffers for the 5
tuners in that machine.  (I could get an exact figure if necessary; I
documented it and even sent it in traffic to the ivtv list maybe a year
or two ago.)

    > >     > Never optimize before profiling.  IMHO, picking something--no matter how
    > >     > "likely" to be the problem--and changing it without actually proving it
    > >     > is the source of the problem is just a waste of time.
    > >
    > > You seem to be completely ignoring the fact that you've been told
    > > repeatedly that simple reschedule -is- a big problem,

    > I've been told a lot of things.  I believe very little of of what I'm
    > told until I have proof.

What sort of proof would you find acceptable?  Will you not believe
that the scheduler is slow until you see it on your own machine?  Will
you not believe that such slowness is not something you can conveniently
blame on the user until it bites you personally?  You of all people
should be aware that there are plenty of configurations for which Myth
is supposed to work but which are buggy; just because one user is in
such a situation doesn't make the bug for -her- configuration any less
valid just because it doesn't come up for you.  (E.g., if we were
talking about something that broke PAL, you'd never see it---but that
wouldn't make it any less of a bug for UK users.)

    > > P.S.  One way of verifying your contention that it's the actual
    > > removal of the recordedseek info that's causing deletions to
    > > be slow

    > I _never_ said that removal of recordedseek info causes deletion to be
    > slow.

You seemed to imply it when you said this:

    >                                                        I really think it's
    > a file system thing--i.e. disk thrashing as XFS tries to delete the
    > recording at full speed and MySQL on the same disk tries to delete the
    > recordedseek entries for the recording (i.e. finding 7200 needles (per
    > hour of recording you're deleting) in a 1.1M straw haystack).

and this:

    >                                Also, it doesn't really help that he's
    > refusing to try slow deletes because he has a super-fast XFS filesystem
    > and he's smart enough to know that a slow delete (that will cause the
    > removal of recordedseek/recordedmarkup entries to be delayed about 2min
    > 10sec per GiB of recording) will have no effect on performance of his
    > deletes.

If that's not what you meant, I apologize for tarring you with that brush.
But if that's not what you meant in those two excerpts, what -did- you mean?

(Are the -entries- removed slowly?  Why are they delayed -at all- if
the point of slow deletes is to compensate for -filesystems- that are
slow to delete big files?  And if they're simply -delayed- but then
deleted all at once, how does -that- help?  If the point was that
deleting those entries made other things slow, it just moves the
inevitable hang out to sometime in the future...  unless it's your
contention that "fast-delete" filesystems are in fact not "fast"
unless absolutely no other disk activity is happening.  I disbelieve.
If a typical deletion of a large file takes tens of milliseconds, it's
very hard to see how that small an amount of disk access can get in
the way of MySQL's accesses, or vice versa.  And, of course, we can
-completely- eliminate this hypothesis by renaming the original and
substituting a zero-length file before deletion, as I suggested earlier.)

    >        As a matter of fact, I have never even agreed that deletion is
    > slow--as it isn't for me.

You haven't tried the critical experiment (so far as you've told us),
which is to do MORE THAN ONE in quick succession.

    > I simply made a comment which tried to point out that--despite what many
    > had said in the thread--enabling slow deletes on a fast filesystem
    > /does/ have an effect.  I did not say that effect would fix the
    > slowness.  I did not say that removing records from recordedseek is slow.

Okay.  I'd be interested know know exactly what effect this has for
people who run fast filesystems (e.g., anything but ext3fs), whether
or not they think deletion is slow on their machines.  I've certainly
never seen -any- unlink under JFS -ever- be slow -or- cause the disk
to thrash and slow down other processes, so now -I'm- doubting -you-.
So how does slow-delete help in this situation?

    > Far too many people seem to be forgetting that there's a /lot/ going on
    > in the system and that few people have a clear picture of all that
    > happens.  I know enough about what happens to know that I don't have a
    > clear picture of all that happens.  Because of this, even things that
    > don't seem to be relevant may actually be relevant.

Of course.  But by the same token, I (and others) are talking about
something (scheduling) that doesn't at first glance seem relevant to
deletion, either, but it -is-.