[mythtv-users] 0.27: Stalls and other niggles - diagnosed

Mike Thomas mt3 at pfw.demon.co.uk
Sun Jan 19 12:07:30 UTC 2014


On Sat, 18 Jan 2014 15:19:31 -0500
Jay Ashworth <jra at baylink.com> wrote:
> ----- Original Message -----
> > From: "Michael T. Dean" <mtdean at thirdcontact.com>
> 
> > If you want enterprise-level reliability, and you're running on
> > consumer-grade disk controllers (without battery backup), you should
> > enable barriers (which is what barriers were designed to allow). If
> > this combination doesn't provide the performance you want/need, you
> > may need to get enterprise-quality hardware (such as a
> > battery-backed disk controller) or more-expensive, but faster,
> > consumer hardware (like flash disks).
> 
> Or... switch to a DB engine that can, I dunno, avoid those problems.

And what engine is that? The gain comes from the use of short-lived
transactions and group commit. This means issuing

set autocommit=0;

when the connection is initiated, followed by:

start transaction;
-- ... reads and writes ...
commit;

for each activity, and letting MySQL batch up the log writes.
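
To make that concrete, here is roughly what one activity looks like
from the client's side, using a made-up recording_log table rather than
any real MythTV table:

-- throwaway table, purely for illustration
create table if not exists recording_log (
  chanid    int not null,
  starttime datetime not null,
  status    varchar(16),
  primary key (chanid, starttime)
) engine=InnoDB;

set autocommit=0;   -- once, when the connection is initiated

start transaction;
insert into recording_log (chanid, starttime, status)
values (1021, '2014-01-19 12:00:00', 'started');
update recording_log
   set status = 'finished'
 where chanid = 1021 and starttime = '2014-01-19 12:00:00';
commit;             -- one log flush covers both writes, and sessions
                    -- committing at the same moment can share it
                    -- (group commit)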

> Why is it again, precisely, Mike, that we *don't* use InnoDB?

Although InnoDB supports the above, others in this thread are spot on
when they say that InnoDB is slow in comparison to MyISAM.

To get the performance back, the application needs quite a lot of work,
namely:

- Operate two independent database connection pools - one for the
  existing code with autocommit set, and one for new code with
  autocommit off. This allows sub-systems to be transitioned
  independently.

- Take the application logic out of C++ and place it in stored
  procedures, sub-system by sub-system. Many parts of the code are
  probably best kept as they are, but things that call the database in
  tight loops (even selects), like the EIT cache, are ripe for this.
  That is why my posted patch covered this area.

  It should be obvious from the code that using stored procedures can
  also simplify the schema in places and potentially reduce the need
  for housekeeping.

- Have the code issue tight transactions, making judicious use of
  multi-valued inserts, like so:

  start transaction;
  call preparatory_routine_if_required(arg, arg, arg);
  call some_procedure(arg, arg, arg);      -- over and over as required
  insert into foo values (this), (that), (other);      -- as required
  call completion_routine_if_required(arg, arg, arg);
  commit;                                  -- or rollback on error

  The key with InnoDB and other transactional databases is to keep
  locks short. In practice this means reducing the number of statements
  the C++ issues, and hence the number of round trips. A multi-valued
  insert writes the same amount of data as a run of single-valued
  inserts in a tight loop, but waits for only one round trip. Stored
  procedures go further and let many SQL statements run in a single
  round trip.

  Further gains can be made by incorporating the begin transaction and
  commit/rollback into the procedures themselves; there is a rough
  sketch of this just after the list.
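
As a rough sketch of the stored procedure and multi-valued insert ideas
together, with the transaction kept inside the procedure, and with an
invented eit_events table and procedure name standing in for the real
schema:

-- hypothetical table standing in for the real EIT cache
create table if not exists eit_events (
  chanid    int not null,
  starttime datetime not null,
  endtime   datetime not null,
  title     varchar(128),
  primary key (chanid, starttime)
) engine=InnoDB;

delimiter //
create procedure cache_eit_events(in p_chanid int)
begin
  start transaction;
  -- several statements, but only one round trip from the C++
  delete from eit_events
   where chanid = p_chanid and endtime < now();
  insert into eit_events (chanid, starttime, endtime, title)
  values (p_chanid, '2014-01-19 13:00:00', '2014-01-19 14:00:00', 'News'),
         (p_chanid, '2014-01-19 14:00:00', '2014-01-19 15:00:00', 'Film');
  commit;
end //
delimiter ;

call cache_eit_events(1021);

In real code the event data would be passed in rather than hard-coded;
the point is the shape of the thing - one CALL, several statements, and
locks held only for the length of the procedure.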

The key to all this is taking care not to leave dangling transactions
and to handle error conditions sensibly. I have lost count of the
number of database applications I've worked on which do stupid things
like this:

begin transaction
do some reading and writing
encounter an error condition, most often a deadlock
drop the connection
open another connection from the pool
continue reading and writing as if they are inside the transaction
inevitably encounter another error
drop the connection
open another connection from the pool
around and around the mulberry bush we go

I'm talking big commercial applications.
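
The boring-but-correct pattern is to roll back on the same connection
as soon as the error (most often a 1213 deadlock) arrives, and then
re-issue the whole transaction from the top. Sketched with the same
made-up recording_log table as above (RESIGNAL needs MySQL 5.5 or
later):

delimiter //
create procedure set_recording_status(in p_chanid int,
                                      in p_status varchar(16))
begin
  -- 1213 is MySQL's deadlock error: undo our half-done work on this
  -- connection and hand the error back so the caller can simply
  -- re-issue the CALL
  declare exit handler for 1213
  begin
    rollback;
    resignal;
  end;

  start transaction;
  update recording_log set status = p_status where chanid = p_chanid;
  commit;
end //
delimiter ;

The caller's loop then amounts to "call it; if it deadlocks, call it
again" - no dropping of connections anywhere.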

The main problem with all this is the developer time involved. I'm
happy to talk people through it, and I'm happy to address subsystems as
they affect me, but I really don't have the time to take on the whole
application. In any case, I'm not sure it's warranted: the application
is fast enough for most purposes. It's just that as more functions
creep in, more people will fall over the I/O wall.

Perhaps if another developer were to talk those of us who are new to
the code base through the system as a whole, and through the parts
which are not obvious, then we could start to address some of the
I/O-heavy subsystems. It would give MythTV more headroom for newer
features.

As was said later in this thread, this is an architectural matter for
the key developers to chew the fat over. Putting my pennyworth in, I
feel we should scotch talk of moving to SQLite or even Postgres. Having
a separate SQL server allows users to query the database independently
of the application, and it is what allows MythTV to be the modular
application it is today. IMHO, making it a monolithic application for a
questionable performance increase is a step backwards.

Postgres is interesting because it performs fewer I/Os per commit than
MySQL with binlogging, but I doubt it's worth the upheaval. There are
many MySQL-specific parts of the code which would take time to discover
and re-write, and there would still be a need to convert the code to
stored procedures to realize these gains. There is also the question of
whether the Postgres optimizer is as advanced as MySQL's. This might
make a difference on some of the larger tables. My recordedseek table
contains 8 million rows.

Regards,

Mike

