[mythtv-users] 0.27: Stalls and other niggles - diagnosed

Mike Thomas mt3 at pfw.demon.co.uk
Fri Jan 17 17:40:38 UTC 2014


On Thu, 16 Jan 2014 13:27:00 -0500
"Michael T. Dean" <mtdean at thirdcontact.com> wrote:
> On 01/16/2014 11:32 AM, Mike Thomas wrote:
> > What I have noticed is mysqld almost always seems to be the process
> > doing the most I/O at the moment of the stall, but is rarely doing
> > any I/O at other times. It is almost as if mysqld is batching
> > updates to its tables and logs.
> 
> Sounds like you're storing your MySQL data on a file system with 
> barriers enabled, so mysqld writes data to disk and the file system 
> barriers cause it to block until all data is physically written to
> disk platters (not just to disk cache)--which takes a long time.
> During this time, MythTV (and everything else) is unable to access
> required data in MySQL.
> 
> The proper fix is /not/ just disabling barriers, instead making it so 
> that data can be written safely--using barriers--fast enough to
> prevent impacts to system services.  There are many things you can do
> to fix this, each with its own advantages/disadvantages (and costs),
> but before going there, I'd recommend just remounting your
> MySQL-data-containing file system with barriers disabled to see if
> this is, in fact, the problem you need to fix.  If so, then come back
> to find the best fix for you.

Dear Chaps,

A little debugging has revealed the cause of the problem.
Unfortunately it stems from the design of the application itself.

The stalls occur whenever the backend issues a batch of inserts into
the recordedseek table. This is a MyISAM table whose primary key
covers most of its columns, so nearly everything lives in the index
(.MYI) file, with the one remaining column in the data (.MYD) file.
The table is stored on a journaling filesystem (ext4).
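
From memory, the table looks something like this (exact types may
differ in your schema version); note that four of the five columns
form the primary key, leaving only offset in the .MYD file:

  CREATE TABLE recordedseek (
    chanid    INT UNSIGNED    NOT NULL,
    starttime DATETIME        NOT NULL,
    mark      BIGINT UNSIGNED NOT NULL,
    offset    BIGINT UNSIGNED NOT NULL,  -- the only non-key column
    type      INT             NOT NULL,
    PRIMARY KEY (chanid, starttime, type, mark)
  ) ENGINE=MyISAM;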

I suggest that each insert into the table will:

- update the .MYD file and its inode
- update the .MYI file and its inode
- update the ext4 journal
- update the binlog and its inode
- update the ext4 journal again

at a minimum, and these updates will be flushed to disk before the
SQL client library returns to the application. Issuing 15-20 of these
in a short batch every 10 seconds must result in at least 60
sequential physical writes to disk. At 7200 rpm each synchronous
write costs at least one revolution, about 8.3 ms, so 60 of them take
roughly 0.5 seconds. As the files grow and become more physically
separated, seeks will push this figure higher.

There are a number of ways to address the impact of this process:

- Mount the filesystem without write barriers. This gives the hard
  disk's write cache more freedom to coalesce writes and commit
  blocks to the platters in whatever order it likes. Unfortunately
  that freedom is exactly the cost: after a power failure the on-disk
  state is unpredictable, so attempts to recover are doomed. Even if
  you can make your filesystem and database work, the data may be
  only partially updated. This is what Mike Dean referred to above.

- Put the data on a non-journaled filesystem, such as ext2.
  Unfortunately, because these filesystems do not support barriers, the
  effect is the same as above. Ext2 does not use extents either, so the
  use of double- and triple-indirect blocks will eat away some of the
  gains as the files grow and separate.

- Put the database on a solid-state disk or use a RAID array with a
  battery-backed controller. This is very expensive.

- Call ProgramInfo::SavePositionMap() and
  ProgramInfo::SavePositionMapDelta() more frequently with smaller
  batches. This will help to spread the impact, but will not actually
  address the problem.

- Use a multi-valued insert statement (a sketch follows this list).
  Because all the rows arrive in one statement, there is only one set
  of file updates to flush, so the impact of a batch drops to almost
  that of a single insert.

- Use an InnoDB table and perform multi-valued inserts or single
  inserts inside a transaction. This will produce the greatest
  improvement because there will be a single write to the InnoDB log
  and delayed updates to InnoDB's other files. ext4's journal will not
  be updated unless filesystem data journaling is used as well.

  InnoDB uses clustered indices, which will change insert and read
  performance. Insertions should be faster when multiple channels are
  recording because there won't be a single hot-spot as there is now.
  Selects should be nippier too because the rows for one recording
  will be packed into fewer blocks.
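
To make the multi-valued option concrete, here is a sketch (the
channel, time, mark and offset values are made up):

  -- one statement, so one update cycle for the .MYD, .MYI and binlog
  -- instead of one cycle per seek point
  INSERT INTO recordedseek (chanid, starttime, mark, type, offset)
  VALUES (1021, '2014-01-17 17:00:00', 900, 9, 123456789),
         (1021, '2014-01-17 17:00:00', 901, 9, 123654321),
         (1021, '2014-01-17 17:00:00', 902, 9, 123888888);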

I can't see how any of the following will make a tangible difference:

- Tuning the filesystem or using an 'improved' filesystem
- Using different I/O schedulers
- Tuning MySQL

because the problem stems from mythbackend instructing MySQL and the
filesystem to update the database files and to wait for the data to
land on the platters on every single insert statement. In other
words, single inserts against autocommit/non-transactional tables
mandate all this flushing and waiting. What is required here is group
commit, which is properly done by turning off autocommit for the
client session and using transactional tables, as sketched below. It
is also possible to approximate it with a multi-valued insert
statement on a MyISAM table.
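
A sketch of the transactional approach, assuming the table has first
been converted to InnoDB (a one-off rebuild that may take some time
on a large recordedseek; values again made up):

  ALTER TABLE recordedseek ENGINE=InnoDB;

  -- per batch: the inserts only touch the buffer pool; COMMIT then
  -- performs a single synchronous write to the InnoDB log covering
  -- the whole group
  SET autocommit = 0;
  INSERT INTO recordedseek (chanid, starttime, mark, type, offset)
  VALUES (1021, '2014-01-17 17:00:00', 903, 9, 124111111);
  INSERT INTO recordedseek (chanid, starttime, mark, type, offset)
  VALUES (1021, '2014-01-17 17:00:00', 904, 9, 124222222);
  COMMIT;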

Comments, anyone?

Regards,

Mike.

