[mythtv-users] random livetv stalls
Mike Thomas
mt3 at pfw.demon.co.uk
Tue Feb 25 11:35:00 UTC 2014
On Mon, 24 Feb 2014 12:23:50 +0100
Hika van den Hoven <hikavdh at gmail.com> wrote:
> Hoi Mike,
>
>
> > For a more complete review of the impact of RAID 5, I suggest
> > reading chapters 29-31 of Recovery Mechanisms in Database Systems
> > by Kumar and Hsu. The performance figures they quote bear out my
> > experience.
>
> Do you have a link, or does it only exist as a book?
Dear Hika,
'fraid not. That book was from the days when books were books. I'm sure
there are plenty of other resources which will mention it. Just be sure
what you've chosen hasn't been written by a storage vendor.
> I thought the main advantage of raid 5/6 over raid 1 is performance or
> is that only on read access?
You've got that the wrong way around. RAID 5 uses N+1 disks per set
(N+2 for RAID 6). RAID 1 uses 2N disks per set. RAID 5/6's advantage is
economy whilst still providing data security. Unfortunately it is far
and away the slowest of the RAIDs for any typical usage pattern.
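To put numbers on the economy argument, here is a hypothetical
eight-disk example (the disk count and 4 TB capacity are invented for
illustration):

```python
# Hypothetical example: usable capacity of an 8-disk set at each RAID level.
DISKS = 8
DISK_TB = 4  # assume 4 TB drives

raid5_usable = (DISKS - 1) * DISK_TB   # one disk's worth of parity
raid6_usable = (DISKS - 2) * DISK_TB   # two disks' worth of parity
raid1_usable = (DISKS // 2) * DISK_TB  # every disk is mirrored

print(f"RAID 5: {raid5_usable} TB usable")  # 28 TB
print(f"RAID 6: {raid6_usable} TB usable")  # 24 TB
print(f"RAID 1: {raid1_usable} TB usable")  # 16 TB
```

Same eight spindles, but RAID 1 gives you barely half the usable space
of RAID 5, which is the whole attraction.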
RAID 1 volumes can read from either disk, providing 2x read speed
improvement, but suffer from having to write to both disks before
telling the O/S the transaction is done.
RAID 5/6 volumes can read from any data disk (but not the parity disk)
to improve read speed, but every partial-stripe write must go through a
read-modify-write cycle: read the old data and old parity, recompute
the parity, then write both back. RAID 5/6 is recommended for storing
large quantities of data that are read frequently but rarely updated.
Of course, the people who recommend it for that have never actually
tried to write the data onto the disk in the first place!
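The small-write penalty can be sketched in a few lines. This is only an
illustration of the parity arithmetic (the helper name and block sizes
are invented), not any real RAID implementation:

```python
# Sketch of the RAID 5 small-write penalty: updating one data block
# costs two reads and two writes, plus two XORs to fold in the change.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# A 3-disk stripe: two data blocks and their parity.
d0 = b"\x0f" * 4
d1 = b"\xf0" * 4
parity = xor_blocks(d0, d1)

# Overwrite d0 with new data.
new_d0 = b"\xaa" * 4
# Read old data and old parity, then fold the change into the parity:
#   new_parity = old_parity XOR old_data XOR new_data
new_parity = xor_blocks(xor_blocks(parity, d0), new_d0)

# The invariant holds: parity is still the XOR of all data blocks.
assert new_parity == xor_blocks(new_d0, d1)
```

Every one of those reads and writes can involve a seek, which is where
the milliseconds pile up.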
In theory, RAID 5/6 should be fine for mythtv video storage: it stores
a small number of very large files (requiring few filesystem structure
updates), and those files are kept for ages and may be read many times.
Where it can be a problem is with the writing threads.
This is down to the coding of mythtv. Put simply, the task which reads
from the tuner also needs to write it to disk *and* update the database
with the jump points. Although the writing of the video data to disk
can go into cache, the other tasks are time critical. If the RAID 5/6
disks add even a small delay, this can cause stuttering when the other
backend threads try to read the video data and database rows. In the
worst case, it could cause a receive buffer overrun and consequent
corruption.
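A toy model of that failure mode (this is not mythtv code; the buffer
size, tick rates and stall length are all invented): the tuner delivers
packets at a fixed rate whether or not the disk thread is keeping up,
so a long enough write stall fills the receive buffer and packets are
lost.

```python
# Toy model of a receive buffer overrun: the tuner produces one packet
# per tick; the disk thread drains one per tick except during a stall.
import queue

BUFFER_PACKETS = 4
buf = queue.Queue(maxsize=BUFFER_PACKETS)

dropped = 0
for tick in range(20):
    # Tuner delivers one packet every tick, ready or not.
    try:
        buf.put_nowait(f"packet-{tick}")
    except queue.Full:
        dropped += 1  # receive buffer overrun -> corrupt recording
    # Disk thread drains one packet per tick, except during a stall
    # (ticks 5-14 model a ten-tick RAID write delay).
    if not 5 <= tick < 15 and not buf.empty():
        buf.get_nowait()

print(f"dropped {dropped} packets")
```

The point of the model is that the buffer only absorbs a stall shorter
than its own capacity; anything longer and data is gone for good.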
The other factor is the nature of the writes to the journal. If you use
a journal which stores the content of the file as well as the
filesystem structure updates, this probably won't apply (but you'll see
a performance drop from the extra data shifting). Whenever you append
to a file, the inodes and free lists need updating. These updates are
probably quite small, and for data integrity reasons must be committed
to the platters. This means almost all of those large data writes are
followed by single sector writes and all the hullabaloo that involves.
I suggest it's that which makes the difference.
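The pattern is easy to see from userspace. A minimal sketch (the file
name and chunk size are invented): each durable append is really two
updates, the large sequential data write plus the small metadata write
recording the new file size, and fsync forces both to the platters.

```python
# Sketch: every durable append is two updates - the data blocks, and
# the inode/free-list metadata recording the new file size.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "recording.ts")
CHUNK = b"\x00" * 65536  # one large video write

with open(path, "ab") as f:
    for _ in range(4):
        f.write(CHUNK)        # large sequential write - cache-friendly
        f.flush()
        os.fsync(f.fileno())  # forces the small metadata update out
                              # too, breaking the sequential pattern

assert os.path.getsize(path) == 4 * len(CHUNK)
```

On RAID 5/6 each of those small metadata commits is a partial-stripe
write, with the read-modify-write cost that entails.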
It is interesting that despite ext4 using delayed allocation it is
still not as good as xfs for storing video data. This implies that
mythtv is rather more time-critical than one might at first suppose.
Another factor is the modern trend for internal filesystem journals.
Aeons ago when I worked with big monolithic transactional databases the
most obvious cause of performance degradation was having the logs on
the same spindle as the data tables. Quite apart from the problem of
data loss in the event of a disk failure, this performance drop from
making the disk drives seek back and forth (data tables, logs, data
tables, logs...) was simply unacceptable. I made a massive improvement
in one system (20x or more) by placing the logs on a separate disk. I
expect filesystem performance to improve from doing the same, but
probably not as dramatically.
The key message to take from all this is that seek times and rotational
positioning delays, which are on the order of milliseconds, dominate
the throughput of all transactional systems. Writes of incomplete RAID 5
stripes are transactional in the sense that they need a read step, a
calculation step and a write step. Filesystems are transactional
because they must update their structures atomically. Journaling
filesystems try to offset this, but end up making the situation worse
if the journal is on the same spindle as the data. This is why ext3
sucks.
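Back-of-envelope arithmetic makes the point. The figures below are
assumed, typical numbers for a 7200 rpm drive, not measurements:

```python
# Why millisecond positioning delays dominate small transactional writes.
seek_ms = 8.0          # assumed average seek
rotation_ms = 4.17     # half a revolution at 7200 rpm
transfer_mb_s = 150.0  # assumed sustained sequential rate

write_kb = 4           # one small metadata/journal write
transfer_ms = write_kb / 1024 / transfer_mb_s * 1000  # ~0.03 ms

total_ms = seek_ms + rotation_ms + transfer_ms
effective_mb_s = write_kb / 1024 / (total_ms / 1000)

print(f"time per 4 KiB write: {total_ms:.2f} ms")
print(f"effective throughput: {effective_mb_s:.2f} MB/s")
```

A disk that streams at 150 MB/s delivers well under 1 MB/s when every
write pays for a seek, a drop of more than two orders of magnitude.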
NFS, NAS and SAN architectures are plagued by the extra delays inherent
in packing data up for network transport and unpacking it. Although
the delays are not as great as those from seeking, they all accumulate,
even on 10Gbps networks. Even SAS disk arrays can add delays when data
from several drives needs to wait to get a slot on an expander's bus.
Databases are dominated by transactions. Databases on filesystems
suffer from all these phenomena. Using autocommit, rather than a single
BEGIN, lots of updates, then one COMMIT, makes for many more commits
than are necessary. And to top it all, mysql, being a SQL layer over
the top of a series of independent 'engines', includes its own 'binary
log'. If you put all these things together, it is easy to see how
limited performance can be.
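The autocommit-versus-grouped-commit difference looks like this. The
sketch uses Python's sqlite3 as a stand-in for mysql, and the table and
column names are invented for illustration:

```python
# One transaction around many updates means one commit (one journal
# flush) instead of one flush per row. sqlite3 stands in for mysql.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seekpoints (mark INTEGER, off INTEGER)")

rows = [(i, i * 188 * 512) for i in range(1000)]

# Autocommit style: one commit (one flush) per statement - 1000 flushes.
# for mark, off in rows:
#     conn.execute("INSERT INTO seekpoints VALUES (?, ?)", (mark, off))
#     conn.commit()

# Grouped style: one commit for the whole batch - a single flush.
with conn:  # BEGIN ... COMMIT wrapped around the block
    conn.executemany("INSERT INTO seekpoints VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM seekpoints").fetchone()[0]
print(count)  # 1000
```

Same rows land in the table either way; the grouped version just pays
the durability cost once rather than a thousand times.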
You could say 'fuck it' and run without barriers etc., but then in the
event of a disk problem, you'd have to write all that data back onto
your RAID 5 arrays...
In my case, I chose to keep all the barriers and binary logging but
reduce the commit load by patching mythtv to update a couple of tables
using group commit. Not that I'm saying this is related to the
problems people on this thread have been discussing. I just raise the
point in a general discussion about throughput of I/O systems.
Yours,
Mike.