[mythtv] MythFillDatabase memory usage

Robert Johnston anaerin at gmail.com
Wed Dec 24 04:04:41 UTC 2008


I realise there's some information covered on the Wiki, but the
problem(s) I'm having are not related to anything listed there.

First, some preamble.

Running 0.21-fixes, AMD Sempron 1800+, 512MB RAM

I'm using a manually set up XMLTV parser. Each XMLTV file for 14 days'
worth of listings weighs in at roughly 41MB (41,226,401 bytes, 145
channels, 61,146 programs scheduled).
If I run mfdb on that file, it takes (at least) 3 hours to process,
attempting to use well over 600MB of RAM. This causes kswapd to thrash
the HDD (as there's only 512MB of physical RAM in this machine) and
means the system is totally unusable for the entire time. This is also
before MySQL is even touched (there is no noticeable load from MySQL
at the time, watching with top), and it's with a meager 145 channels.
If I added Radio, HD, PPV, Foreign and AO channels to that list, it'd
be closer to 500.

However, if I call tv_split and split the single XMLTV file into 14
daily files, then process them one at a time through mfdb, the RAM
usage is much less (120MB or so), the runtime is a lot faster, and the
system can easily cope with recording two multicast streams and
streaming a recording over SMB as mfdb runs.
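
For reference, the workaround looks something like this (the tv_split
output template and the 0.21 "mythfilldatabase --file <sourceid>
<offset> <file>" arguments are written from memory here, so check the
man pages before copying them verbatim):

  # Split the 14-day file into one file per day.
  tv_split --output listings-%Y%m%d.xml listings-14day.xml

  # Then feed each day's file to mfdb on its own.
  for f in listings-*.xml; do
      mythfilldatabase --file 1 -1 "$f"
  done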

So what the heck is mfdb doing? From a quick glance at the code, it
looks like it's loading the entire file into RAM, building an
in-memory XML (DOM) representation of the file, and then building an
in-memory object-based representation of that XML representation.
This cannot be good, and may be the cause of many of the stuttering
issues people see while mfdb is running. A simpler way would be to use
an XSLT to (for example) translate the XML file into a series of
Myth-centric SQL statements to run against the DB, or to work on the
file progressively as a stream from the HDD rather than in memory,
parsing the XML and saving it directly to the DB as you go, rather
than parsing the XML, building a hierarchy of objects from the XML,
and then walking the objects you've just made to update the database.
That way you're not making two (or more) in-memory representations of
the same data. It would also take the load off mfdb and kswapd and put
it back on MySQL, which can easily be moved to another machine if
necessary.
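
Just to make the streaming idea concrete, here's a rough sketch of
what I mean, not mfdb's actual code: it assumes Qt's QXmlStreamReader
(Qt 4.3+), and insertProgramme() is a made-up stand-in for whatever
writes the row to MySQL (the real program table obviously has many
more columns):

  #include <QFile>
  #include <QXmlStreamReader>

  // Hypothetical helper: writes one programme row straight to the DB.
  void insertProgramme(const QString &chanid, const QString &start,
                       const QString &title);

  bool importListings(const QString &path)
  {
      QFile file(path);
      if (!file.open(QIODevice::ReadOnly))
          return false;

      QXmlStreamReader xml(&file);
      while (!xml.atEnd())
      {
          if (xml.readNext() != QXmlStreamReader::StartElement ||
              xml.name() != "programme")
              continue;

          QString chanid = xml.attributes().value("channel").toString();
          QString start  = xml.attributes().value("start").toString();
          QString title;

          // Walk only this <programme>'s children, then move on.
          while (!xml.atEnd() &&
                 !(xml.readNext() == QXmlStreamReader::EndElement &&
                   xml.name() == "programme"))
          {
              if (xml.isStartElement() && xml.name() == "title")
                  title = xml.readElementText();
          }

          // The row hits the DB now; nothing from this programme is
          // kept in memory afterwards.
          insertProgramme(chanid, start, title);
      }
      return !xml.hasError();
  }

The point is that memory use stays flat no matter how big the XMLTV
file is, because at most one <programme> element is held at a time.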
-- 
Robert "Anaerin" Johnston

