[mythtv-users] mythconverg_backup.pl inefficiencies
Michael T. Dean
mtdean at thirdcontact.com
Thu Nov 19 13:27:55 UTC 2015
On 11/18/2015 08:48 PM, Will Dormann wrote:
> Hi folks,
>
> I recently noticed that mythconverg_backup.pl was taking a long time to
> complete. As I was looking at the output directory, I noticed that the
> backup has two distinct steps:
>
> 1) Backup the uncompressed database dump to the output directory
> 2) Compress the database dump
>
> Which seems very un-unix-like to me. Granted, the backup was noticeably
> slow for me for two reasons:
>
> 1) My backup target directory is a 100Mbit-mounted network share (don't
> ask... my ION doesn't play well at gigabit speeds)
> 2) My mythtv installation is many years old, so the database has grown
> quite large.
>
> But it can be done better. In particular, I just took line 1252 and
> made it:
>
> "'$safe_db_name' | /usr/bin/pigz >'$output_file.gz'";
>
> and then commented out near the end of the script:
> #compress_backup;
>
>
> Sure, it's a little hacky but the speed improvement is noticeable.
> Doing the compression piped inline with the dump saves the extra step
> and bandwidth to transfer the uncompressed data *twice* (once to output
> the data, and once again to read the data to compress it), and using
> pigz is more efficient by using all available CPU cores.
>
> Is there a reason why the current script does the backup in two steps?
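The pipelined approach described above can be sketched in Python (the actual script is Perl; the function name is illustrative, and a stand-in command replaces mysqldump so the sketch is self-contained):

```python
import gzip
import shutil
import subprocess
import sys

def dump_compressed(dump_cmd, output_path):
    """Stream a dump command's stdout straight into a gzip file,
    so the uncompressed data is written (and transferred) only once."""
    with gzip.open(output_path, "wb") as out:
        proc = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError("dump command failed: %r" % (dump_cmd,))

# Stand-in for something like ["mysqldump", "mythconverg"]:
dump_compressed([sys.executable, "-c", "print('-- fake dump')"],
                "backup.sql.gz")
```

Swapping gzip.open for a pipe into pigz would parallelize the compression, as Will suggests, at the cost of requiring the external binary.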
Yes, for several reasons. The first and most important: a backup is, given
sufficient I/O bandwidth, very quick, while compression is relatively
slow***, and because mysqldump (the MySQL-provided tool that performs the
backup) must lock tables to do its work, we want the backup to finish as
quickly as possible in case MythTV is running (and needs access to tables
and, especially, the ability to change data within tables). Therefore, we
dump the database as quickly as possible, without any additional work
slowing down its completion.

Also, compression isn't always available on all people's systems, and it
needs to work properly even on non-*nix systems, so we have several
approaches we have to try when compressing (first using IO::Compress::Gzip
to let Perl do the compression without any external compression binaries,
then attempting to compress with a binary called gzip). This
trial-and-error approach would be much more difficult to implement with
pipelines (where it's also possible some compression programs may not be
pipelineable, depending on the program's construction or, potentially,
whether it uses a non-streamable compression algorithm). And on some
systems, the memory usage of a pipelined backup | compress could be a
problem, especially when MythTV is running and busy.
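That trial-and-error fallback might be sketched like this (a Python
illustration of the idea, not the Perl script's actual code; the in-process
gzip module stands in for IO::Compress::Gzip):

```python
import gzip
import os
import shutil
import subprocess

def compress_backup(path):
    """Compress a finished dump after the fact: try in-process gzip
    first (no external binary required), then fall back to an external
    'gzip' program, mirroring the script's trial-and-error approach."""
    try:
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)
    except OSError:
        # In-process compression unavailable or failed; try the binary.
        subprocess.run(["gzip", "-f", path], check=True)
    return path + ".gz"

with open("mythconverg.sql", "w") as f:
    f.write("-- dump contents\n")
compressed = compress_backup("mythconverg.sql")
```

Because compression happens after the dump is complete, each fallback can be attempted in turn on the same file, which is far harder to do once the dump has already been consumed by a pipeline.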
You can choose to back up the database, then compress it yourself (e.g.
with a process on the file server). To do so, just run with
--compress=/bin/true (or, assuming there's no executable in your PATH
called "none", you could use --compress=none). That way, you only have
to transmit the uncompressed backup over the network once.
Alternatively, you can run the backup script on a system with local file
storage (it doesn't even need MythTV installed, just Perl)--ideally, for
the best possible performance, on the system that runs the MySQL database
server (which really should have local storage)--and then copy the
compressed backup to another host for safety. Or, you can keep your
"special-case" modifications to the backup script and just use your
custom version instead.
Mike
*** Meaning that nearly all the time spent running the
mythconverg_backup.pl script is expended on the compression while the
backup itself is a trivial amount of the overall run time.