[mythtv-users] mythconverg_backup.pl inefficiencies
Michael T. Dean
mtdean at thirdcontact.com
Thu Nov 19 13:27:55 UTC 2015
On 11/18/2015 08:48 PM, Will Dormann wrote:
> Hi folks,
>
> I recently noticed that mythconverg_backup.pl was taking a long time to
> complete. As I was looking at the output directory, I noticed that the
> backup has two distinct steps:
>
> 1) Backup the uncompressed database dump to the output directory
> 2) Compress the database dump
>
> Which seems very un-unix-like to me. Granted, the backup was noticeably
> slow for me for two reasons:
>
> 1) My backup target directory is a 100Mbit-mounted network share (don't
> ask... my ION doesn't play well at gigabit speeds)
> 2) My mythtv installation is many years old, so the database has grown
> quite large.
>
> But it can be done better. In particular, I just took line 1252 and
> made it:
>
> "'$safe_db_name' | /usr/bin/pigz >'$output_file.gz'";
>
> and then commented out near the end of the script:
> #compress_backup;
>
>
> Sure, it's a little hacky but the speed improvement is noticeable.
> Doing the compression piped inline with the dump saves the extra step
> and bandwidth to transfer the uncompressed data *twice* (once to output
> the data, and once again to read the data to compress it), and using
> pigz is more efficient by using all available CPU cores.
>
> Is there a reason why the current script does the backup in two steps?
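The pipelined approach described above can be sketched in Python (the actual script is Perl; the function name is illustrative, and a stand-in command replaces mysqldump so the sketch is self-contained):

```python
import gzip
import shutil
import subprocess
import sys

def dump_compressed(dump_cmd, output_path):
    """Stream a dump command's stdout straight into a gzip file,
    so the uncompressed data is written (and transferred) only once."""
    with gzip.open(output_path, "wb") as out:
        proc = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError("dump command failed: %r" % (dump_cmd,))

# Stand-in for something like ["mysqldump", "mythconverg"]:
dump_compressed([sys.executable, "-c", "print('-- fake dump')"],
                "backup.sql.gz")
```

Swapping gzip.open for a pipe into pigz would parallelize the compression, as Will suggests, at the cost of requiring the external binary.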
Yes, for several reasons. The first and most important: a backup is, given
sufficient I/O bandwidth, very quick, while compression is relatively
slow***, and because mysqldump (the MySQL-provided tool that performs the
backup) must lock tables to do its work, we want the backup to finish as
quickly as possible in case MythTV is running (and needs access to tables
and, especially, the ability to change data within tables). Therefore, we
dump the database as quickly as possible, without any additional work
slowing down its completion.

Also, compression isn't always available on all people's systems, and it
needs to work properly even on non-*nix systems, so we have several
approaches we have to try when compressing (first using IO::Compress::Gzip
to let Perl do the compression without any external compression binaries,
then attempting to compress with a binary called gzip). This
trial-and-error approach would be much more difficult to implement with
pipelines (where it's also possible some compression programs may not be
pipelineable, depending on the program's construction or, potentially,
whether it uses a non-streamable compression algorithm). And on some
systems, the memory usage of a pipelined backup | compress could be a
problem, especially when MythTV is running and busy.
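That trial-and-error fallback might be sketched like this (a Python
illustration of the idea, not the Perl script's actual code; the in-process
gzip module stands in for IO::Compress::Gzip):

```python
import gzip
import os
import shutil
import subprocess

def compress_backup(path):
    """Compress a finished dump after the fact: try in-process gzip
    first (no external binary required), then fall back to an external
    'gzip' program, mirroring the script's trial-and-error approach."""
    try:
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)
    except OSError:
        # In-process compression unavailable or failed; try the binary.
        subprocess.run(["gzip", "-f", path], check=True)
    return path + ".gz"

with open("mythconverg.sql", "w") as f:
    f.write("-- dump contents\n")
compressed = compress_backup("mythconverg.sql")
```

Because compression happens after the dump is complete, each fallback can be attempted in turn on the same file, which is far harder to do once the dump has already been consumed by a pipeline.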
You can choose to back up the database, then compress it yourself (e.g.
with a process on the file server). To do so, just run with
--compress=/bin/true (or, assuming there's no executable in your PATH
called "none", you could use --compress=none). That way, you only have
to transmit the uncompressed backup over the network once.
Alternatively, you can run the backup script on a system with local file
storage (it doesn't even need MythTV installed, just Perl)--ideally, for
the best possible performance, on the system that runs the MySQL database
server (which really should have local storage)--and then copy the
compressed backup to another host for safety. Or, you can keep your
"special-case" modifications to the backup script and just use your
custom version instead.
Mike
*** Meaning that nearly all the time spent running the
mythconverg_backup.pl script is expended on the compression while the
backup itself is a trivial amount of the overall run time.