[mythtv] Mythfilldatabase and Shepherd (tv_grab_au)

Max Barry mythtv at maxbarry.com
Sat Jul 16 21:17:30 UTC 2011


Hello,

I was asked to post here regarding why Shepherd (tv_grab_au) relies on
the '--graboptions' arg that was recently removed from mythfilldatabase:

http://code.mythtv.org/trac/ticket/9853

Specifically, I was asked why Shepherd sets up a custom cron job on the
user's system to call mythfilldatabase, rather than relying on MythTV's
in-built scheduling system.

Firstly, I should say that we have a workaround now, so this is not a
request to keep --graboptions or anything. We can deal with that fine.
I'm just responding to stuartm's request for info.

So: there are a few reasons. Some may be obsolete now, because Shepherd
was written years ago and I haven't followed MythTV's development very
closely. But at the time, at least, this is why:

(1) By default, MythTV triggers mythfilldatabase at 2am. Shepherd phones
home with stats, and our graphs were showing an order-of-magnitude spike
in usage at 2am, as a great many Shepherd users Australia-wide all hit
the datasources at once. Please bear in mind that each Shepherd user is
not simply downloading one XMLTV file, but rather compiling XMLTV by
scraping dozens, hundreds, or even thousands of different web pages.

This behavior was a problem both for the datasource (which could be
overwhelmed with traffic) and for us. It's a problem for us because TV
guide data is not freely available in Australia: it's fiercely defended
by the TV networks, who don't want it to end up in home theater PCs. In
a nutshell, the only way to get high-quality TV guide data into an
Australian home theater PC has been to scrape the networks' web pages,
but they actively block any scrapers they detect. If a bunch of scrapers
hit them at precisely 2am, that's easy to block.

This is a key point that is often overlooked by non-Australians, who
don't appreciate the different environment here. For example, it would
make a lot of sense for us to run just one scraper to gather TV guide
data each day, convert it into XMLTV, and offer that for download to all
Australian users. However, that would be illegal under Australian
copyright law.

I point this out because in my experience, people overseas tend to
respond to our situation with sentiments like, "That copyright law is
stupid, you should get it changed," or similarly accurate but wildly
unhelpful observations. The reality is that before Shepherd, Australians
had no reliable high-quality source of TV guide data. We do it this way
because it's been our only option.

(2) MythTV assumes it will be powered on for the scheduled MFDB run, and
if it's not--e.g. it's a system that shuts down when idle--it skips that
day. Users could thus see a dwindling supply of guide data "days" and
think something was wrong with Shepherd. It's a particular problem when
combined with (1) above, because systems that auto-poweroff are often
off at 2am.

(This seems like a universal problem, not Australia-specific, so very
possibly it's been addressed since I last looked at it.)

(3) Similarly, MythTV is locked to one grab per day. Users with
unreliable internet connections tend to have Shepherd time out
occasionally (it takes a long time to run, often hours), and thus
hit the same problem as above: fewer and fewer days of guide data.
Shepherd's cron job, by contrast, runs once per hour, so that it can try
again more quickly in the event of a transient network failure.
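
As an illustration of why an hourly schedule helps, here is a minimal
sketch of a job that is safe to invoke every hour: it records the time
of the last successful grab and does nothing until a day has passed, so
a failed attempt is retried an hour later rather than a day later. This
is an assumption about how such a job could work, not Shepherd's actual
logic; the stamp file, should_grab and run_hourly are hypothetical
names.

```python
import time
from pathlib import Path
from typing import Callable, Optional

DAY_SECONDS = 24 * 60 * 60


def should_grab(stamp: Path, now: float, min_interval: float = DAY_SECONDS) -> bool:
    """Return True if no successful grab is recorded within min_interval."""
    try:
        last_success = float(stamp.read_text().strip())
    except (FileNotFoundError, ValueError):
        return True  # never succeeded, or the stamp file is unreadable
    return now - last_success >= min_interval


def run_hourly(stamp: Path, grab: Callable[[], bool], now: Optional[float] = None) -> str:
    """Intended to be called once per hour; `grab` returns True on success."""
    now = time.time() if now is None else now
    if not should_grab(stamp, now):
        return "skipped"
    if grab():
        stamp.write_text(str(now))  # only a success resets the daily clock
        return "grabbed"
    return "failed"  # no stamp written, so the next hourly run tries again
```

The same check also covers a machine that was powered off at the
scheduled time: the first hourly run after boot finds the stamp stale
and grabs immediately.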

(4) It's very hard to configure grabbers via mythtv-setup. For example,
every time you go into the relevant MythTV Settings page, a terminal
window is triggered running 'tv_grab_au --configure'. On my system, at
least, this terminal window was invisible until you exited mythtv-setup.
Also, the process of matching channels in MythTV to XMLTVIDs in Shepherd
was very tortuous and sometimes involved races between the two
applications.

Once Shepherd became relatively stable, we found that the great majority
of our mailing list traffic was requests for help configuring MythTV,
due to the issues listed above. I didn't see this as my role, or
Shepherd's role; Shepherd's job is simply to deliver an XMLTV file.
However, people saw Shepherd-MythTV integration problems as Shepherd
problems, and eventually I gave in and added some code to automatically
configure MythTV to run Shepherd. This entails:

- creating a tv_grab_au symlink to Shepherd in a relevant path

- scanning the user's MythTV DB for channels, figuring out matches with
Shepherd TV guide data channels, and setting XMLTVIDs appropriately

- turning off MythTV scheduled updates

- setting up a cron job to run Shepherd once per hour at a randomized
time. This is where we used '--graboptions' to pass a '--daily' argument
to Shepherd, when that became its new default behavior. Our fix for the
removal of --graboptions from MFDB will simply be to alter Shepherd such
that --daily is implied.
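
The randomized hourly schedule in the last item can be sketched as
follows. This is only an illustration of the idea: each install picks
its own minute of the hour, so users are spread across the hour instead
of all hitting the datasources at the same moment. The function name is
hypothetical, and the bare 'mythfilldatabase' command is a placeholder
rather than the exact line Shepherd installs.

```python
import random


def hourly_cron_line(command: str, rng: random.Random) -> str:
    """Build a crontab entry that runs `command` once per hour,
    at a minute chosen at random when the entry is created."""
    minute = rng.randrange(60)  # 0..59, fixed for this install from now on
    # crontab fields: minute hour day-of-month month day-of-week command
    return f"{minute} * * * * {command}"


if __name__ == "__main__":
    # Placeholder command; a real install would use its own invocation.
    print(hourly_cron_line("mythfilldatabase", random.Random()))
```

The minute is chosen once, when the entry is written, so a given system
always runs at the same offset past the hour.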

This has worked very well, as we no longer see so many emails from
people needing help configuring MythTV.
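
For the channel-matching step in the list above, a minimal sketch of one
possible approach is to compare channel names after stripping case,
spaces and punctuation. This is an assumption for illustration only, not
Shepherd's actual matching code; the function names, channel names and
XMLTVIDs below are all hypothetical, and the sketch works on plain
Python data rather than the MythTV database.

```python
import re
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lowercase a channel name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_channels(
    myth_channels: List[str], guide_channels: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map MythTV channel names to XMLTVIDs.

    guide_channels maps a guide-data channel name to its XMLTVID.
    Returns (matched, unmatched): matched maps MythTV name -> XMLTVID,
    unmatched lists MythTV names with no counterpart in the guide data.
    """
    by_key = {normalize(name): xmltvid for name, xmltvid in guide_channels.items()}
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in myth_channels:
        xmltvid = by_key.get(normalize(name))
        if xmltvid is None:
            unmatched.append(name)  # left for the user to resolve by hand
        else:
            matched[name] = xmltvid
    return matched, unmatched
```

With hypothetical inputs such as ["ABC 1", "SBS-One"] on the MythTV side
and {"ABC1": "abc1.example", "SBS ONE": "sbsone.example"} on the guide
side, both channels match despite the differences in spacing and case.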

Our #1 user complaint today is that once Shepherd is installed and the
user tries to run it via 'mythfilldatabase', it seems to hang, because
mythfilldatabase suppresses all Shepherd output. (Shepherd can take
several hours to complete its first run.) At this point, some people
give up and terminate the process, then write to us seeking help on what
went wrong. So if I do have a request, it is that MFDB stop suppressing
grabber output on the command-line, so users can see that it is actually
doing something.
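
To make the request concrete, here is a minimal sketch of a parent
process that passes a child's output through line by line while the
child is still running. It illustrates the general technique only;
mythfilldatabase is not written in Python, and run_and_echo is a
hypothetical name.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_echo(argv: List[str]) -> Tuple[int, List[str]]:
    """Run a child process, echoing each line of its output as it arrives.

    Returns the child's exit code and the lines it printed.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error messages into the same stream
        text=True,
    )
    lines: List[str] = []
    assert proc.stdout is not None
    for line in proc.stdout:  # yields lines as the child produces them
        sys.stdout.write(line)
        sys.stdout.flush()
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines
```

One caveat: a child that buffers its own output when writing to a pipe
may still appear silent for a while, so the grabber needs to flush its
progress messages as well.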

Thanks,

Max.


