[mythtv-users] Mooting architecture for a DataDirect replacement

Rod Smith mythtv at rodsbooks.com
Mon Jun 25 20:52:57 UTC 2007


On Monday 25 June 2007 15:40, David Brodbeck wrote:
> On Jun 25, 2007, at 11:25 AM, Rod Smith wrote:
> > Daniel Kristjansson
> > posted an interesting idea in the "IMDB has TV listings" thread a
> > couple of
> > days ago, though: Have the users themselves generate the data in a
> > decentralized way, with the help of some automation to create a
> > "first draft"
> > based on a station's regular schedule.
>
> That's definitely an interesting concept -- sort of the CDDB technique.

Yes, and looked at in that way, there are really two types of data we need:

1) Description data -- A description of each movie or TV show, including
   a short plot summary, the main actors, the director, etc. This data will
   remain unchanged for years (barring any corrections to fix typos or maybe
   add mention of bit actors who subsequently make it big). The main ongoing
   need would be to add information about new TV show episodes and movies
   as they're released to TV.

2) Schedule data -- The fact that Movie A will be shown at 3:00 PM Tuesday on
   Station Y. This data will obviously need to be generated in an ongoing
   fashion.

There'll also be a need to link the two together, presumably using some sort
of unique code number for each item. I know there are standards in existence
to do this already, but I'm foggy on the details.
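
To make the linking idea concrete, here's a rough Python sketch. Everything
in it is an assumption for illustration: the "program_id" field, the ID
values, and the class names are made up, and a real system would presumably
use whatever existing standard identifier we settle on.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Description:
    program_id: str  # hypothetical unique code shared by both data sets
    title: str
    summary: str


@dataclass(frozen=True)
class Showing:
    program_id: str  # same hypothetical code, as found in the schedule data
    station: str
    start: datetime


def link(showings, descriptions):
    """Pair each showing with its description, or None if we have none yet."""
    by_id = {d.program_id: d for d in descriptions}
    return [(s, by_id.get(s.program_id)) for s in showings]


# Made-up sample data: one described movie and one not-yet-described episode.
descriptions = [Description("MV0001", "Movie A", "A short plot summary.")]
showings = [
    Showing("MV0001", "Station Y", datetime(2007, 6, 26, 15, 0)),
    Showing("EP0042", "Station Y", datetime(2007, 6, 26, 17, 0)),
]
listings = link(showings, descriptions)
```

A showing whose ID has no description yet simply comes back paired with
None, so the schedule would still be usable while descriptions get filled in.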

Here's a legal question: Suppose for the sake of argument that we had the 
description data in an "open source" form. Would it be legal to screen-scrape 
a TV station's Web site, zap2it.com, or whatever, for schedule data, link the 
two types of data, and publish the results (via Usenet, P2P networks, a Web 
site, or whatever)? If so, we could plan for a solution along those lines -- 
a single system could do the screen-scraping and merging of data to be 
subsequently published in an open form. My impression from the discussions of 
legal issues a few days ago is that this WOULD be legal, but IANAL.

From a technical viewpoint, this approach wouldn't put an undue bandwidth
burden on the screen-scraped site(s), since a single system would do all the
scraping. It would take some effort, though, to write the screen-scraping
software and a means to reliably link the two types of data. Having two or
three sources of screen-scraped data would be a useful backup for the
inevitable changes in data format that would come along; if the primary
source changes format, we'd switch to the backup source until the primary's
screen-scraper is modified.
We'd also obviously need somebody to volunteer to do the screen scraping and 
data bundling, but I imagine there'd be volunteers for that.

If we were to start such a project immediately, it would begin with poor or
no descriptions. Descriptions would also probably be weak for new shows
unless we could get studios or others in the TV supply chain to donate
material, with appropriate releases to ensure it's under a license we could
accept. Still,
if there are no legal impediments this could be a good way to go in the long 
run, since it would keep us from being beholden to any one company's good 
will.
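
The primary/backup idea could look something like the Python sketch below.
The scraper functions here are stand-ins I made up (one simulates a site
whose page format has changed); real scrapers would fetch and parse actual
pages, and the row layout is just an assumption.

```python
def fetch_schedule(scrapers):
    """Try each (name, scraper) pair in priority order.

    Returns (name, rows) from the first source that yields any rows.
    """
    errors = []
    for name, scrape in scrapers:
        try:
            rows = scrape()
        except Exception as exc:  # a format change usually surfaces as a parse error
            errors.append(f"{name}: {exc}")
            continue
        if rows:
            return name, rows
        errors.append(f"{name}: no rows")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def primary():
    # Hypothetical scraper for a site that has just changed its format.
    raise ValueError("page layout changed")


def backup():
    # Hypothetical scraper returning (station, start time, title) rows.
    return [("Station Y", "2007-06-26T15:00", "Movie A")]


source, rows = fetch_schedule([("primary", primary), ("backup", backup)])
```

Once the primary's scraper is fixed, it would go back to being tried first
with no other changes needed, since the order of the list is the priority.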

-- 
Rod Smith
http://www.rodsbooks.com

