[mythtv-users] Mooting architecture for a DataDirect replacement
Rod Smith
mythtv at rodsbooks.com
Mon Jun 25 20:52:57 UTC 2007
On Monday 25 June 2007 15:40, David Brodbeck wrote:
> On Jun 25, 2007, at 11:25 AM, Rod Smith wrote:
> > Daniel Kristjansson posted an interesting idea in the "IMDB has TV
> > listings" thread a couple of days ago, though: Have the users
> > themselves generate the data in a decentralized way, with the help
> > of some automation to create a "first draft" based on a station's
> > regular schedule.
>
> That's definitely an interesting concept -- sort of the CDDB technique.
Yes, and looked at in that way, there are really two types of data we need:

1) Description data -- To have a description of a movie or TV show, including
   a short plot summary, the main actors, director, etc. This stuff will
   remain unchanged for years (barring any corrections to fix typos or maybe
   add mention of bit actors who subsequently make it big). The main ongoing
   need would be to add information about new TV show episodes and movies
   as they're released to TV.
2) Schedule data -- To know that Movie A will be shown at 3:00 PM Tuesday on
   Station Y. This data will obviously need to be generated in an ongoing
   fashion.

There'll also be a need to link the two together, presumably using some sort
of unique code number for each item. I know there are standards in existence
to do this already, but I'm foggy on the details.
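
To make the linking idea concrete, here is a minimal sketch in Python. It does
not follow any existing standard: the "program_id" field, the ID values, and
the class and function names are all made up for illustration. It only shows
the two kinds of record being joined on a shared code, with unknown programs
left without a description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Description:
    """Long-lived data about a movie or episode (hypothetical layout)."""
    program_id: str  # made-up unique code shared by both record types
    title: str
    summary: str


@dataclass(frozen=True)
class Showing:
    """One scheduled airing; regenerated continuously (hypothetical layout)."""
    program_id: str
    station: str
    start: datetime


def link(
    showings: Iterable[Showing], descriptions: Iterable[Description]
) -> List[Tuple[Showing, Optional[Description]]]:
    """Pair each showing with its description, or None if we have none yet."""
    by_id: Dict[str, Description] = {d.program_id: d for d in descriptions}
    return [(s, by_id.get(s.program_id)) for s in showings]


descriptions = [Description("MV0001", "Movie A", "A short plot summary.")]
showings = [
    Showing("MV0001", "Station Y", datetime(2007, 6, 26, 15, 0)),
    # A new show nobody has written a description for yet.
    Showing("EP0002", "Station Y", datetime(2007, 6, 26, 17, 0)),
]

listings = link(showings, descriptions)
for showing, desc in listings:
    title = desc.title if desc else "(no description yet)"
    print(showing.start.strftime("%a %H:%M"), showing.station, title)
```

Keeping showings that have no matching description, rather than dropping them,
fits the expectation that a new project would start out with sparse
description data.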
Here's a legal question: Suppose for the sake of argument that we had the
description data in an "open source" form. Would it be legal to screen-scrape
a TV station's Web site, zap2it.com, or whatever, for schedule data, link the
two types of data, and publish the results (via Usenet, P2P networks, a Web
site, or whatever)? If so, we could plan for a solution along those lines --
a single system could do the screen-scraping and merging of data to be
subsequently published in an open form. My impression from the discussions of
legal issues a few days ago is that this WOULD be legal, but IANAL.
From a technical viewpoint, this approach wouldn't put an undue burden on the
screen-scraped site(s), in terms of bandwidth requirements, but it'd take
some effort to come up with the screen scraping software and a means to
reliably link the two types of data. Having two or three sources of
screen-scraped data would be a useful backup for the inevitable changes in
data format that would come along; if the primary source changes format, we'd
switch to the backup source until the primary's screen-scraper is modified.
We'd also obviously need somebody to volunteer to do the screen scraping and
data bundling, but I imagine there'd be volunteers for that. If we were to
start such a project immediately, it would begin with pretty poor or no
descriptions. They'd also probably be pretty weak for new shows unless we
could get studios or others in the TV supply chain to donate material, with
appropriate releases to ensure it's under a license we could accept. Still,
if there are no legal impediments this could be a good way to go in the long
run, since it would keep us from being beholden to any one company's good
will.
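
The backup-source idea mentioned above could be sketched roughly as follows.
This is only an illustration under assumptions: the scraper functions are
stand-ins that do no real network access, and the names "ScrapeError" and
"fetch_schedule" and the row layout are invented for the example. The point is
just that sources are tried in priority order and the first usable result wins.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ScrapeError(Exception):
    """Raised when a scraper cannot parse its source (hypothetical)."""


Row = Dict[str, str]
Scraper = Callable[[], List[Row]]


def fetch_schedule(sources: Sequence[Tuple[str, Scraper]]) -> Tuple[str, List[Row]]:
    """Try each (name, scraper) in order; return the first non-empty result."""
    failures: List[str] = []
    for name, scrape in sources:
        try:
            rows = scrape()
        except ScrapeError as exc:
            # Typically a format change on the scraped site; fall through
            # to the next source until this scraper is repaired.
            failures.append(f"{name}: {exc}")
            continue
        if rows:
            return name, rows
        failures.append(f"{name}: returned no rows")
    raise ScrapeError("all sources failed: " + "; ".join(failures))


def primary() -> List[Row]:
    # Stand-in for a scraper whose source has changed its page format.
    raise ScrapeError("page layout changed")


def backup() -> List[Row]:
    # Stand-in for a second, still-working source.
    return [
        {"program_id": "MV0001", "station": "Station Y", "start": "2007-06-26T15:00"}
    ]


used, rows = fetch_schedule([("primary", primary), ("backup", backup)])
print(f"schedule taken from {used}: {len(rows)} row(s)")
```

Collecting the failure messages makes it easier for whoever maintains the
scrapers to see which source broke and why.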
--
Rod Smith
http://www.rodsbooks.com