[mythtv-users] Mooting architecture for a DataDirect replacement

Jay R. Ashworth jra at baylink.com
Mon Jun 25 21:03:30 UTC 2007


On Mon, Jun 25, 2007 at 04:52:57PM -0400, Rod Smith wrote:
> On Monday 25 June 2007 15:40, David Brodbeck wrote:
> > On Jun 25, 2007, at 11:25 AM, Rod Smith wrote:
> > > Daniel Kristjansson
> > > posted an interesting idea in the "IMDB has TV listings" thread a
> > > couple of
> > > days ago, though: Have the users themselves generate the data in a
> > > decentralized way, with the help of some automation to create a
> > > "first draft"
> > > based on a station's regular schedule.
> >
> > That's definitely an interesting concept -- sort of the CDDB technique.
> 
> Yes, and looked at in that way, there are really two types of data we need:
> 
> 1) Description data -- To have a description of a movie or TV show, including
>    a short plot summary, the main actors, director, etc. This stuff will
>    remain unchanged for years (barring any corrections to fix typos or maybe
>    add mention of bit actors who subsequently make it big). The main ongoing
>    need would be to add information about new TV show episodes and movies
>    as they're released to TV.

And *this* is the part that really lends itself well to user creation.

> 2) Schedule data -- To know that Movie A will be shown at 3:00 PM Tuesday on
>    Station Y. This data will obviously need to be generated in an ongoing
>    fashion.

This, not so much.

> There'll also be a need to link the two together, presumably using some sort
> of unique code number for each item. I know there are standards in existence 
> to do this already, but I'm foggy on the details.

Actually, there almost certainly is *not* any sort of industry standard
for identifying programs, and the problem is murkier than you think.
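
For what it's worth, here's a minimal sketch of the linking Rod
describes, assuming we assign our own key to each program.  Every name
in it (program_id, Description, Showing, link) is made up for
illustration; none of it comes from DataDirect or any existing standard.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Description:
    program_id: str   # hypothetical community-assigned key, not an industry code
    title: str
    summary: str


@dataclass(frozen=True)
class Showing:
    program_id: str   # same hypothetical key, attached when the schedule is scraped
    station: str
    start: datetime


def link(descriptions, showings):
    """Pair each showing with its description, or with None if no key matches."""
    by_id = {d.program_id: d for d in descriptions}
    return [(s, by_id.get(s.program_id)) for s in showings]
```

Showings with no matching description come back paired with None, and
that's where the murkiness lives: somebody still has to decide which
key a scraped title maps to.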

> Here's a legal question: Suppose for the sake of argument that we had the 
> description data in an "open source" form. Would it be legal to screen-scrape 
> a TV station's Web site, zap2it.com, or whatever, for schedule data, link the 
> two types of data, and publish the results (via Usenet, P2P networks, a Web 
> site, or whatever)? If so, we could plan for a solution along those lines -- 
> a single system could do the screen-scraping and merging of data to be 
> subsequently published in an open form. My impression from the discussions of 
> legal issues a few days ago is that this WOULD be legal, but IANAL.

There are two separate issues of legality involved here:

a) does the station own a copyright in the bare scheduling data?  (Almost
certainly not: see Feist v. Rural, though I don't know if there's any
caselaw on this specific topic; anyone got a Versuslaw subscription?)

b) can the station legally restrict you from crawling their site in
that fashion to acquire the data?  Courts are, IME, all over the map
on whether web site operators are legally *permitted* to restrict how
people can access non-technically-restricted pages in light of the fact
that technical restrictions are 1) possible, 2) easy and 3) free.  My
moral opinion is if you want it restricted, restrict it.

> From a technical viewpoint, this approach wouldn't put an undue burden
> on the screen-scraped site(s), in terms of bandwidth requirements, but
> it'd take some effort to come up with the screen scraping software
> and a means to reliably link the two types of data. Having two or
> three sources of screen-scraped data would be a useful backup for
> the inevitable changes in data format that would come along; if the
> primary source changes format, we'd switch to the backup source until
> the primary's screen-scraper is modified. We'd also obviously need
> somebody to volunteer to do the screen scraping and data bundling,
> but I imagine there'd be volunteers for that. If we were to start
> such a project immediately, it would begin with pretty poor or no
> descriptions. They'd also probably be pretty weak for new shows unless
> we could get studios or others in the TV supply chain to donate
> material, with appropriate releases to ensure it's under a license we
> could accept. Still, if there are no legal impediments this could be
> a good way to go in the long run, since it would keep us from being
> beholden to any one company's good will.

And that, right there, is the fundamental underlying motivation behind
*all* of my design decisions.
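
Rod's primary/backup idea, sketched in Python under loud assumptions:
each scraper is just a callable that returns a list of listings, or
raises when the site's format has changed out from under it.  The name
fetch_schedule and the (name, callable) pairs are hypothetical, not
anything that exists in MythTV today.

```python
def fetch_schedule(scrapers):
    """Try each (name, scrape) pair in priority order.

    Returns (name, listings) from the first source that yields data.
    Raises RuntimeError if every source fails or comes back empty.
    """
    errors = []
    for name, scrape in scrapers:
        try:
            listings = scrape()
        except Exception as exc:
            # Assumption: a changed page layout shows up as a parse error.
            errors.append((name, repr(exc)))
            continue
        if listings:
            return name, listings
        errors.append((name, "no listings parsed"))
    raise RuntimeError(f"all schedule sources failed: {errors}")
```

Returning the source name along with the data lets whoever runs the
bundling see that the primary has broken and its scraper needs fixing.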

You know, Doc Searls is big on the Vendor Relationship Management
topic; I wonder if he has anything interesting to tell us.  I'll drop
him a note.

Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                      jra at baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274

