[mythtv-users] Mooting architecture for a DataDirect replacement

Peter Schachte schachte at csse.unimelb.edu.au
Thu Jun 28 14:16:26 UTC 2007


Jay R. Ashworth wrote:
> On Wed, Jun 27, 2007 at 04:27:34PM +1000, Peter Schachte wrote:
>> That sort of update may be tricky to handle. The production company
>> might have good description info to post but not know when it will be
>> aired in each market, particularly for a syndicated show. They'd want
>> to send out a message like: year 20, show 47 of Oprah has guests X,
>> Y, and Z. Then it's up to someone else to work out which program to
>> update. There's also the lifetime issue to think about: what about
>> when year 20, show 47 of Oprah is rerun some months later? You'd like
>> to get the correct guest list, but you don't want to keep around every
>> posting made about every show ever aired to get it.
> 
> This right here is probably the best argument that's been made yet for
> separating the program and airing data; the problem is how to store the
> program data.
> 
> It's harder to justify pawning *that* off on NNTP companies.  Perhaps
> Google would take on *that* part of the problem?  :-)  Perhaps we can
> create a protocol for semantic tagging of that data in such a fashion
> that the production companies can publish it on their own website in a
> locatable fashion.

It certainly makes sense for program data to be looked up on demand (pulled)
but for scheduling data to be pushed: program data has a long shelf life,
while scheduling data only becomes available at most a few weeks before it
becomes worthless.  But while having each production company *generate* data
about its own programs makes perfect sense, relying on every production
company to *serve* its own program data is less attractive.  For one thing,
some production companies probably won't do it very well, if at all
(remember, some will be niche players in a local market who just can't be
bothered with one more thing to do).  More importantly, getting every
production company in the world to support exactly the same file formats and
search interface seems pretty unlikely in the near term.
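To make the two lifetimes concrete, here's a rough, untested Python sketch;
the TTL figures and the fetch() hook are invented for illustration, not part
of any existing system:

    import time

    PROGRAM_TTL = 180 * 86400   # made-up figure: program info keeps for months
    SCHEDULE_TTL = 14 * 86400   # schedule info is worthless within ~2 weeks

    program_cache = {}          # program id -> (fetched_at, metadata)

    def get_program(program_id, fetch):
        # Pull program metadata on demand, reusing the cache while fresh.
        hit = program_cache.get(program_id)
        if hit and time.time() - hit[0] < PROGRAM_TTL:
            return hit[1]
        metadata = fetch(program_id)   # hypothetical lookup against the central DB
        program_cache[program_id] = (time.time(), metadata)
        return metadata

    def accept_schedule(bundle, now):
        # Pushed airings are only worth keeping inside their short window.
        return [a for a in bundle if 0 <= a["start"] - now < SCHEDULE_TTL]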

What seems more likely to work in my lifetime is a single central database of
shows and episodes.  I just found out about thetvdb.com, which is exactly
that.  It seems to be open to scraping, has detailed info about a large
number of TV series and episodes (including episode thumbnails and some
really nice banner images), offers an XML interface, and even covers non-US
shows.
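For instance, here's a quick, untested Python fragment against their XML
interface.  I'm going from memory on the GetSeries.php endpoint and the
element names, so treat both as assumptions:

    import urllib.request
    import urllib.parse
    import xml.etree.ElementTree as ET

    def find_series(name):
        # Ask thetvdb.com for series matching a name; the response is a
        # <Data> document containing one <Series> element per match.
        url = ("http://www.thetvdb.com/api/GetSeries.php?"
               + urllib.parse.urlencode({"seriesname": name}))
        with urllib.request.urlopen(url) as f:
            tree = ET.parse(f)
        return [(s.findtext("seriesid"), s.findtext("SeriesName"))
                for s in tree.findall("Series")]

    print(find_series("Oprah"))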

But a little math gives one pause.  If we're designing an architecture for TV
schedule distribution into the future, I think we'd better plan for growth.
Other free and commercial DVR systems will probably want to piggyback on it,
and DVR use seems likely to rise worldwide.  So I think we'd better plan for
at least 10M users worldwide (which seems low to me, even just 10 years from
now).  If on average each of them looks up info on 20 channels times 20
programs per channel each day, that's 4 billion searches per day, or close to
50K complete DB searches per second.
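Spelled out:

    users = 10 * 1000 * 1000      # 10M DVR users worldwide
    lookups = 20 * 20             # 20 channels x 20 programs, per user per day
    per_day = users * lookups     # 4,000,000,000 searches/day
    per_second = per_day / 86400.0
    print(per_day, per_second)    # ~46,300/s, call it 50K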

But if there were a central site for each country or region collecting
schedule information from the stations, and program info from someone like
thetvdb.com, those sites wouldn't put much load on either the stations or the
database.  They could then assemble the detailed info into neat packages and
distribute them through NNTP or P2P, which wouldn't put much load on the
central sites, either.
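On the aggregator side, the packaging step could be as simple as this
untested sketch.  The JSON payload is just a stand-in (XMLTV would be the
obvious real format), and the digest is there so clients can verify a bundle
pulled off NNTP or P2P without trusting the transport:

    import gzip, hashlib, json, time

    def build_bundle(region, airings, path):
        # Merge one region's airings into a compressed, checksummed file.
        payload = json.dumps({"region": region,
                              "generated": int(time.time()),
                              "airings": airings}).encode("utf-8")
        with gzip.open(path, "wb") as f:
            f.write(payload)
        # Publish this digest alongside the bundle.
        return hashlib.sha256(payload).hexdigest()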

>> I do think there's a strong argument for having a central site
> Yeah, but then, again, *someone* gotta run it.

There are a lot of volunteer-run sites that require more effort than this is
likely to need, at least once it's up and running, provided the stations
cooperate.  Just in Australia, for TV schedule info alone, I can point you to
oztivo (oztivo.net) and shepherd (http://svn.whuffy.com/index.fcgi/wiki).
(BTW, shepherd *does* use thetvdb.com and imdb.com to augment program
listings.)  Or, as suggested in another thread, it could be run as an
ad-driven commercial venture.

>> From the central site, the data could be distributed by NNTP, UUCP,
>> FTP, HTTP, P2P, or carrier pigeon.
> 
> Does RFC 1149 have the bandwidth for that?

ROTFL.  Hey, if a swallow can carry coconuts....

-- 
Peter Schachte              I worry that 10 or 15 years from now, [my child]
schachte at cs.mu.OZ.AU        will come to me and say 'Daddy, where were you
www.cs.mu.oz.au/~schachte/  when they took freedom of the press away from
Phone: +61 3 8344 1338      the Internet?' -- Mike Godwin

