[mythtv-users] Mooting architecture for a DataDirect replacement

Jay R. Ashworth jra at baylink.com
Thu Jun 28 20:28:01 UTC 2007


On Fri, Jun 29, 2007 at 12:16:26AM +1000, Peter Schachte wrote:
> Jay R. Ashworth wrote:
> > On Wed, Jun 27, 2007 at 04:27:34PM +1000, Peter Schachte wrote:
> >> That sort of update may be tricky to handle. The production company
> >> might have good description info to post but not know when it will be
> >> aired in each market, particularly for a syndicated show. They'd want
> >> to send out a message like: year 20, show 47 of Oprah has guests X,
> >> Y, and Z. Then it's up to someone else to work out which program to
> >> update. There's also the lifetime issue to think about: what about
> >> when year 20, show 47 of Oprah is rerun some months later? You'd like
> >> to get the correct guest list, but you don't want to keep around every
> >> posting made about every show ever aired to get it.
> > 
> > This right here is probably the best argument that's been made yet for
> > separating the program and airing data; the problem is how to store the
> > program data.
> > 
> > It's harder to justify pawning *that* off on NNTP companies.  Perhaps
> > Google would take on *that* part of the problem?  :-)  Perhaps we can
> > create a protocol for semantic tagging of that data in such a fashion
> > that the production companies can publish it on their own website in a
> > locatable fashion.
> 
> It certainly makes sense for program data to be looked up on demand
> (pulled) but for scheduling data to be pushed, since program data has
> a long shelf life, while scheduling data only becomes available at
> most a few weeks before it becomes worthless.

I agree with this, and your reasoning... though I don't know that
separate distribution systems are necessary for them; it's push *from
the station*, but *everything* is pull from the viewer. We might also
put those listings up as expirable, supersedeable NNTP postings, and
have people's machines cache them for a month as well.
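
To make that concrete: Usenet already has the machinery for this in
two standard article headers, Expires: and Supersedes:, so a station's
feed could be nothing fancier than a periodic post.  A rough sketch in
Python using the stdlib nntplib -- the newsgroup name, server, and
message-IDs are all made up for illustration:

    # Hypothetical: post a station's weekly schedule as an article that
    # replaces the previous revision (Supersedes:) and ages out on its
    # own (Expires:).  All names and IDs here are placeholders.
    from nntplib import NNTP

    article = [
        b"From: listings@station.example",
        b"Newsgroups: alt.tv.listings.us.example",
        b"Subject: Example station schedule, week of 2007-07-01",
        b"Message-ID: <sched-20070701.r2@station.example>",
        b"Supersedes: <sched-20070701.r1@station.example>",
        b"Expires: Sun, 29 Jul 2007 00:00:00 GMT",
        b"",
        b"...schedule data in whatever format we settle on...",
    ]

    with NNTP("news.example.org") as server:
        server.post(article)

The news servers then age the posting out for us, a corrected schedule
simply supersedes the old revision, and the viewer's box only ever
pulls.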

>                                                  But while having each
> production company *generate* data about its own programs makes
> perfect sense, relying on every production company to serve its own
> program data is less attractive. For one thing, some production
> companies probably won't do it very well, if at all (remember, some
> will be niche players in some local market that just can't be bothered
> with another thing to do). More importantly, getting every production
> company in the world to support exactly the same file formats and
> search interface seems pretty unlikely in the near term.

Indeed.

> What seems more likely to work in my lifetime is a single central
> database of shows and episodes. I just found out about thetvdb.com,
> which is exactly that. It seems to be open to scraping, and has
> detailed info about a large number of TV series and episodes,
> including episode thumbnails and some really nice banner images, and
> has an XML interface. It even has info about non-US shows.

Hmm.  See also tv.com, tvrage.com, tviv.com, and, as people noted
earlier, Wikipedia.
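
For what it's worth, the thetvdb.com lookup side really is just HTTP
plus XML, which any of our boxes can already do.  A sketch -- the
endpoint and element names here are my guesses at their interface, so
treat them as assumptions, not gospel:

    # Hypothetical thetvdb.com series lookup; the URL and the XML
    # element names are assumptions about their interface, unverified.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    url = "http://thetvdb.com/api/GetSeries.php?seriesname=Oprah"
    tree = ET.parse(urlopen(url))
    for series in tree.findall("Series"):
        print(series.findtext("SeriesName"), series.findtext("seriesid"))

The other sites above would need a scraper instead, which is exactly
the fragility we're trying to design away.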

> But a little math gives one pause. If we're designing an architecture
> for TV schedule distribution into the future, I think we'd better plan
> for growth. Other free and commercial DVR systems will probably want
> to piggyback on it, and DVR use seems likely to rise worldwide.

This was why the only thing I was questioning about David Lonie's
burst of motivation was the *name*.  :-)

>                                                                    So I
> think we'd better plan for at least 10M users worldwide (seems low to
> me, even only 10 years from now). If on average each of them looks up
> info on 20 channels times 20 programs per channel each day, that's 4
> billion searches per day, or close to 50K complete DB searches per
> second.
>
> But if there were a central site for each country or region to collect
> schedule information from stations, and program info from someone
> like thetvdb.com, they wouldn't put much load on the stations or
> the database. They could then assemble the detailed info into neat
> packages and distribute it through NNTP or P2P, which wouldn't put
> much load on the central sites, either.

Yep.  "Those who do not *understand* Usenet are condemned to reinvent
it.  Poorly."  Henry Spencer at UTZoo, I think.
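
And Peter's arithmetic holds up, for anyone who wants to check it:

    # 10M users, each looking up 20 channels x 20 programs per day:
    lookups_per_day = 10_000_000 * 20 * 20   # = 4 billion lookups/day
    per_second = lookups_per_day / 86_400    # 86,400 seconds in a day
    print(round(per_second))                 # ~46,296 -- call it 50K/sec

That's exactly the kind of load you do *not* want hitting a central
database directly, and exactly the kind that flood-fill distribution
soaks up for free.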

> >> I do think there's a strong argument for having a central site
> > Yeah, but then again, *someone's* gotta run it.
>
> There are a lot of volunteer-run sites that require more effort
> than this is likely to need, at least once it's up and running and
> provided the stations cooperate. Just in Australia, for TV schedule
> info, I can point you to oztivo (oztivo.net) and shepherd
> (http://svn.whuffy.com/index.fcgi/wiki). (BTW, shepherd *does*
> use thetvdb.com and imdb.com to augment program listings.) Or,
> as suggested in another thread, it could be run as an ad-driven
> commercial venture.

True.

> >> From the central site, the data could be distributed by NNTP, UUCP,
> >> FTP, HTTP, P2P, or carrier pigeon.
> >
> > Does RFC 1149 have the bandwidth for that?
>
> ROTFL. Hey, if a swallow can carry coconuts....

What is the coconut-carrying capacity of an unladen swallow, anyway?

Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                      jra at baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274

