[mythtv-users] Mooting architecture for a DataDirect replacement

Mon Jun 25 18:25:52 UTC 2007

On Monday 25 June 2007 13:40, David Brodbeck wrote:
> On Jun 25, 2007, at 10:21 AM, Rod Smith wrote:
> > An alternative, of course, would be to run a news server ourselves.
> > Even a home box with Leafnode would probably be adequate if the
> > need for a backup was light, although if it got too much traffic
> > something bigger would be required.
>
> I see pain down that route.  While you might bill it as a "backup,"
> it will quickly become a "primary" as people realize it's less work
> to throw 'nntp.mythtv.org' (or whatever) into the configuration file
> instead of trying to figure out what their ISP's news server is and
> configure authentication.  I know I'd be tempted, especially since
> I've had ISPs whose Usenet servers broke on a regular basis,
> sometimes for weeks at a time.

That is a risk. One possible way to manage it would be to limit the amount of 
data that can be downloaded in a given period from this server for any given 
user (tracked by IP address, probably). This limit could be raised for users 
with special accounts -- either administrators or, if we wanted to get into 
it, people with paid accounts. Tracking by IP address is an admittedly 
imperfect solution, but people who get around it will do so either by chance 
(their ISP changes their IP address regularly) or by expending more effort 
than would be required to enter the correct NNTP data to begin with.
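
A rough sketch of what such a per-IP limit might look like, in Python. The
class name, the byte limit, the window length, and the "raised_limits" table
for special accounts are all made up for illustration; a real news server
would have its own way of hooking this in.

```python
import time


class ByteQuota:
    """Per-IP download budget over a fixed time window (hypothetical helper)."""

    def __init__(self, limit_bytes, window_seconds, raised_limits=None):
        self.limit_bytes = limit_bytes            # default budget per window
        self.window_seconds = window_seconds      # length of one window
        self.raised_limits = raised_limits or {}  # ip -> larger budget
        self._usage = {}                          # ip -> (window_start, bytes_used)

    def allow(self, ip, nbytes, now=None):
        """Return True and record the download if it fits in the budget."""
        now = time.time() if now is None else now
        start, used = self._usage.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            # The old window has expired, so start a fresh one.
            start, used = now, 0
        limit = self.raised_limits.get(ip, self.limit_bytes)
        if used + nbytes > limit:
            return False
        self._usage[ip] = (start, used + nbytes)
        return True


quota = ByteQuota(1000, 3600, raised_limits={"192.0.2.9": 5000})
print(quota.allow("192.0.2.1", 600, now=0))   # fits in the default budget
print(quota.allow("192.0.2.1", 600, now=10))  # would exceed it in this window
```

A user whose address changes simply starts over with a fresh budget, which is
the imperfection described above.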

> One thing that isn't clear to me with this setup is how the client
> can ever be sure it has a complete schedule.  This is inherent in the
> DataDirect model; you download the XML file, if the download is
> complete then you have everything the server is giving you.  With the
> NNTP model you can never be sure you have all the necessary posts.

A couple of observations:

- If you've got gaps in your schedule, you're missing something.
  (Presumably you'd want some sort of explicit "to be announced"
  gap filler entry to cover times when the station doesn't yet
  know what'll be aired in a time slot.)

- If you simply miss an update posting, you *WILL* have data for
  that time slot, but it'll be inaccurate and you might not know
  it. My suspicion is that this will usually be fairly non-dire;
  updates are likely to do things like change episode titles in
  series, fill in details on who's the guest on a late-night talk
  show, etc. The original data is likely to still be valid, just
  not as complete as you might like.
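
The first of these checks is easy to sketch in Python. This assumes a client
holds its listings for one station as (start, end, title) tuples, which is an
invented representation, not anything MythTV actually uses. It only finds
uncovered time; it can't tell you that an update to an existing slot was
missed.

```python
from datetime import datetime


def find_gaps(entries, window_start, window_end):
    """Return (start, end) pairs inside the window that no entry covers.

    entries: iterable of (start, end, title) tuples for one station.
    """
    gaps = []
    cursor = window_start  # everything before the cursor is covered
    for start, end, _title in sorted(entries, key=lambda e: e[0]):
        if start > cursor:
            gaps.append((cursor, min(start, window_end)))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


def hour(h):
    return datetime(2008, 1, 1, h)


listings = [
    (hour(20), hour(21), "Evening News"),
    (hour(22), hour(23), "To be announced"),  # explicit gap filler
]
for gap_start, gap_end in find_gaps(listings, hour(20), hour(23)):
    print("missing data from", gap_start, "to", gap_end)
```

With an explicit "to be announced" filler, as in the 22:00 slot here, any gap
the function reports really does mean a post went missing.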

It might also be possible to post periodic messages that summarize all the
posts, perhaps identifying them by their Message-ID headers. Such a post would
say, essentially, "here are the Message-IDs of posts for WGBH for the period
1/1/08-1/8/08: blah blah blah blah blah blah blah". If the clients keep a log
of the Message-IDs they've seen (including both the messages they apply and
those they deliberately don't apply), then comparing that log against the
summary will alert them to missing data. Of course, somebody will have to
generate such posts, presumably after checking at least two servers that are
known to be reliable. That shouldn't be a big challenge, though.
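
On the client side, the comparison could be as simple as the following Python
sketch. The function name and the example Message-IDs are invented, and how
the list of IDs gets parsed out of the summary post is left open since no
format has been proposed yet.

```python
def missing_message_ids(summary_ids, applied_ids, skipped_ids):
    """Return the IDs listed in a summary post that the client never saw.

    applied_ids are posts the client used; skipped_ids are posts it saw
    but deliberately didn't apply. Both count as "seen".
    """
    seen = set(applied_ids) | set(skipped_ids)
    return [mid for mid in summary_ids if mid not in seen]


summary = ["<a1@example.org>", "<a2@example.org>", "<a3@example.org>"]
applied = ["<a1@example.org>"]
skipped = ["<a3@example.org>"]
print(missing_message_ids(summary, applied, skipped))
```

Anything the function returns is a post the client should try to fetch again,
possibly from a different server.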

> But I think we're putting the cart before the horse.  I see little
> reason to believe we can get the scheduling *data*, so figuring out
> how we're going to distribute it, while an interesting mental
> exercise, is probably moot.

There are efforts underway to figure that out, but I get the impression at 
this point it's a more behind-the-scenes sort of thing. Daniel Kristjansson 
posted an interesting idea in the "IMDB has TV listings" thread a couple of 
days ago, though: Have the users themselves generate the data in a 
decentralized way, with the help of some automation to create a "first draft" 
based on a station's regular schedule. Every once in a while you'd get an
e-mail saying it's your turn to generate data, and you'd be required to fill
it in, using the auto-generated draft as a starting point. Others would then do a
peer review to weed out bad entries, and the results would go in a database 
that could be subsequently used for reruns. This would of course be a royal 
pain at first, but once most of the huge backlog of movies and old TV shows 
was entered, it would probably become fairly manageable. Personally I hope it 
doesn't come to this, but having to take part in such an effort would be 
better than having nothing at all.
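
For what it's worth, the "first draft" step might look something like this in
Python. The weekly template structure, the field names, and the "reviewed"
flag are guesses at how such a thing could work, not part of Daniel's
proposal.

```python
from datetime import date, datetime, time, timedelta


def draft_schedule(weekly_template, first_day, days):
    """Expand a station's regular weekly schedule into dated draft entries.

    weekly_template maps a weekday number (0 = Monday) to a list of
    (start_time, duration_minutes, title) tuples.
    """
    draft = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        for start_time, minutes, title in weekly_template.get(day.weekday(), []):
            start = datetime.combine(day, start_time)
            draft.append({
                "start": start,
                "end": start + timedelta(minutes=minutes),
                "title": title,
                "reviewed": False,  # a human still has to check this entry
            })
    return draft


# Hypothetical station that airs one show every Tuesday at 20:00.
template = {1: [(time(20, 0), 60, "Example Show")]}
for entry in draft_schedule(template, date(2008, 1, 1), 8):
    print(entry["start"], entry["title"])
```

The person whose turn it is would then correct titles and fill in episode
details before the entries go to peer review.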

-- 
Rod Smith
http://www.rodsbooks.com