[mythtv-users] Mooting architecture for a DataDirect replacement

Rod Smith mythtv at rodsbooks.com
Fri Jun 22 16:52:41 UTC 2007


On Thursday 21 June 2007 20:19, Christopher X. Candreva wrote:
> On Thu, 21 Jun 2007, Jay R. Ashworth wrote:
> > I was merely trying to propose an architecture that would make
> > practical the distribution of the load of 200,000 Mythboxen looking for
> > guide data every day.  NNTP would.
>
> I used to run an NNTP system. I'm going to assume INN or its replacement
> has gotten better, but it wasn't easy, and there is going to be
> significantly less experience with it today, especially since most ISPs are
> outsourcing it.

I believe the suggestion is to use EXISTING NNTP servers -- those operated by 
users' ISPs, etc. In theory, nobody associated with MythTV or any new TV 
listings project would need to start running a new NNTP server for this 
scheme to work. (In practice, it might be desirable -- say to provide a 
server to which TV stations could upload their data if they don't already 
have access to an NNTP server.)

In another message, Rich West wrote:

> Not sure if it has been pointed out (dons flame retardant suit now), but
> NNTP is not accessible by everyone easily.  Many ISPs block it
> entirely, some offer it (Comcast) but require you to pay extra to access
> it, while some do allow for it on a limited basis.  Are there free news
> servers out there?  Sure.  But I know of a few people who have Comcast
> who can't even get those newsfeeds due to Comcast blocking them.

This is shocking. I was aware that some ISPs were moving away from providing 
Usenet access as a standard feature, but I'd never before heard of an ISP 
actively blocking it. Do you have a reference, such as a news story or a 
Comcast policy page that describes this?

Even if this is a moderately widespread problem, I don't think it's 
necessarily a show-stopper for the idea. Google provides Web-based newsgroup 
access, so in principle the data could be obtained either via NNTP or via 
HTTP, although the latter might not be efficient and Google might object -- 
or simply not carry the necessary newsgroup(s). I'm sure it would also be 
possible to set up an NNTP server on a non-standard port, to tunnel NNTP 
through SSH, or whatnot. Those are extra hoops to jump through, but it could 
be done. I don't know if any of the third-party Usenet services already 
provide such options. If so, we wouldn't need to worry about it, except that 
the developers would have to include appropriate code to support these access 
methods, or at least document how to do it using other tools. An alternative 
to non-standard ports or proxies might be to set up a service that provides 
the data in a form similar to the current method; the service in question 
would simply download the data from a news server and then repackage it. This 
would require few or no changes to the current MythTV code, but would require 
centralized resources, so it'd probably have to be a subscription service.
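
To make the repackaging idea a bit more concrete, here's a rough Python sketch 
of the conversion step only: it takes the text of one already-downloaded news 
article and rewraps it as XMLTV-style <programme> elements. Everything about 
the article layout is my own assumption for illustration (the X-Guide-Channel 
header, the tab-separated body lines, the newsgroup and station names); no 
such format has been agreed on.

```python
import xml.etree.ElementTree as ET
from email import message_from_string

# Hypothetical article layout (an assumption, not an agreed format):
# one article per channel, the channel id in a made-up "X-Guide-Channel"
# header, and body lines of "start<TAB>stop<TAB>title".
SAMPLE_ARTICLE = """\
From: listings@example.invalid
Newsgroups: example.tv.listings
Subject: guide data for wxyz.example
X-Guide-Channel: wxyz.example

20070622200000 +0000\t20070622210000 +0000\tEvening News
20070622210000 +0000\t20070622220000 +0000\tLate Movie
"""


def article_to_xmltv(article_text):
    """Repackage one downloaded article as an XMLTV-style document string."""
    # News articles use the same header/body layout as mail messages,
    # so the standard email parser can split headers from the body.
    msg = message_from_string(article_text)
    channel = msg["X-Guide-Channel"]

    tv = ET.Element("tv")
    for line in msg.get_payload().splitlines():
        if not line.strip():
            continue  # skip blank lines in the body
        start, stop, title = line.split("\t", 2)
        prog = ET.SubElement(
            tv, "programme", start=start, stop=stop, channel=channel
        )
        ET.SubElement(prog, "title").text = title
    return ET.tostring(tv, encoding="unicode")


if __name__ == "__main__":
    print(article_to_xmltv(SAMPLE_ARTICLE))
```

The download itself (from a news server, a non-standard port, or an SSH 
tunnel) is deliberately left out; the point is just that the repackaging 
step could be small.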

More generally, I've been contemplating the concept of using NNTP for data 
dissemination, and I'm seeing more and more subtle positive points to it. 
I've also got a few random observations and questions:

- I know nothing about how TV stations manage their schedules. Presumably
  they've got some sort of internal database with the information. If
  stations use a limited number of software packages and data formats, then
  open source programmers could write software to take this data and export
  it to an NNTP server. It would then become very easy, from a technical point
  of view, for TV stations to provide the data directly. (Whether the average
  TV station would want to trust community-provided open source software is
  another matter, of course.) Can anybody comment on how TV stations manage
  this data to begin with?

- Assuming a large enough group of stations provide the data themselves,
  individuals with sufficient interest could help by being "gap-fillers."
  Such people could provide at least minimal data for specific stations;
  just format it correctly and upload it to their local news servers.

- Similar to the above, interested individuals could help fill in gaps in
  existing but incomplete guide data. For instance, they could enter episode
  numbers to help schedulers avoid recording repeats when the TV stations
  omit such information.

- Depending on licensing terms, even if an individual station didn't want
  to provide the data directly, a station might not object to one person
  screen-scraping data from a Web site and providing it via the proposed
  Usenet architecture (or some other means, for that matter).

- An awful lot of what's broadcast is repeats, movies, etc. I wonder if
  some way of separating the schedule data from the program data would
  be desirable, or at least providing a community-run database of
  program data (similar to CDDB for CD data or IMDB). If we as a community
  provide such a resource, or if we can leverage off of IMDB, that might
  be helpful. The NNTP-provided schedule data could either embed basic
  descriptions from the database or refer to the database entry. (The
  latter would probably be more complex on the client side and would of
  course greatly increase the load on the database server.) Part of my thinking
  here is that, to the extent that program descriptions can be considered
  copyrighted (yes, I know this is debatable), if we provide our own
  database that's under an open license, we won't need to worry about
  that if/when we as individuals need to provide data for stations that
  don't want to "play ball" -- at least not for programs with entries in
  our database. This particular point goes well beyond the NNTP issue,
  of course; a user-generated descriptions database could be useful no
  matter what the guide data distribution method. Running it would of
  course require resources, though.

- You'd want some sort of key signing to prevent malicious insertion of
  incorrect data into the stream (somebody claiming an X-rated movie is
  actually a popular children's cartoon, say).

- The big draw to this idea, IMHO, is that it uses an existing network
  of NNTP servers, which are paid for by subscribers' Internet access
  fees (or by separate subscription fees to SuperNews or the like). Thus,
  there'd be little or no monetary cost to it. (To the extent that some
  ISPs actively block NNTP, though, this might be less of an advantage.)
  OTOH, some people's ISPs provide NNTP access only as an added-cost option
  or not at all, so some people would have to subscribe to a third-party
  Usenet provider.

- I wonder how much of a problem spam would be. Obviously, key-signing
  can help prevent malicious data from doing damage, but spammers could
  waste a lot of bandwidth. The code to parse the data would also have
  to be particularly free of security-related bugs; you wouldn't want
  somebody to find a loophole that would enable a message to carry a
  Linux program that would automatically run once the message is
  downloaded. Perhaps using the Usenet moderation system would be
  desirable to help minimize these risks -- but that would slow
  propagation of messages.
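
On the key-signing points above, the check a client might make before 
trusting an article can be sketched roughly as follows. Python's standard 
library has no public-key signatures, so this sketch swaps in an HMAC with a 
shared secret purely as a stand-in; a real scheme would presumably use 
public-key signatures (OpenPGP or similar) so that clients never hold a 
signing secret. TRUSTED_KEYS, the station name, and the function names are 
all made up for illustration.

```python
import hashlib
import hmac

# Hypothetical table of keys for stations the client has chosen to trust.
# Stand-in only: a real design would store public keys, not shared secrets.
TRUSTED_KEYS = {"wxyz.example": b"demo-key-not-for-real-use"}


def sign(station, body, key):
    """Return a hex signature over the station name and the article body."""
    # Binding the station name into the signed data keeps a valid article
    # for one station from being replayed under another station's name.
    return hmac.new(key, station.encode() + b"\n" + body, hashlib.sha256).hexdigest()


def is_authentic(station, body, signature):
    """Accept an article only if it is signed with a key we already trust."""
    key = TRUSTED_KEYS.get(station)
    if key is None:
        return False  # unknown sender: drop it, as one would with spam
    expected = sign(station, body, key)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

A client would run a check like this before handing the body to the parser 
at all, so unsigned or tampered articles are discarded unparsed. That 
doesn't stop spam from wasting bandwidth, but it keeps junk away from the 
parsing code.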

-- 
Rod Smith
http://www.rodsbooks.com

