[mythtv-users] Mooting architecture for a DataDirect replacement
Rod Smith
mythtv at rodsbooks.com
Fri Jun 22 16:52:41 UTC 2007
On Thursday 21 June 2007 20:19, Christopher X. Candreva wrote:
> On Thu, 21 Jun 2007, Jay R. Ashworth wrote:
> > I was merely trying to propose an architecture that would make
> > practical the distribution of the load of 200,000 Mythboxen looking for
> > guide data every day. NNTP would.
>
> I used to run an NNTP system. I'm going to assume INN or its replacement
> has gotten better, but it wasn't easy, and there is going to be
> significantly less experience with it today, especially since most ISPs are
> outsourcing it.
I believe the suggestion is to use EXISTING NNTP servers -- those operated by
users' ISPs, etc. In theory, nobody associated with MythTV or any new TV
listings project would need to start running a new NNTP server for this
scheme to work. (In practice, it might be desirable -- say to provide a
server to which TV stations could upload their data if they don't already
have access to an NNTP server.)
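To make the "existing servers" point concrete: each MythTV box would talk only to its own ISP's news server and ask for the articles it hasn't seen yet. Below is a minimal sketch in Python of how a client could work out which article numbers to request from the reply to the NNTP GROUP command (RFC 3977 format: "211 count low high group"). The group name alt.tv.listings.example is hypothetical. A real client would then retrieve each article with ARTICLE or BODY, which is not shown here.

```python
def parse_group_response(line: str) -> tuple[int, int, int, str]:
    """Parse an RFC 3977 GROUP reply such as '211 1234 3000 4233 some.group'."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError(f"unexpected GROUP response: {line!r}")
    count, low, high = (int(p) for p in parts[1:4])
    return count, low, high, parts[4]


def articles_to_fetch(group_line: str, last_seen: int) -> range:
    """Article numbers this box still needs, given the highest one it already has."""
    _count, low, high, _group = parse_group_response(group_line)
    # Articles numbered below 'low' have already expired from the server,
    # so never start earlier than that. An empty group has high == low - 1,
    # which yields an empty range.
    start = max(last_seen + 1, low)
    return range(start, high + 1)
```

For example, a box that last saw article 4100 and gets "211 1234 3000 4233 alt.tv.listings.example" back would request 4101 through 4233. Each box keeps only this one high-water mark per group, and the daily load falls on whichever ISP server that box already uses.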
In another message, Rich West wrote:
> Not sure if it has been pointed out (dons flame retardant suit now), but
> NNTP is not easily accessible to everyone. Many ISPs block it
> entirely, some offer it (Comcast) but require you to pay extra to access
> it, while some do allow for it on a limited basis. Are there free news
> servers out there? Sure. But I know of a few people who have Comcast
> who can't even get those newsfeeds due to Comcast blocking them.
This is shocking. I was aware that some ISPs were moving away from providing
Usenet access as a standard feature, but I'd never before heard of an ISP
actively blocking it. Do you have a reference to a news story, Comcast policy
page that describes this, or the like?
Even if this is a moderately widespread problem, I don't think it's
necessarily a show-stopper for the idea. Google provides Web-based newsgroup
access, so in principle the data could be obtained either via NNTP or via
HTTP, although the latter might not be efficient and Google might object --
or simply not carry the necessary newsgroup(s). I'm sure it would also be
possible to run an NNTP server on a non-standard port, tunnel NNTP through
SSH, or the like. Those are extra hoops to jump through, but it could
be done. I don't know if any of the third-party Usenet services already
provide such options. If so, we wouldn't need to worry about it, except that
the developers would have to include appropriate code to support these access
methods, or at least document how to do it using other tools. An alternative
to non-standard ports or proxies might be to set up a service that provides
the data in a form similar to the current method; the service in question
would simply download the data from a news server and then repackage it. This
would require few or no changes to the current MythTV code, but would require
centralized resources, so it'd probably have to be a subscription service.
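The repackaging service in the previous paragraph amounts to a format converter sitting between a news server and MythTV. Here is a minimal sketch, assuming the repackaged output is XMLTV (the grabber format MythTV already supports outside North America) and that listings have already been parsed out of the news articles into a hypothetical Listing record. A complete XMLTV file would also carry <channel> elements, which are omitted here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical intermediate record produced by parsing news articles.
    channel: str           # XMLTV channel id, e.g. "wxyz.example"
    start: str             # XMLTV time format: "YYYYMMDDhhmmss +zzzz"
    stop: str
    title: str
    description: str = ""


def to_xmltv(listings: list[Listing]) -> str:
    """Repackage parsed listings as an XMLTV <tv> document (programmes only)."""
    tv = ET.Element("tv")
    for item in listings:
        prog = ET.SubElement(
            tv, "programme", start=item.start, stop=item.stop, channel=item.channel
        )
        ET.SubElement(prog, "title").text = item.title
        if item.description:
            ET.SubElement(prog, "desc").text = item.description
    return ET.tostring(tv, encoding="unicode")
```

ElementTree takes care of escaping characters such as "&" in titles, so the service never has to assemble XML by hand.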
More generally, I've been contemplating the concept of using NNTP for data
dissemination, and I'm seeing more and more subtle positive points to it.
I've also got a few random observations and questions:
- I know nothing about how TV stations manage their schedules. Presumably
they've got some sort of internal database with the information. If
stations use a limited number of software packages and data formats, then
open source programmers could write software to take this data and export
it to an NNTP server. It would then become very easy, from a technical point
of view, for TV stations to provide the data directly. (Whether the average
TV station would want to trust community-provided open source software is
another matter, of course.) Can anybody comment on how TV stations manage
this data to begin with?
- Assuming a large enough group of stations provide the data themselves,
individuals with sufficient interest could help by being "gap-fillers."
Such people could provide at least minimal data for specific stations;
they'd just need to format it correctly and upload it to their local news
servers.
- Similar to the above, interested individuals could help fill in gaps in
existing but incomplete guide data. For instance, they could enter episode
numbers to help schedulers avoid recording repeats when the TV stations
neglect to supply that information.
- Depending on licensing terms, even if an individual station didn't want
to provide the data directly, a station might not object to one person
screen-scraping data from a Web site and providing it via the proposed
Usenet architecture (or some other means, for that matter).
- An awful lot of what's broadcast is repeats, movies, etc. I wonder if
some way of separating the schedule data from the program data would
be desirable, or at least providing a community-run database of
program data (similar to CDDB for CD data or IMDB). If we as a community
provide such a resource, or if we can leverage off of IMDB, that might
be helpful. The NNTP-provided schedule data could either embed basic
descriptions from the database or refer to the database entry. (The
latter would probably be more complex on the client side and would of
course greatly increase the load on the database server.) Part of my thinking
here is that, to the extent that program descriptions can be considered
copyrighted (yes, I know this is debatable), if we provide our own
database that's under an open license, we won't need to worry about
that if/when we as individuals need to provide data for stations that
don't want to "play ball" -- at least not for programs with entries in
our database. This particular point goes well beyond the NNTP issue,
of course; a user-generated descriptions database could be useful no
matter what the guide data distribution method. Running it would of
course require resources, though.
- You'd want some sort of key signing to prevent malicious insertion of
incorrect data into the stream (somebody claiming an X-rated movie is
actually a popular children's cartoon, say).
- The big draw to this idea, IMHO, is that it uses an existing network
of NNTP servers, which are paid for by subscribers' Internet access
fees (or by separate subscription fees to SuperNews or the like). Thus,
there'd be little or no monetary cost to it. (To the extent that some
ISPs actively block NNTP, though, this might be less of an advantage.)
OTOH, some people's ISPs provide NNTP access only as an added-cost option
or not at all, so some people would have to subscribe to a third-party
Usenet provider.
- I wonder how much of a problem spam would be. Obviously, key-signing
can help prevent malicious data from doing damage, but spammers could
waste a lot of bandwidth. The code to parse the data would also have
to be particularly free of security-related bugs; you wouldn't want
somebody to find a loophole that would enable a message to carry a
Linux program that would automatically run once the message is
downloaded. Perhaps using the Usenet moderation system would be
desirable to help minimize these risks -- but that would slow
propagation of messages.
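On the points about stations and "gap-fillers" formatting data and uploading it to a news server: a Usenet article is just a set of headers plus a body, so the export step could be quite small. The following minimal sketch uses Python's standard email package. The group name, the Message-ID domain, and the one-line-per-show "HH:MM|minutes|title" body format are all hypothetical placeholders, not a proposed standard. The finished article would be handed to the server with the NNTP POST command (not shown), and the server adds the Path header itself.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_article(station: str, date: str,
                  rows: list[tuple[str, int, str]], poster: str) -> EmailMessage:
    """Wrap one station's daily schedule in Usenet article headers."""
    msg = EmailMessage()
    msg["From"] = poster
    msg["Newsgroups"] = "alt.tv.listings.example"    # hypothetical group
    msg["Subject"] = f"{station} schedule {date}"
    msg["Date"] = formatdate(usegmt=True)
    msg["Message-ID"] = make_msgid(domain="listings.example")  # hypothetical domain
    # Hypothetical body format: one "HH:MM|minutes|title" line per show.
    # It assumes titles never contain a '|' character.
    lines = [f"{start}|{minutes}|{title}" for start, minutes, title in rows]
    msg.set_content("\n".join(lines) + "\n")
    return msg
```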
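The point about volunteers filling in missing episode numbers implies a merge rule: community data should fill holes in the station's data without overriding it. Below is a minimal sketch of that rule. The Airing record and the (station, start time) key for matching corrections to airings are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Airing:
    station: str
    start: str                      # e.g. "2007-06-22T20:00"
    title: str
    episode: Optional[str] = None   # e.g. "S03E07"; None if the station omitted it


def fill_gaps(official: list[Airing],
              community: dict[tuple[str, str], str]) -> list[Airing]:
    """Add community-supplied episode numbers only where official data has none."""
    merged = []
    for airing in official:
        key = (airing.station, airing.start)
        if airing.episode is None and key in community:
            # replace() returns a new frozen record; the input list is untouched.
            airing = replace(airing, episode=community[key])
        merged.append(airing)
    return merged
```

Letting the station's own data always win keeps a careless or malicious correction from overwriting good information. It only matters where the station left a gap.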
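To illustrate separating schedule data from program data: the schedule would carry only an ID for each airing, and the client would look descriptions up in a program database. This minimal sketch assumes a hypothetical ID scheme and treats a plain dict as the client's local cache of that database. Caching is one way to limit the extra server load noted above, since a box only needs to fetch IDs it hasn't already seen.

```python
from dataclasses import dataclass


@dataclass
class Program:
    program_id: str      # hypothetical ID scheme, e.g. "EP001"
    title: str
    description: str


@dataclass
class ScheduleEntry:
    station: str
    start: str
    program_id: str      # reference into the program database


def missing_ids(schedule: list[ScheduleEntry], cache: dict[str, Program]) -> set[str]:
    """Program IDs the client would still have to fetch from the database."""
    return {e.program_id for e in schedule if e.program_id not in cache}


def resolve(schedule: list[ScheduleEntry],
            cache: dict[str, Program]) -> list[tuple[str, str, str, str]]:
    """Join schedule entries with cached program data for display."""
    rows = []
    for entry in schedule:
        prog = cache.get(entry.program_id)
        # An unknown ID degrades to a placeholder instead of breaking the guide.
        title = prog.title if prog else "(unknown program)"
        desc = prog.description if prog else ""
        rows.append((entry.station, entry.start, title, desc))
    return rows
```

The "embed" alternative would instead copy title and description into each schedule entry at posting time. That makes messages bigger but requires no lookup on the client.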
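The key-signing point boils down to this: each publisher signs its article body, and clients drop anything that doesn't verify against a key they already trust. Below is a toy sketch of that flow using textbook RSA over a SHA-256 digest, written with only the Python standard library. The key is built from two publicly known Mersenne primes, so anyone could forge signatures with it. It demonstrates the accept/reject logic only, and a real system would use something like PGP/GnuPG instead. The keyring contents are hypothetical.

```python
import hashlib

# TOY KEY, illustration only: textbook RSA over two well-known Mersenne primes.
# Anyone can recompute the private exponent, so this is NOT secure.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest(data: bytes) -> int:
    """SHA-256 of the article body as an integer (always smaller than N)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes, d: int, n: int) -> int:
    return pow(digest(data), d, n)


def verify(data: bytes, signature: int, e: int, n: int) -> bool:
    return pow(signature, e, n) == digest(data)


# Hypothetical keyring: public keys of stations or gap-fillers the user trusts.
TRUSTED_KEYS = {"WXYZ": (E, N)}


def accept(station: str, body: bytes, signature: int) -> bool:
    """Accept listings only from a known publisher with a valid signature."""
    key = TRUSTED_KEYS.get(station)
    if key is None:
        return False
    e, n = key
    return verify(body, signature, e, n)
```

With this approach, a post that changes one character of a signed schedule (for example, retitling a film as a children's cartoon) fails verification and is discarded.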
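On keeping the parsing code free of security holes: the safest approach is a parser that accepts only a narrow, well-defined format, caps the article size, and treats everything as inert data. This minimal sketch assumes the same hypothetical "HH:MM|minutes|title" line format used for illustration earlier. The size cap and field limits are arbitrary choices. A malformed line rejects the whole article instead of being partially interpreted.

```python
import re
from dataclasses import dataclass

MAX_ARTICLE_BYTES = 64 * 1024   # hypothetical cap; larger posts are rejected unread

# Hypothetical line format: "HH:MM|minutes|title", with no control chars in titles.
LINE_RE = re.compile(
    r"(?P<start>(?:[01][0-9]|2[0-3]):[0-5][0-9])"
    r"\|(?P<minutes>[0-9]{1,3})"
    r"\|(?P<title>[^|\x00-\x1f\x7f]{1,200})"
)


@dataclass
class Row:
    start: str
    minutes: int
    title: str


def parse_body(body: bytes) -> list[Row]:
    """Strictly parse an article body; raise ValueError on anything unexpected."""
    if len(body) > MAX_ARTICLE_BYTES:
        raise ValueError("article too large")
    text = body.decode("utf-8")   # UnicodeDecodeError is a ValueError subclass
    rows = []
    for line in text.splitlines():
        if not line:
            continue              # tolerate blank lines
        m = LINE_RE.fullmatch(line)
        if m is None:
            raise ValueError(f"malformed line: {line[:40]!r}")
        rows.append(Row(m["start"], int(m["minutes"]), m["title"]))
    return rows
```

Since the parser only produces strings and integers, and nothing in it ever executes or writes message content to disk, a hostile article can at worst be rejected. Spam that fits the format would still get through, which is where signatures or moderation would come in.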
--
Rod Smith
http://www.rodsbooks.com