[mythtv-users] Mooting architecture for a DataDirect replacement

Jay R. Ashworth jra at baylink.com
Fri Jun 22 17:37:42 UTC 2007


On Fri, Jun 22, 2007 at 12:52:41PM -0400, Rod Smith wrote:
> On Thursday 21 June 2007 20:19, Christopher X. Candreva wrote:
> > On Thu, 21 Jun 2007, Jay R. Ashworth wrote:
> > > I was merely trying to propose an architecture that would make
> > > practical the distribution of the load of 200,000 Mythboxen looking for
> > > guide data every day.  NNTP would.
> >
> > I used to run an NNTP system. I'm going to assume INN or its replacement
> > has gotten better, but it wasn't easy, and there is going to be
> > significantly less experience in it today, especially since most ISPs are
> > outsourcing it.
> 
> I believe the suggestion is to use EXISTING NNTP servers -- those operated by 
> users' ISPs, etc. In theory, nobody associated with MythTV or any new TV 
> listings project would need to start running a new NNTP server for this 
> scheme to work. (In practice, it might be desirable -- say to provide a 
> server to which TV stations could upload their data if they don't already 
> have access to an NNTP server.)

Partially, yes.  I can see people like Yeechang Lee running servers for
the appropriate hierarchy to which they permit access by others -- at
least other Myth users.

> In another message, Rich West wrote:
> > Not sure if it has been pointed out (dons flame retardant suit now), but
> > NNTP is not accessible by everyone easily.  Many ISPs block it
> > entirely, some offer it (Comcast) but require you to pay extra to access
> > it, while some do allow for it on a limited basis.  Are there free news
> > servers out there?  Sure.  But I know of a few people who have Comcast
> > who can't even get those newsfeeds due to Comcast blocking them.
> 
> This is shocking. I was aware that some ISPs were moving away from providing 
> Usenet access as a standard feature, but I'd never before heard of an ISP 
> actively blocking it. Do you have a reference to a news story, Comcast policy 
> page that describes this, or the like?

Yep; that's what I asked him, too.

> Even if this is a moderately widespread problem, I don't think it's 
> necessarily a show-stopper for the idea. Google provides Web-based newsgroup 
> access, so in principle the data could be obtained either via NNTP or via 
> HTTP, although the latter might not be efficient and Google might object -- 
> or simply not carry the necessary newsgroup(s). I'm sure it would also be 
> possible to set up an NNTP server that uses a non-standard port, proxy 
> through SSH, or whatnot. Those are extra hoops to jump through, but it could 
> be done. I don't know if any of the third-party Usenet services already 
> provide such options. If so, we wouldn't need to worry about it, except that 
> the developers would have to include appropriate code to support these access 
> methods, or at least document how to do it using other tools. An alternative 
> to non-standard ports or proxies might be to set up a service that provides 
> the data in a form similar to the current method; the service in question 
> would simply download the data from a news server and then repackage it. This 
> would require few or no changes to the current MythTV code, but would require 
> centralized resources, so it'd probably have to be a subscription service.

Yep; akin to my observation that this could be done for human eyeballs
as an ad service.

> More generally, I've been contemplating the concept of using NNTP for data 
> dissemination, and I'm seeing more and more subtle positive points to it. 
> I've also got a few random observations and questions:
> 
> - I know nothing about how TV stations manage their schedules.
> Presumably they've got some sort of internal database with the
> information. If stations use a limited number of software packages and
> data formats, then open source programmers could write software to
> take this data and export it to an NNTP server. It would then become
> very easy, from a technical point of view, for TV stations to provide
> the data directly. (Whether the average TV station would want to trust
> community-provided open source software is another matter, of course.)
> Can anybody comment on how TV stations manage this data to begin with?

Yeah, me.  :-)  As I've noted in a couple of other postings, there are,
maybe, 2 dozen companies that produce automation scheduling packages
that control the playout of channels to 'air'.  Those systems, by
necessity, already have *unambiguously accurate* data in them, and any
changes to it are "real" changes.

Ripping it out of those systems and posting it is the best answer.

I've already inquired of the one I know best; more when I know
more<tm>.

> - Assuming a large enough group of stations provide the data
> themselves, individuals with sufficient interest could help by being
> "gap-fillers." Such people could provide at least minimal data for
> specific stations; just format it correctly and upload it to their
> local news servers.

Yup.  Confidence factor becomes one of the data items.
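
As a rough illustration of confidence as a data item, here is a minimal
Python sketch. The record layout, the field names, and the 0.0-1.0 scale are
all assumptions made up for the example; nothing like this has been agreed
on. It simply keeps the highest-confidence listing posted for each slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    station: str       # hypothetical station identifier
    start: str         # start time, ISO 8601, UTC
    title: str
    source: str        # who posted the listing
    confidence: float  # assumed scale: 0.0 (a guess) to 1.0 (from playout)


def best_listings(listings):
    """Keep the highest-confidence listing for each (station, start) slot."""
    best = {}
    for item in listings:
        key = (item.station, item.start)
        if key not in best or item.confidence > best[key].confidence:
            best[key] = item
    return best


postings = [
    Listing("WXYZ", "2007-06-22T20:00Z", "Evening News", "gap-filler", 0.4),
    Listing("WXYZ", "2007-06-22T20:00Z", "Evening News at 8", "station", 1.0),
    Listing("WXYZ", "2007-06-22T21:00Z", "Movie", "gap-filler", 0.4),
]
chosen = best_listings(postings)
```

With these sample postings, the station's own entry wins the 20:00 slot, and
the gap-filler's entry is still used at 21:00 where nothing better exists.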

> - Similar to the above, interested individuals could help fill in
> gaps in existing but incomplete guide data. For instance, entering
> episode numbers to help schedulers avoid recording repeats when the TV
> stations neglect such information.

Since I propose to have the data sources PKI sign their postings, you
could decide whose updates you trusted...  A protocol for partial-data
overlaying would need to be decided upon.
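
One possible shape for such an overlay, sketched in Python. This is only an
illustration under stated assumptions: signatures are taken to be verified
already, the signer names and the "lower rank is more trusted" convention
are invented for the example, and the policy shown (patches only fill
fields the base record leaves empty) is just one of several reasonable
choices.

```python
def overlay(base, patches, trusted):
    """Fill gaps in `base` from patches posted by trusted signers.

    base    -- dict of fields from the primary source (None means missing)
    patches -- list of (signer_id, fields) pairs, signatures assumed verified
    trusted -- dict mapping signer_id to a rank; lower rank is more trusted
    """
    merged = dict(base)
    usable = [p for p in patches if p[0] in trusted]
    usable.sort(key=lambda p: trusted[p[0]])  # most trusted signer first
    for _signer, fields in usable:
        for name, value in fields.items():
            # Only fill holes; never overwrite data that is already present.
            if merged.get(name) is None and value is not None:
                merged[name] = value
    return merged


station = {"title": "Some Sitcom", "episode": None}
patches = [
    ("unknown-key", {"episode": "S01E01"}),
    ("fan-key", {"episode": "S03E12", "title": "Wrong Title"}),
    ("archivist-key", {"episode": "S03E11"}),
]
trusted = {"archivist-key": 0, "fan-key": 1}  # hypothetical local trust list
result = overlay(station, patches, trusted)
```

Here the untrusted signer is ignored, the station's title is left alone, and
the episode number comes from the most trusted of the remaining signers.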

> - Depending on licensing terms, even if an individual station didn't
> want to provide the data directly, a station might not object to one
> person screen-scraping data from a Web site and providing it via the
> proposed Usenet architecture (or some other means, for that matter).

Perhaps.  I can't see why a station would *not* want its local viewers
to have the most accurate up-to-date data about what's scheduled.
Indeed: they might see fit to put data about promercials in their local
scheduler, which TMS might ignore... but if the data's coming direct
from the station, you'll get it.

> - An awful lot of what's broadcast is repeats, movies, etc. I wonder
> if some way of separating the schedule data from the program data
> would be desirable, or at least providing a community-run database
> of program data (similar to CDDB for CD data or IMDB).

*This* is the hirsute part of the problem.  Optimally, you do want to
divorce PROGRAM data from AIRING data... but then someone has to
maintain a repository of the PROGRAM data, and make sure everyone uses the same
(unique) primary key for each program -- which is still a weak spot
in the program distribution business as I understand it.
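
To make the split concrete, a small Python sketch follows. The record
types, the `program_id` format, and the sample values are hypothetical; the
point is only that airings carry a key and the descriptive data lives once,
elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    program_id: str   # the shared, unique primary key (format is made up)
    title: str
    description: str


@dataclass(frozen=True)
class Airing:
    station: str
    start: str        # start time, ISO 8601, UTC
    program_id: str   # reference into the program repository


programs = {
    p.program_id: p
    for p in [Program("prog:0001", "Example Movie", "Placeholder description.")]
}

airings = [
    Airing("WXYZ", "2007-06-22T20:00Z", "prog:0001"),
    Airing("WABC", "2007-06-23T01:00Z", "prog:0001"),  # repeat, other station
    Airing("WXYZ", "2007-06-23T03:00Z", "prog:9999"),  # key nobody has defined
]


def resolve(airings, programs):
    """Pair each airing with its program record, or None if the key is unknown."""
    return [(a, programs.get(a.program_id)) for a in airings]


guide = resolve(airings, programs)
```

The third airing shows the weak spot: if sources do not agree on keys, the
lookup comes back empty and the client has nothing to display.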

>                                                            If we as a
> community provide such a resource, or if we can leverage off of IMDB,
> that might be helpful. The NNTP-provided schedule data could either
> embed basic descriptions from the database or refer to the database
> entry. (The latter would probably be more complex on the client side
> and would of course greatly increase the load on the server.) Part
> of my thinking here is that, to the extent that program descriptions
> can be considered copyrighted (yes, I know this is debatable), if we
> provide our own database that's under an open license, we won't need
> to worry about that if/when we as individuals need to provide data for
> stations that don't want to "play ball" -- at least not for programs
> with entries in our database. This particular point goes well beyond
> the NNTP issue, of course; a user-generated descriptions database
> could be useful no matter what the guide data distribution method.
> Running it would of course require resources, though.

Yep.  And the cooperation of program-producers.  It's not that hard a
sell... but it's a sell.

> - You'd want some sort of key signing to prevent malicious insertion
> of incorrect data into the stream (somebody claiming an X-rated movie
> is actually a popular children's cartoon, say).

Yes; I'd planned on that (I think I mentioned it upthread).

> - The big draw to this idea, IMHO, is that it uses an existing network
> of NNTP servers, which are paid for by subscribers' Internet access
> fees (or by separate subscription fees to SuperNews or the like).
> Thus, there'd be little or no monetary cost to it. (To the extent
> that some ISPs actively block NNTP, though, this might be less of
> an advantage.) OTOH, some people's ISPs provide NNTP access only as
> an added-cost option or not at all, so some people would have to
> subscribe to a third-party Usenet provider.

Sure.  But that's infrastructural.  And it's *standardized*
infrastructure, so you *can* "just grab a feed from anywhere".  The
trick is to pre-process and pre-structure the feed *just enough* to
make that workable, based on the smarts of the servers and the
abilities of the clients -- I'm mooting more than just Myth as a
target, obviously.

> - I wonder how much of a problem spam would be. Obviously, key-signing
> can help prevent malicious data from doing damage, but spammers could
> waste a lot of bandwidth. The code to parse the data would also
> have to be particularly free of security-related bugs; you wouldn't
> want somebody to find a loophole that would enable a message to
> carry a Linux program that would automatically run once the message
> is downloaded. Perhaps using the Usenet moderation system would
> be desirable to help minimize these risks -- but that would slow
> propagation of messages.

Indeed.  I think if everything is required to be signed, and you close
the trust loop properly (in a fashion I'm still mulling over) then
you're ok there.  Bandwidth isn't really my issue: if necessary, you
can put the signature in the subject header (or something else which
can be retrieved by clients without full bodies).
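
A sketch of the header-first idea in Python, with loud caveats. The subject
layout (`key=... sig=...`) is invented for the example, and HMAC with a
shared secret is used purely as a stand-in so the code runs on the standard
library alone; a real design would use public-key signatures, which this
does not implement. No NNTP traffic is shown, only the decision logic.

```python
import hashlib
import hmac

# Stand-in key ring. Hypothetical: a real scheme would hold public keys.
KEYS = {"station-wxyz": b"demo-secret"}


def make_subject(key_id, body):
    """Build a subject line that carries the key id and signature."""
    sig = hmac.new(KEYS[key_id], body, hashlib.sha256).hexdigest()
    return f"guide WXYZ 2007-06-22 key={key_id} sig={sig}"


def parse_subject(subject):
    """Return (key_id, signature) from a subject, or None if absent."""
    fields = dict(p.split("=", 1) for p in subject.split() if "=" in p)
    if "key" in fields and "sig" in fields:
        return fields["key"], fields["sig"]
    return None


def worth_fetching(subject):
    """Header-only check: skip articles that do not name a trusted key."""
    parsed = parse_subject(subject)
    return parsed is not None and parsed[0] in KEYS


def verify(subject, body):
    """Full check once the body has been downloaded."""
    parsed = parse_subject(subject)
    if parsed is None or parsed[0] not in KEYS:
        return False
    expected = hmac.new(KEYS[parsed[0]], body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, parsed[1])


body = b"WXYZ|2007-06-22T20:00Z|Evening News"
subject = make_subject("station-wxyz", body)
```

A client would call `worth_fetching` on each subject it sees and download
bodies only for the survivors, so spam costs header bandwidth but not body
bandwidth; `verify` then catches articles that merely claim a trusted key.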

This is the level of conversation I've been hoping for on this idea,
Rod; thanks for climbing on.

Cheers,
- jra
-- 
Jay R. Ashworth                   Baylink                      jra at baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274

