[mythtv-users] Mooting architecture for a DataDirect replacement
Jay R. Ashworth
jra at baylink.com
Mon Jun 25 13:53:06 UTC 2007
On Sun, Jun 24, 2007 at 01:11:17PM -0400, Rod Smith wrote:
> On Sunday 24 June 2007 02:45, Peter Schachte wrote:
> > Rod Smith wrote:
> > > and/or to not apply earlier changes that would overwrite more
> > > recent ones.
> >
> > That only works if your updates are always complete replacements, never
> > just corrections. Otherwise if one correction updates the end time of a
> > program and the next one changes the subtitle, but they arrive out of order
> > and so you don't apply the first change, then you wind up with the wrong
> > end time.
>
> That does bring up a question that needs answering before such a
> system could be formally designed: Just what form do updates take?
I was planning on an update message to carry one or more XML-wrapped
collections of scheduling and program data fields, which might be ADD,
REMOVE, or REPLACE (which is actually a special case, and might be
dropped -- except that I don't think you can; see below).
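
A rough sketch of what parsing such a message might look like, in Python. The element and attribute names (update, collection, op, field, station, program) are invented for illustration; nothing in this thread fixes a wire format. It only shows one or more collections, each tagged ADD, REMOVE, or REPLACE, being pulled out of an XML wrapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical wire format: every element and attribute name here is an
# assumption made for illustration, not an agreed schema.
SAMPLE = """\
<update station="WXYZ">
  <collection op="REPLACE" program="EP0001">
    <field name="subtitle">Pilot</field>
    <field name="end">2007-06-25T21:00:00Z</field>
  </collection>
  <collection op="REMOVE" program="EP0002"/>
</update>
"""

VALID_OPS = {"ADD", "REMOVE", "REPLACE"}


def parse_update(xml_text):
    """Return (station, [(op, program, {field_name: value}), ...])."""
    root = ET.fromstring(xml_text)
    collections = []
    for coll in root.findall("collection"):
        op = coll.get("op")
        if op not in VALID_OPS:
            raise ValueError(f"unknown operation: {op!r}")
        # Each <field> carries one scheduling or program data value.
        fields = {f.get("name"): (f.text or "") for f in coll.findall("field")}
        collections.append((op, coll.get("program"), fields))
    return root.get("station"), collections
```

Calling parse_update(SAMPLE) yields the station name plus one tuple per collection, which a client could then dispatch on by operation.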
Each message will be globally signed by a private key owned by the
originator; public keys will be published by a standard protocol (a
standard URL on a station's website, for example).
I'm of two minds about whether the location of the public key should be
*in* the message; I suspect not, though a digest of the public key
could be usefully included.
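
To illustrate the "digest of the public key" idea: a minimal Python sketch, assuming the message carries a hex digest and the client has already fetched the key bytes from wherever the originator publishes them. SHA-256 and both function names are my assumptions; the signature check itself is not shown.

```python
import hashlib
import hmac


def key_digest(public_key_bytes):
    """Hex digest identifying a public key (SHA-256 is an assumed choice)."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def key_matches(fetched_key_bytes, digest_from_message):
    """True if the fetched key is the one the message claims to be signed with."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        key_digest(fetched_key_bytes), digest_from_message.lower()
    )
```

This lets a client notice that it has a stale or wrong key before it bothers attempting signature verification.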
> They could be similar to diff files (or be ACTUAL diff files, for
> that matter), changes to individual fields in the file (as you
> imply in the above), changes to multiple fields in the file, or
> complete replacement files. Each possibility brings a different set of
> challenges.
Indeed, and Peter raises an excellent point that I missed initially,
even though I had planned to make "overlay" updates possible.
(My apologies for dropping out; I took an actual weekend off. :-)
> Before answering this question, we'd need to know how often changes
> are likely to be propagated through the system. If there's an average
> of one update per lineup per week, then the overhead of providing
> updates as complete replacement files would be minor (particularly
> if each file is small, such as an individual show's description),
> and the simplification of everything that cascades from it would
> make it worthwhile to do it this way. If there are an average of ten
> updates per station per day, fixing minor things like typos in program
> descriptions, then some other update method makes sense and the logic
> to handle it would have to be correspondingly more complex.
The assumptions I personally have been carting around were these:
1) Most updates will come directly out of the automation software (or
something which drives it) at the station/network, as they're put in.
I'm trying to avoid being in a position where we have to
convince the stations to *put in another box*, which sharply
limits what we can expect of the sources; I can't see that
they're gonna batch stuff up. This sort of update will usually
be simply a replacement record -- I can't see there being a
problem with leaving things hanging; schedule slots rarely
change to "empty".
These *could* be +this, -that type diffs instead, but I'm not
yet sure how deeply it will be possible to hook the schedule
databases.
2) Some updates may come from external sources (production companies
may see fit to send out more comprehensive program descriptions at some
point, we might talk TV Barn's Aaron Barnhart into sending out his talk
show guest updates as overlays, individuals in specific communities may
want to send out flash updates as they hear things, for those who
choose to accept updates signed with *their* keys, etc.).
Many of these, though not all, may be overlay updates, and
obviously, these won't be able to say what they're replacing,
and so they can't be +this -that; they have to be =something-else.
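
The difference between the two update styles can be sketched in Python as follows. This assumes a program record is just a dict of field names to values; the function names are hypothetical. A +this/-that diff has to know the old value, while an =something-else overlay does not.

```python
def apply_diff(record, added, removed):
    """+this/-that style: the sender must state the value being removed."""
    updated = dict(record)
    for field, old_value in removed.items():
        # Refuse the diff if the sender's idea of the old value is wrong.
        if field not in updated or updated[field] != old_value:
            raise ValueError(f"{field}: sender expected {old_value!r}")
        del updated[field]
    updated.update(added)
    return updated


def apply_overlay(record, overlay):
    """=something-else style: set fields without knowing what was there."""
    updated = dict(record)
    updated.update(overlay)
    return updated
```

An external source that cannot see the station's database can always produce an overlay, but a diff from that source may be rejected because it guessed the old value wrong.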
> > And there's still nothing you can do about a posting that never
> > arrives.
>
> *ONCE AGAIN*: There's par2, which is designed for precisely this
> purpose. It provides one or more data-recovery postings that enable
> the recreation of missing data, similar to the way certain RAID levels
> work. Actually using this technique for this particular application
> would take some careful planning (you'd need to create reasonable
> groupings of postings to which a given set of par2 files apply), but
> it ought to work.
Yeah, but given the possibility of multiple valid update sources for a
given program's data (which is a possibility I admit I probably haven't
mentioned until now, but that I think would be useful in several
circumstances including those noted above)... I don't think we can.
Or at least, not authoritatively.
There's also a question of how much data we can expect the *client* to
keep around (update timestamps on each column, for example), which
might be a *slightly* more tractable problem.
Or, we could just provision our own local database on the client, and
then push out of that to Myth/etc.
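
For the per-column timestamp idea, here is a minimal Python sketch, assuming each update carries a single timestamp from its originator and the client stores (value, timestamp) per field. The names are hypothetical. It handles the out-of-order case Peter raised: a late-arriving end-time correction still lands even though a newer subtitle correction was applied first.

```python
def merge_fields(stored, incoming_fields, incoming_ts):
    """Merge an update into stored {field: (value, timestamp)} data.

    A field is overwritten only if the incoming update is newer than
    what is already stored for that particular field.
    """
    merged = dict(stored)
    for field, value in incoming_fields.items():
        current = merged.get(field)
        if current is None or incoming_ts > current[1]:
            merged[field] = (value, incoming_ts)
    return merged
```

The cost is one timestamp per stored field, which is the storage question raised above.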
> Of course, even this will break down if the user's
> NNTP server is sufficiently unreliable, but when it gets to that point
> the solution is for the user to find a better NNTP server.
You bet it is. And, come to think of it, an easy fix for the common
case is just to serial-number the updates; you may not know what you
missed from the station, but at least you'll know you missed something.
And as long as a) it did get to the server, which the server-side
client can work harder on, and b) the serial number is in the subject
header, then you're golden.
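
A sketch of the gap check in Python, assuming a made-up subject layout of "[STATION] serial=N ..."; the layout, the regex, and the function names are all hypothetical. Given the subject headers a client has seen, it reports which serial numbers from one station are missing.

```python
import re

# Assumed subject layout: "[WXYZ] serial=1042 ...". Purely illustrative.
SUBJECT_RE = re.compile(r"^\[(?P<station>[^\]]+)\]\s+serial=(?P<serial>\d+)\b")


def parse_subject(subject):
    """Return (station, serial) or None if the subject doesn't match."""
    m = SUBJECT_RE.match(subject)
    if m is None:
        return None
    return m.group("station"), int(m.group("serial"))


def missing_serials(subjects, station):
    """List serial numbers absent between the lowest and highest seen."""
    seen = set()
    for subject in subjects:
        parsed = parse_subject(subject)
        if parsed and parsed[0] == station:
            seen.add(parsed[1])
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

Note that this can only detect holes between updates that did arrive; a missing most-recent update is invisible until a later one shows up.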
> One advantage of Usenet is that, in the modern world, files get
> distributed quickly, and it's not unreasonable for clients to check
> their local servers for updates very frequently. This would enable
> updates to be disseminated quickly, so our boxes could become aware
> of last-minute schedule changes -- possibly even things like sporting
> events going into overtime, if somebody were to keep an eye on it and
> issue predictive updates. (That would open another can of worms, but
> at least the infrastructure could handle it.)
Yes; this is another of my hobby horses... though the client PVR
software would have to deal with recordings differently than I suspect
it currently does...
> I honestly don't know
> as much about BitTorrent or other P2P protocols, so I don't know how
> quickly such changes could be propagated via that method.
My intuition is that the whole tracker layer makes it unwieldy at best.
As someone else noted, BT is suited much more for relatively few large
files which don't change often, none of which are characteristics of
our update packets as I currently envision them.
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra at baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com '87 e24
St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274