[mythtv-users] Duplicate recordings because of bad SD data?

f-myth-users at media.mit.edu f-myth-users at media.mit.edu
Wed Jul 21 22:18:20 UTC 2010


    > Date: Wed, 21 Jul 2010 16:47:56 -0500
    > From: Robert Eden <rmeden at yahoo.com>

    > On 7/21/2010 1:05 PM, f-myth-users at media.mit.edu wrote:
    > > As far as I can see, TMS is occasionally forgetting that they have a
    > > particular episode in their database and are gensyming a new description
    > > and ID for it.
    > >   
    > When I spoke to TMS about this, new Episodes-IDs are also generated for 
    > significant program edits.  For example, "The Apprentice" would show one 
    > show on NBC and then an "extended version" on CNBC.  Both would get 
    > different Episode-IDs.  Sometimes the edits are not apparent to us, but 
    > Tribune knows all. :)

Well, in the cases I've seen, I've typically compared the CC data 1:1.
Either the edits are sufficiently small that I can't eyeball them, or
they're to things that don't show up in the CC data and only show up
in the video.  (In some/many cases I could actually compare all of the
video as well, but it would take essentially as long as it would take
to watch all the possible "versions" and with little CC data change,
it hasn't been worth it.  Especially since I figured nobody cared.)
Also, the cases I've seen typically have the differing IDs on the
same channel as the original airing.

Note that it's often not possible to just diff the CC data directly,
since often its formatting changes from run to run (where the line
breaks are, etc).  I have tools and regexps to make this easier, but
it still requires by-hand identification of the commercials, so it's
still labor-intensive.  (My tools compute a percent-similarity, but
that's not a format that's easily human-readable to spot the specific
differences.)
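
A minimal sketch of the kind of normalization and percent-similarity
comparison described above, in Python.  This is not the poster's actual
tooling: the function names are hypothetical, and it assumes each
airing's CC data has already been extracted to plain text with the
commercials removed by hand.  It collapses line breaks and whitespace so
that reflowed captions compare equal, then compares word by word with
the standard library's difflib.

```python
import difflib
import re


def normalize_cc(text):
    """Collapse line breaks and whitespace runs and lowercase the text,
    so captions that were merely reflowed between airings compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def _matcher(cc_a, cc_b):
    # Compare word sequences rather than characters; autojunk is disabled
    # so common words are not discarded as "junk" on long transcripts.
    words_a = normalize_cc(cc_a).split()
    words_b = normalize_cc(cc_b).split()
    return difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)


def percent_similarity(cc_a, cc_b):
    """Return a 0-100 similarity score for two caption transcripts."""
    return 100.0 * _matcher(cc_a, cc_b).ratio()


def word_diff(cc_a, cc_b):
    """Return only the differing spans as (tag, words_a, words_b) tuples,
    which is easier to read by eye than a single percentage."""
    m = _matcher(cc_a, cc_b)
    return [
        (tag, m.a[i1:i2], m.b[j1:j2])
        for tag, i1, i2, j1, j2 in m.get_opcodes()
        if tag != "equal"
    ]
```

The score alone hides where the differences are, which is the
readability problem noted above; listing only the non-equal spans is one
way to surface the occasional dropped character without reading both
transcripts in full.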

But for the ones for which I've actually by-hand diffed the CCs and
looked at the whole program, I haven't detected a difference I couldn't
attribute to the sorts of fuzz you often get with CC data (the occasional
dropped char or whatever), so hearing that Tribune really thinks these
deserve new programids is surprising.

[I -did- just recently notice a brand-new ID for one program that
aired with commercials on the Travel Channel and then, a year later,
on PBS.  Since there's likely 25% more content for the latter, that
seems a legitimate change and I wish they'd do more of that---but it
seems that usually they do the opposite!  E.g., they take something
airing commfree and with-comms and assign them -both- the same ID,
meaning that I won't pick up a commfree version later if I've recorded
the original with comms, and that's kinda annoying...]

    > I've submitted issues like this in the past and would be happy to do so 
    > again.  The only thing I request is you provide raw data evidence to 
    > rule out MythTV problems. (XMLTV format is fine, it doesn't store stuff 
    > so it can't make things up).  That's tough  with Myth because it doesn't 
    > have a way to store the downloads.  Some people scheduled an XMLTV run 
    > at the same time as the Myth load just for evidence purposes. (you can 
    > also load Myth with XMLTV of course)

I store all my downloads so I can diagnose problems, so I can do this.
The program IDs are in the XML that arrives from SD.  (In fact, I
manually shove that data -into- Myth, so the download happens once only,
and that's what Myth is using, so it's also guaranteed that the data
didn't change behind my back.)

But since such repeats with different IDs usually occur long after the
first airing (months), I'd have to watch the new one -and- rewatch the
old one (if available) just to verify.  Would CC data from both be
convincing enough?  I always have both versions of that available.
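
For assembling raw-data evidence of this kind, here is a rough Python
sketch that scans stored guide downloads for episodes that share a title
and subtitle but carry different program IDs.  It assumes the downloads
are in XMLTV format (the poster's stored files are SD's own XML, which
would need a different parser) and that the ID sits in an
<episode-num> element whose system attribute is "dd_progid"; that
attribute value and the function name are assumptions to adjust for
your grabber.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_conflicting_ids(roots, id_system="dd_progid"):
    """Scan parsed XMLTV documents (an iterable of root elements) and
    return {(title, subtitle): {program_id: [(channel, start), ...]}}
    for every episode that appears under more than one program ID.

    id_system is an assumption about how the grabber labels the ID.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for root in roots:
        for prog in root.iter("programme"):
            title = (prog.findtext("title") or "").strip()
            subtitle = (prog.findtext("sub-title") or "").strip()
            if not title or not subtitle:
                continue  # generic listings cannot be matched per episode
            airing = (prog.get("channel"), prog.get("start"))
            for num in prog.findall("episode-num"):
                if num.get("system") == id_system and num.text:
                    # A set drops the same airing seen in several downloads.
                    groups[(title, subtitle)][num.text.strip()].add(airing)
    return {
        key: {pid: sorted(airings) for pid, airings in ids.items()}
        for key, ids in groups.items()
        if len(ids) > 1
    }
```

Each stored file would be loaded with ET.parse(path).getroot() and the
roots passed in together, since the conflicting airings are usually
months apart.  Matching on title and subtitle only flags candidates;
it cannot tell a genuine re-edit from a spurious new ID, so the CC
comparison is still needed.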

    > One benefit of the new software load last week is we should now get 
    > mid-day updates, just like Zap2IT.com.  Hopefully the data will be 
    > fresher now.

That'd also be nice, but the problems I've seen seem to have nothing
to do with data freshness, unless the incorrect programid/description
gets corrected, on average, less than one day in advance.

[I only wish that many channels wouldn't have generic descriptions
until almost the last minute, and that TCM would also include their
shorts in their TMS/SD data and not just on the website, and lots of
other things that are upstream of TMS...]

