[mythtv] [mythtv-commits] Ticket #2678: Duplicate checking of "unidentified episode"s is broken

Bruce Markey bjm at lvcm.com
Tue Nov 14 19:53:58 UTC 2006


Michael T. Dean wrote:
> On 11/13/2006 11:23 AM, Mark Buechler wrote:
>>  On 11/13/06, Michael T. Dean <mtdean at thirdcontact.com> wrote:
>>> On 11/12/2006 10:28 PM, MythTV wrote:
>>>> This is probably a dupe of 2677.
>>>>
>>>> Setting up a search rule which happens to match "Unidentified
>>>> Episode"s, records duplicates.  It acts like it does not bother
>>>> to compare the title and descriptions when the programid ends
>>>> with "0000".
>>>>
>>> By design.  If a programid exists, duplicate matching type is
>>> ignored.
>>>
>>> http://www.gossamer-threads.com/lists/mythtv/dev/67377#67377
>>> http://www.gossamer-threads.com/lists/mythtv/dev/180855#180855
>>>
>>>> For example, I have a search rule to record shows beginning with
>>>> "Fast Flights:".  Under this rule it has already recorded "Fast
>>>> Flights: British Isles".  Even so, the scheduler is recording
>>>> every single instance of "Fast Flights: British Isles".  All of
>>>> the episodes it is repeatedly recording have the same description:
>>>> "A 500-mile aerial excursion over the UK."  They also all have
>>>> the same programid: SH8505190000.
>>> I'm sure Bruce could explain how to set up your rule for this
>>> show... :)  Sorry, I don't know how.
>>  This is very interesting and may explain many of my problems. I have
>>  several cases where one of my sources has programid's and the other
>>  (EIT) does not. In a given schedule, is duplicate checking disabled
>>  if ANY matching program has a programid or if ALL matching programs
>>  have programid's?

Duplicate checking isn't disabled per se, though one of the methods
is to not match duplicates, and generic episodes should not normally
be treated as matching. It isn't the presence or absence of the
programids that determines if two showings are or are not a match.
If both showings have programids then this info is assumed to be
authoritative so the strings are not matched. If either is blank
then the subtitle and/or description are checked. Note that ""
vs "" cannot be used to decide if something is a match.
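
A minimal Python sketch of that decision, not the actual scheduler
code. The function name, the dict keys, and the 'generic' flag are
assumptions made for illustration, and the string fallback here
compares subtitle and description together, which is only one of
the possible methods.

```python
def is_duplicate(a, b):
    """Return True if showings a and b look like the same episode.

    a and b are dicts with 'programid', 'subtitle', 'description' and
    'generic' keys (a hypothetical layout, not MythTV's own structures).
    """
    # Generic placeholders are never treated as matches.
    if a["generic"] or b["generic"]:
        return False
    # Both IDs present: treat them as authoritative, skip the strings.
    if a["programid"] and b["programid"]:
        return a["programid"] == b["programid"]
    # At least one ID is blank: fall back to the strings.
    key_a = (a["subtitle"], a["description"])
    key_b = (b["subtitle"], b["description"])
    # "" vs "" proves nothing, so an all-blank key can never match.
    return any(key_a) and key_a == key_b
```

With this layout, two showings that both carry EP0169160126 match on
the ID alone, while two showings with blank IDs, blank subtitles and
blank descriptions do not match.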

Duplicate matching would be fairly simple except for the fact
that the stations sometimes don't report which episode will be
shown and only say that some episode of the series will be shown.

      title: Seinfeld
   subtitle: The Busboy
description: A busboy loses his job and his cat because of George.
  programid: EP0169160126

      title: Seinfeld
   subtitle: The Marine Biologist
description: George poses as a marine biologist; guest Carol Kane.
  programid: EP0169160089

      title: Seinfeld
   subtitle:
description: Jerry and his friends face life in New York.
  programid: SH0169160000

The first two are specific episodes. The last is a generic
placeholder. Each showing of it may or may not be a different episode.
It might be another showing of "The Busboy" or it might be
"The Soup". Therefore, the default behavior is to record all
of the generics. Some stations may have series with nothing but
generic episodes. If these were considered as matches, it would
record the first random episode then never record again.
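
As a rough Python sketch, a generic showing could be recognized
along these lines. This assumes the two definitions discussed in
this thread (a series whose programid ends in "0000", or a showing
with no programid, subtitle, or description). The dict keys and the
"series" value are made up for illustration.

```python
def is_generic(showing):
    """Guess whether a listing is a generic placeholder, not a specific episode.

    showing is a dict with 'programid', 'subtitle', 'description' and
    'category_type' keys (hypothetical names).
    """
    programid = showing["programid"]
    if programid:
        # Series-level ID: the episode part of the ID is all zeros.
        return showing["category_type"] == "series" and programid.endswith("0000")
    # No ID at all: generic only if there is nothing else to go on.
    return not showing["subtitle"] and not showing["description"]


listings = [
    {"programid": "EP0169160126", "subtitle": "The Busboy",
     "description": "A busboy loses his job and his cat because of George.",
     "category_type": "series"},
    {"programid": "SH0169160000", "subtitle": "",
     "description": "Jerry and his friends face life in New York.",
     "category_type": "series"},
]
flags = [is_generic(s) for s in listings]
```

Here only the second listing is flagged, so it would be recorded
every time it comes up instead of being matched against earlier
recordings.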

This is slightly off topic but we have a slick way to deal with
current events shows that are repeated and have all or mostly
generic descriptions. We may know the pattern even if the listings
don't tell the system what is going on. This is where FindDaily
and FindWeekly come in. TV Guide Channel has a weekly review of
reality shows:

mysql> select starttime,title, programid from program where title = 'Reality Chat'\G
*************************** 1. row ***************************
starttime: 2006-11-12 19:00:00
    title: Reality Chat
programid: SH8156050000
*************************** 2. row ***************************
starttime: 2006-11-13 23:00:00
    title: Reality Chat
programid: SH8156050000
*************************** 3. row ***************************
starttime: 2006-11-14 16:00:00
    title: Reality Chat
programid: SH8156050000
*************************** 4. row ***************************
starttime: 2006-11-19 19:00:00
    title: Reality Chat
programid: SH8156050000
*************************** 5. row ***************************
starttime: 2006-11-19 22:00:00
    title: Reality Chat
programid: SH8156050000
*************************** 6. row ***************************
starttime: 2006-11-21 16:00:00
    title: Reality Chat
programid: SH8156050000

These all have the same generic description, however, I know
that the new weekly version is at 7pm on Sunday and the others
during the week are repeats. I set a FindWeekly rule for a 7pm
showing on Sunday and the system then knows that it should
record any one showing between 7pm Sunday and 7pm the following
Sunday.
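
The windowing idea can be sketched in Python as below. This is only
an illustration of "one showing per 7pm-Sunday-to-7pm-Sunday
window", not how the scheduler implements FindWeekly. The function
name is made up, and taking the earliest showing in each window is a
simplification, since any one showing in the window would do.

```python
from datetime import datetime, timedelta


def pick_one_per_week(starttimes, anchor):
    """Keep the earliest showing in each 7-day window beginning at anchor."""
    week = timedelta(days=7)
    chosen = {}
    for start in sorted(starttimes):
        if start < anchor:
            continue  # before the first window; ignored in this sketch
        index = (start - anchor) // week  # which weekly window this falls in
        chosen.setdefault(index, start)   # first showing seen in a window wins
    return [chosen[i] for i in sorted(chosen)]


fmt = "%Y-%m-%d %H:%M:%S"
showings = [datetime.strptime(s, fmt) for s in (
    "2006-11-12 19:00:00", "2006-11-13 23:00:00", "2006-11-14 16:00:00",
    "2006-11-19 19:00:00", "2006-11-19 22:00:00", "2006-11-21 16:00:00",
)]
picked = pick_one_per_week(showings, anchor=datetime(2006, 11, 12, 19, 0))
```

For the six "Reality Chat" rows above this keeps the two Sunday 7pm
showings and treats the other four as repeats within the same
window.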

But back on topic...

If there are no programids, a generic episode is fairly obvious;
subtitle "" and description "". It would be nice if every episode
of every series had subtitles and descriptions but that's not
the case. It would even be nice if all series followed the same
conventions but they don't.

Many shows do have subtitles and descriptions pulled from the
same database where dup matching is unambiguous. Most shows
have subtitles and checking just these would be enough. However,
sometimes there are two part episodes where the same subtitle
is used with a different description. In most cases where there
are subtitles, the generic episodes have a blank subtitle and
the same generic description like the "Seinfeld" example above.
However, there are shows like "Reno 911!" where there are no
subtitles and each description describes a unique episode.

mysql> select title, subtitle, description, programid from program where title = 'Reno 911!'\G
*************************** 1. row ***************************
      title: Reno 911!
   subtitle:
description: The deputies hang out with the host of a children's show when some of the staff is quarantined with a possible S.A.R.S. infection.
  programid: EP5858760036
*************************** 2. row ***************************
      title: Reno 911!
   subtitle:
description: A mishap with the mayor's child puts the officers on alert for contraband fireworks.
  programid: EP5858760004
*************************** 3. row ***************************
      title: Reno 911!
   subtitle:
description: Lt. Dangle declares a zero tolerance policy only to find that he's the only criminal in town.
  programid: EP5858760001
*************************** 4. row ***************************
      title: Reno 911!
   subtitle:
description: The deputies attempt to catch the fastest criminal in the southwestern United States.
  programid: EP5858760035

Here we cannot assume a blank subtitle means that the description
is generic as with most other shows.

For string matching, there cannot be a one-size-fits-all method.
We need to look at the upcoming episodes list to determine which
of the dupmethods will do the best job of identifying unique
episodes.
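
One way to eyeball this, sketched in Python: count how many distinct
non-blank keys each candidate set of fields would give for the
upcoming showings. The helper name and dict keys are made up for
illustration and are not part of MythTV.

```python
def usable_keys(showings, fields):
    """Return the distinct non-blank match keys the given fields produce."""
    keys = set()
    for showing in showings:
        key = tuple(showing[f] for f in fields)
        if any(key):  # an all-blank key cannot identify an episode
            keys.add(key)
    return keys


reno = [
    {"subtitle": "", "description": "A mishap with the mayor's child puts "
     "the officers on alert for contraband fireworks."},
    {"subtitle": "", "description": "Lt. Dangle declares a zero tolerance "
     "policy only to find that he's the only criminal in town."},
    {"subtitle": "", "description": "The deputies attempt to catch the "
     "fastest criminal in the southwestern United States."},
]
by_subtitle = usable_keys(reno, ["subtitle"])
by_description = usable_keys(reno, ["description"])
```

For the "Reno 911!" rows, matching on subtitle yields no usable keys
at all, while matching on description yields one key per episode, so
description is the better choice for this title.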

> OK, let me give a bit more detail.  AIUI, if duplicate matching is on, 
> the program's title is compared against records in oldrecorded.  Then, 
> if the program's programid is not empty and the program is not a generic 
> episode, the program's programid is compared against records in 
> oldrecorded.  Otherwise, if the program is not a generic program and the 
> program's or the oldrecorded program's programid is blank, it checks for 
> duplicates based on dupmethod.  The same checks are done against 
> recorded programs (i.e. those that haven't been deleted) that are not 
> LiveTV recordings.

The same checks are done internally by the scheduler when
checking two upcoming showings against each other. These result
in either "E" or "L". A match in 'recorded' results in "R" and a
match with 'oldrecorded' gets marked as "P".

> In John's case, the program is a generic program (based on one possible 
> definition of generic, where the program is a series and programid ends 

Agreed. If the category type was something other than series,
it would have treated each title/programid like a special or
movie and would only record one showing.

> with '0000'), so it is excluded from duplicate matching.  Myth /always/ 
> records generic episodes (and will do so over and over again--even if an 
> "identical" generic episode was recorded and never deleted) since they 
> may be episodes the user hasn't seen, but about which TMS was not 

Don't shoot the messenger. It's most likely the station that
is not reporting the episode info to TMS. Fox Reality listings
only have episode info for their original series and generics
for all the syndicated shows but if I go to their web site, they
have episode info for every show throughout the day (grumble).
The Daily Show never gave episode info until they had Bill Clinton
as a guest and have been trying to give episode info ever since.

> provided episode information.  While you can create a custom record rule 
> to exclude generic episodes, it sounds like, since all the episodes of 
> "Fast Flights" are given generic id's, doing so would effectively 
> disable the recording rule.

Right, these listings are bogus. It ought to be title "Fast
Flights" subtitle "British Isles" but nooOOoo.... ;-). This
is most likely the producers' fault. It shouldn't be marked
as a series because there is only one "Fast Flights: British
Isles". This would seem to be TMS' fault.

I don't have these shows in my listings so I can't play the
home game but here are some of the things I might do.

Normally titles like this are a limited series. If there were
only three to six "Fast Flights: %" I'd Single record each.

If the first showing was at the same time each week, I'd make
a rule to record this title pattern in that time slot:

Rule Name: Fast Flights

program.title LIKE 'Fast Flights: %'
AND DAYNAME(program.starttime) = 'Tuesday'
AND HOUR(program.starttime) = 20

I might try recording the first showing in the listings for
each Fast Flights:

Rule Name: Fast Flights

program.title LIKE 'Fast Flights: %'
AND program.first > 0

However, if one of the shows is not on for a few days or weeks
then repeated, the first showing in the listings would again be
marked as 'first'. I could mark these as "Don't record" but if
they happen a lot, I might set this rule to Inactive and choose
"Record anyway" when I see an episode that I hadn't recorded.

Finally the ugliness that gets right to the point, match only
showings where there is no 'oldrecorded' entry which has the
duplicate flag set.

Rule Name: Fast Flights

Additional Tables: LEFT JOIN oldrecorded ON (program.programid =
oldrecorded.programid AND oldrecorded.duplicate = 1)

program.title LIKE 'Fast Flights: %'
AND oldrecorded.duplicate IS NULL
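
To see what this join does, here is a small Python sketch using an
in-memory SQLite database. The two tables are cut down to a few
columns and are not the real MythTV schema, and the "Somewhere Else"
title and its programid are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE program (title TEXT, programid TEXT, starttime TEXT);
    CREATE TABLE oldrecorded (title TEXT, programid TEXT, duplicate INTEGER);
    INSERT INTO program VALUES
        ('Fast Flights: British Isles', 'SH8505190000', '2006-11-15 20:00:00'),
        ('Fast Flights: British Isles', 'SH8505190000', '2006-11-16 02:00:00'),
        ('Fast Flights: Somewhere Else', 'SH0000010000', '2006-11-17 20:00:00');
    -- "British Isles" was already recorded with the duplicate flag set.
    INSERT INTO oldrecorded VALUES
        ('Fast Flights: British Isles', 'SH8505190000', 1);
""")

# The LEFT JOIN keeps every program row; rows with no flagged
# oldrecorded entry get NULLs, and the IS NULL test keeps only those.
rows = conn.execute("""
    SELECT program.title, program.starttime
    FROM program
    LEFT JOIN oldrecorded ON (program.programid = oldrecorded.programid
                              AND oldrecorded.duplicate = 1)
    WHERE program.title LIKE 'Fast Flights: %'
      AND oldrecorded.duplicate IS NULL
    ORDER BY program.starttime
""").fetchall()
```

Both "British Isles" showings drop out because a flagged
'oldrecorded' row exists for their programid, leaving only the title
that has not been recorded yet.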

This is the sure-fire solution but none of this would be
necessary if this rare case had been listed in a reasonable
way. The closest thing I'd seen was "Discovery Atlas: Podunk",
"Discovery Atlas: Bumfuk, Egypt" but these weren't marked as
series and a simple title search rule worked just fine. I now
see that Discovery Times has this straightened out and lists
these as "Discovery Atlas" with episodes "Podunk Revealed" and
"Bumfuk, Egypt Revealed".

> Note that programs may also be marked as generic if programid, subtitle, 
> and description are blank.  Both of these definitions of generic come 

Note that generics aren't something new and have appeared in
TV Guide and such for over fifty years. MythTV has taken generics
into account for maybe four years. The recently added 'generic'
field is just an optimization. We can identify showings
that are generic at the end of the mfdb listings update so we
don't need to identify them again every time the scheduler runs.


> from mythfilldatabase.  I have no idea whether/when EIT grabbers mark 
> episodes as generic.

Marking is done at the end of mfdb so we can't say that EIT
does it but listings grabbed by EIT may be flagged as generic
the next time mfdb runs. In the meantime showings that really
are generic won't match even though they haven't been optimized
to say "don't bother checking because we know that this is a
generic".

--  bjm


