[mythtv-users] Interesting false match

Bruce Markey bjm at lvcm.com
Fri Dec 23 14:30:59 EST 2005

Nick wrote:
> On 23/12/05, Bruce Markey <bjm at lvcm.com> wrote:
>> Al Mcintosh wrote:
>> ...
>>>>> I know the Titles are the same I was just commenting on the fact that
>>>>> they are different as one is House Type "movie" Catagory Drama and one
>>>>> House  type "Series" (SH688359) category Mystery.
>>>> What duplicate matching policy were you using?
>>> Check for duplicates in: All recordings
>>> Duplicate Check method: Subtitle and Description
>> My apologies, this has nothing to do with it. There are way too
>> many responses on this list by 'I'm not a doctor but I play one
>> on television' types ;-).
> Thanks for today's script... my agent said it was in the mail.

This in no way stands out from the piles of misinformation
posted every day, but neither dupmethod nor dupin has any impact
on matching listings by title. They only determine which of
those matching showings are marked P, R, E, L or r.

>> Your point is that the title for the movie and series both matched
>> even though there is a seriesid. The implication is that it should
>> just match the showings by seriesid instead. David Engel and I both
>> assumed this would be a good thing too and started to work on it but
>> after searching for test cases, it turns out this isn't a good idea.
> Is it not possible for the scheduler to make use of the _type_ of the
> show? From the information Allan gave, one is a Series, the other a
> Movie, irrespective of the seriesid. I know this information is
> clearly grabber specific, but if the scheduler picks up a possible
> hit, is it not worth at least checking to see whether the types match
> (if present)?

mysql> select count(category_type) as hits,category_type,title from program where title='College Basketball' group by category_type;
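+------+---------------+--------------------+
| hits | category_type | title              |
+------+---------------+--------------------+
|   31 | series        | College Basketball |
|  110 | sports        | College Basketball |
+------+---------------+--------------------+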
2 rows in set (0.00 sec)

mysql> select count(category_type) as hits,category_type,title from program where title='College Basketball' and subtitle like '%Oklahoma%' group by category_type;
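+------+---------------+--------------------+
| hits | category_type | title              |
+------+---------------+--------------------+
|    2 | series        | College Basketball |
|    7 | sports        | College Basketball |
+------+---------------+--------------------+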
2 rows in set (0.00 sec)

So does it take two rules to record Oklahoma basketball games?
Does the user have to look at the program details for every
showing to determine what will be a match? Does there need to
be another obscure option for the user to decide if the rule should
or shouldn't limit by type? Is it reasonable to expect the average
user to understand the use and impact of such an option?
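
To make those Oklahoma numbers concrete, here is a rough sketch in
Python. It is not MythTV code: the listings data and the match()
helper are made up for illustration, and the counts simply mirror
the second query above (2 showings typed "series", 7 typed "sports").

```python
# Hypothetical listings mirroring the second query above:
# 2 showings typed "series" and 7 typed "sports", same title.
listings = (
    [{"title": "College Basketball", "category_type": "series"}] * 2
    + [{"title": "College Basketball", "category_type": "sports"}] * 7
)


def match(rule_title, rule_type=None):
    """Return showings matching a rule.

    rule_type=None means title-only matching; otherwise the showing's
    category_type must also equal rule_type (the proposed extra check).
    """
    return [
        p for p in listings
        if p["title"] == rule_title
        and (rule_type is None or p["category_type"] == rule_type)
    ]


title_only = match("College Basketball")
sports_only = match("College Basketball", "sports")
series_only = match("College Basketball", "series")

print(len(title_only), len(sports_only), len(series_only))
```

With title-only matching one rule covers all 9 showings. Once the
type has to match as well, a rule created from a "sports" showing
covers 7 and silently skips the 2 marked "series", so it would take
a second rule to get them.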

The point I was hoping to make is that this is a 'grass is
greener on the other side of the hill' or 'jumping out of the
pan and into the fire' kind of a situation. While it is easy
to point out that a movie title could match the same title as
a series for a kAllRecord, a reality check is that in the past
four years, this has come up twice for me. I've spent a total
of about 10 seconds marking them "Never Record" (it has taken
more time to type this sentence than to fix these). Missing
one basketball game would be a thousand fold worse than the

Matching by category_type doesn't address the much bigger problem
of title strings changing, which seriesid would address.

So to answer your question, no, it would not be worth at least
checking to see whether the types match (if present). This would
mean checking two fields for every showing for every rule on
every matching query and the result would be more gotchas,
exceptions and ambiguous results, not fewer.

> Using the uk_rt XMLTV grabber, all of my program entries have a
> seriesid, even movies. This is where category="film" and

Hum. Movies aren't serial. This is probably the result of some
misunderstanding. AFAIK no one contacted me or any other dev
about how to best use these fields for the sake of searching
and scheduling in myth.

> category_type="movie". The programid is the seriesid prepended with
> "MV", and only movies have a programid. 

Checking for "programid LIKE 'MV%'" was a quick optimization for
DataDirect once we first had programids and this looks like a
smart way to piggyback on this. However, category_type is more
consistent and doesn't involve a wildcard to find a match. I've
always disliked the name "category_type" and these are
hard coded lower case strings in the English language. In the
future I expect that I'll pick another field name for an enum
int such as

0 = ptTVShow
1 = ptMovie
2 = ptSeries
3 = ptSports

so 'if (proginfo->program_type == ptMovie)' or whatever.
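
As a rough sketch of that idea, written in Python rather than the
C++ that myth itself uses: the ProgramType class, the
parse_category_type() helper and the "tvshow" string are assumptions
for illustration, not existing MythTV names. Only the four ptXxx
values and the "movie", "series" and "sports" strings come from this
thread.

```python
from enum import IntEnum


class ProgramType(IntEnum):
    """Integer program types, numbered as in the list above."""
    ptTVShow = 0
    ptMovie = 1
    ptSeries = 2
    ptSports = 3


# Assumed mapping from the current lower-case English strings.
# "tvshow" is a guess at the string that would pair with ptTVShow.
STRING_TO_TYPE = {
    "tvshow": ProgramType.ptTVShow,
    "movie": ProgramType.ptMovie,
    "series": ProgramType.ptSeries,
    "sports": ProgramType.ptSports,
}


def parse_category_type(value):
    """Map a category_type string to a ProgramType, or None if unknown."""
    return STRING_TO_TYPE.get(value.strip().lower())


program_type = parse_category_type("movie")
if program_type == ProgramType.ptMovie:
    print("movie, stored as", int(program_type))
```

The comparison is then an integer test instead of a string compare,
and nothing depends on the spelling or language of the stored text.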

> The showtype field is unused
> for all program entries.

This was added along with a bunch of other DD specific fields.
This is informational and not used for searching or scheduling.

mysql> select count(showtype),showtype from program group by showtype;
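+-----------------+------------------+
| count(showtype) | showtype         |
+-----------------+------------------+
|            7883 |                  |
|            1824 | Limited Series   |
|              39 | Miniseries       |
|            3625 | Paid Programming |
|               7 | Serial           |
|           19353 | Series           |
|              97 | Short Film       |
|            1681 | Special          |
+-----------------+------------------+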
8 rows in set (0.13 sec)

> After the fun of setting up MythTV, I think anyone would take a Type I
> error over a Type II error any day of the week! 

Well, nothing in your message was labeled Type I or II nor is
there a clear compare and contrast so there is no way to know
what you are talking about but I'm pretty sure that an exclamation
point doesn't elevate an assumption to become a fact. If you were
assuming that testing the category_type also would be a cure all
for all ills, this hypothesis is incorrect.

> The scheduler seems to
> cope very admirably as it is, so thanks for the hard work.

I always want to make sure that it is clear that David Engel did
all the heavy lifting in the scheduler. My part is more to leverage,
expose and exploit the capabilities.

The reason that I posted a long message on seriesid is because 
I don't think there has ever been a posting in the archives
explaining why we decided to not use them (I kind of anticipate
that Mike Dean will post links to that message when someone asks
about seriesids =).

--  bjm
