[mythtv-users] Duplicate detection

Thu Sep 22 14:26:35 UTC 2016

On Tue, Sep 20, 2016 at 01:05:25PM -0400, Michael T. Dean wrote:
> On 09/20/2016 12:36 PM, Jan Ceuleers wrote:
> > On 20/09/16 17:14, Michael T. Dean wrote:
> > > > IIUC this partitions duplicate matching, such that duplicates would be
> > > > found for repeats on channels whose metadata comes from the same source,
> > > > but still not for repeats that span listings data sources. In order to
> > > > achieve that I believe (but do correct me if I'm wrong) that I need to
> > > > continue erasing the programids.
> > > No, it means the program ID is used for dup matching when both programs
> > > contain program IDs from the same authority and the rule-specified
> > > duplicate-matching method is used otherwise.
> > Yes, exactly. We're on the same page. I said what I said because the
> > rule-specified method doesn't work since it disregards empty subtitles,
> > rather than accepting an empty subtitle as something that should be
> > matched with another empty subtitle.
> 
> Well, since you've already determined that dup matching won't work for this
> specific situation--regardless of whether you have scrubbed the program
> IDs--removing program IDs isn't helping.  If you stop removing program IDs,
> you'll get valid dup matching when you have showings from the same program
> ID provider that the program you previously recorded used.  Otherwise, your
> rule-specified method will be used and (assuming you choose "subtitle"
> method) it will be treated as a generic (meaning it will be recorded).
> You're no worse off than you are now, and you're better off when the repeat
> is on the same program ID source as the original recording.
> 
> However, for all other "proper" programs--where there is something that can
> be used for dup matching--it will just work.  The program ID will be used
> when available and when matching authorities are specified, otherwise, the
> method your rule specifies will be used.
> 
> Currently, by scrubbing out all program IDs, you can only ever use your
> rule-specified method--i.e. the fallback that would have been used after the
> program IDs were found to come from different authorities.  So, really,
> scrubbing them isn't helping; it's only making it always use the fallback.
> 
> > I had another thought: a duplicate-matching method based on the inetref
> > field. This wouldn't find duplicates until the metadata has been retrieved,
> > of course,
> 
> Right--and does require a lot of hits against a metadata source (there are a
> lot of episodes in people's program listings and they're replaced a
> lot--daily for about 2 weeks, usually--causing re-retrievals).  This might
> even be so many hits we may not want to encourage it.
> 
> > and it relies on there being a history of inetrefs employing
> > the current format (i.e. not just the number but also the tmdb3.py_ or
> > ttvdb.py_ prefix). Furthermore, it breaks if a new metadata source is
> > introduced in the future.
> 
> Well, the program ID authorities would fix all of that.
> 
> > The latter weakness could be addressed by updating the inetref in
> > oldrecorded after the fact.
> > 
> > Just a thought - this would require a code change. Not sure I'm up to
> > that but once I upgrade to 0.28 I could give it a go.
> > 
> > I could test-drive the concept by a one-time:
> > 
> > update oldrecorded set subtitle=inetref where length(subtitle)=0;
> > 
> > and a daily
> > 
> > update program set subtitle=inetref where length(subtitle)=0;
> > 
> > I can then continue to use the subtitle duplicate-matching method; it'll
> > just be ugly in the user interface.
> > 
> > Another possibility is to regard the special treatment of empty
> > subtitles as a bug,
> 
> Well, it's actually a designed-in feature.  If you say the subtitle
> distinguishes episodes and there is no subtitle, there is no way to
> distinguish which episode it is, so we have to assume it could be one you
> haven't seen, so we record it since you can ALWAYS delete after something is
> recorded, but you can't (at least I haven't found a way to) go back and
> record something after it airs because you later find out you hadn't seen
> that episode.
> 
> > and to remove that special treatment. This might
> > cause a regression for people who rely on that (probably long-standing)
> > behaviour though.
> > 
> 
> The easiest generally-good approach for this specific issue--the movie
> rule--is the title-only dup matching method.  Again, this might be
> considered if someone went to the trouble of coding it, but no one has yet
> felt sufficient need to actually do the work.  It won't distinguish between
> Ben-Hur's 1959 and 2016 releases***, but if you record one and decide you
> want the other, you could always create a specific Ben-Hur rule to catch it.

I'm definitely not in favor of an entirely new dupmethod.  After a
quick glance at the code, however, it looks like a title-only mode
might "just work" with minimal changes if the dupmethod is allowed to
be set to 0.
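
For illustration only, here is a rough Python sketch of the matching
order discussed in this thread: program IDs are compared when both
showings carry one from the same authority, otherwise the rule's
method is used, and an empty subtitle never matches. This is not the
scheduler's actual code. The Showing class, its field names, and the
"title" and "subtitle" method labels are assumptions made up for the
example, as is the requirement that titles match first.

```python
from dataclasses import dataclass


@dataclass
class Showing:
    """Hypothetical stand-in for a listing or previously recorded entry."""
    title: str
    subtitle: str = ""
    programid: str = ""
    authority: str = ""  # assumed field: the source that issued programid


def is_duplicate(new: Showing, old: Showing, dupmethod: str = "subtitle") -> bool:
    """Return True if `new` would be treated as a repeat of `old`."""
    # Assumption: showings with different titles are never duplicates.
    if new.title != old.title:
        return False

    # Program IDs are only comparable when both are present and both
    # come from the same (non-empty) authority.
    ids_comparable = bool(
        new.programid
        and old.programid
        and new.authority
        and new.authority == old.authority
    )
    if ids_comparable:
        return new.programid == old.programid

    # Fallback: the method chosen on the recording rule.
    if dupmethod == "title":
        # Hypothetical title-only mode: the title match above is enough.
        return True
    if dupmethod == "subtitle":
        # An empty subtitle cannot identify an episode, so the showing
        # is treated as not yet seen and gets recorded.
        if not new.subtitle or not old.subtitle:
            return False
        return new.subtitle == old.subtitle
    raise ValueError(f"unknown dupmethod: {dupmethod!r}")
```

With this sketch, a movie repeated on a channel fed by a different
listings source (different authority, empty subtitle) is recorded
again under "subtitle" but skipped under the hypothetical "title"
mode.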

David
-- 
David Engel
david at istwok.net