[mythtv-users] Problems with detecting duplicate recordings due to subtle detail changes

Ant Daniel antdaniel at gmail.com
Wed Jan 11 14:28:52 UTC 2006


Hi,

Below are a few examples of shows that are flagged in my scheduler to
be recorded even though they're the same episode.

1)
Title: Lost
Episode: 24/25 - Exodus 2/3
Description: Drama series following the survivors of a plane crash who
are forced to live with each other on a remote island, a dangerous
world with many new threats. The column of smoke continues to billow
from deep in the jungle, and the survivors need to get to safety.
Claire must deal with the fact that the Others are after her baby.

Title: Lost
Episode: Exodus Part 2
Description: Drama series following the survivors of a plane crash who
are forced to live with each other on a remote island, a dangerous
world with many new threats. The column of smoke continues to billow
from deep in the jungle, and the survivors need to get to safety.
Claire must deal with the fact that the Others are after her baby.

In this case the obvious problem is the Episode field. Throughout the
series the main difference has been the nn/nn part in the Episode.

2)
Title: Star Trek: Enterprise
Episode: Broken Bow
Description: Prequel `Star Trek' series, set 100 years before the time
of Captain Kirk, when interstellar travel is in its infancy. In the
pilot episode, the Enterprise sets off on her maiden voyage to return
a wounded Klingon to his people.

Title: Star Trek: Enterprise
Episode: Broken Bow
Description: Prequel Star Trek series, set 100 years before the time
of Captain Kirk, when interstellar travel is in its infancy. In this
pilot episode, the Enterprise sets off on her maiden voyage to return
a wounded Klingon to his people.

A very subtle change: the quotes around Star Trek (and "the pilot"
becoming "this pilot").

Other changes I remember seeing include the grammatical "which" ->
"that" change that Word always seems to highlight, and (Reviewed by: A
N Other) tagged on the end.
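
Cosmetic changes like these could probably be stripped out before any
comparison is made. A minimal sketch in Python, purely to illustrate
the idea (normalize() is a hypothetical helper of mine, not anything
in MythTV, and the patterns are guesses based on the examples above):

```python
import re


def normalize(text):
    """Reduce a listing field to a canonical form before comparing."""
    text = text.lower()
    # Drop a trailing reviewer credit such as "(Reviewed by: A N Other)".
    text = re.sub(r"\(reviewed by:[^)]*\)\s*$", "", text)
    # Drop quote characters, including the `...' style seen in listings.
    text = re.sub(r"[`'\"]", "", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())
```

This would not catch the "which" -> "that" change, since that alters
the words themselves.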

While I know I could change the matching categories to just
description for Lost and Episode for Enterprise, I'm not convinced
that this will get all episodes. Sometimes we just get a duplicate
description, for example.

Is it possible to produce a matching algorithm that does this on a
'fuzzy' basis, so that if the description is 98% similar or more
(maybe with an Episode name that is identical or also very similar),
the scheduler can flag it as a duplicate?
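
To illustrate what I have in mind, here is a rough sketch in Python
using difflib.SequenceMatcher from the standard library. The function
names, the dictionary keys and the 0.98 threshold are my own
assumptions for illustration, not anything in the MythTV scheduler:

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a ratio between 0.0 and 1.0 describing how alike a and b are."""
    # autojunk=False turns off difflib's "popular element" heuristic,
    # which only kicks in for sequences of 200 items or more and can
    # distort the ratio on long descriptions.
    return SequenceMatcher(None, a.lower(), b.lower(), autojunk=False).ratio()


def is_duplicate(old, new, threshold=0.98):
    """Guess whether two listings describe the same episode.

    old and new are dicts with "title" and "description" keys
    (a made-up structure for this sketch).
    """
    # Different shows are never duplicates of each other.
    if old["title"].lower() != new["title"].lower():
        return False
    # Same show: compare the descriptions on a fuzzy basis.
    return similarity(old["description"], new["description"]) >= threshold
```

By my reckoning the two Enterprise descriptions above differ by only
a handful of characters, so they should clear a 0.98 threshold. The
Lost descriptions all start with the same series blurb, though, so
different episodes would also score fairly high and the threshold
would need some tuning.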

Before I go hunting through Google to find such an algorithm, how
would you react to such an idea?

Regards,
Ant.

