[mythtv-users] commercial flagging questions

Thu Mar 27 04:09:41 UTC 2008

On 03/26/2008 11:35 PM, Paul wrote:
> On Wed, Mar 26, 2008 at 8:56 PM, Chris Pinkham <cpinkham at bc2va.org> wrote:
>   
>>  Since people keep suggesting things to work on
>>  for the flagger in Myth, I'll say that given the current code design and
>>  where the most bang for the buck is, I'd recommend looking at the closed
>>  captions and the silence detection.  Just thought I'd throw that in
>>  since this thread may inevitably turn into how to improve Myth's
>>  flagger.
>>     
> I've actually experimented with using closed captions to help
> commercial flagging, though I did it with the experimental flagger,
> not the classical one. It was more of a proof of concept, so to
> start with I used ccextractor to extract the closed captions and then
> attempted to match the captions up to frames based on the
> timestamps. Then I tried to apply the same Bayesian-type filtering
> used in spam detection. It worked horribly, so I tried different ways
> of tokenizing the captions. Still horrible. Now, there are many things
> I could have been doing wrong, such as not getting the captions lined
> up with the correct frames, or not tokenizing well, but as I looked over
> my data it seemed that there were no words that were good indicators
> of commercials.  It is possible someone else could use them to
> help, but I thought I would share my experience.
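
For anyone following along, the spam-style Bayesian scoring described
above might look roughly like the sketch below. This is not Paul's code:
the function names, the whitespace tokenizer, and the add-one smoothing
are all assumptions made for illustration, and it expects caption text
that has already been matched to labeled segments.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, split on whitespace,
    # keep purely alphabetic words only.
    return [w for w in text.lower().split() if w.isalpha()]

def train(samples):
    """samples: iterable of (caption_text, is_commercial) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, is_commercial in samples:
        # Count each token once per caption segment.
        counts[is_commercial].update(set(tokenize(text)))
        docs[is_commercial] += 1
    return counts, docs

def commercial_log_odds(text, counts, docs):
    """Positive score leans 'commercial', negative leans 'show'."""
    score = math.log((docs[True] + 1) / (docs[False] + 1))
    for tok in set(tokenize(text)):
        # Add-one smoothing so unseen tokens do not zero out the odds.
        p_commercial = (counts[True][tok] + 1) / (docs[True] + 2)
        p_show = (counts[False][tok] + 1) / (docs[False] + 2)
        score += math.log(p_commercial / p_show)
    return score
```

If no word shows up much more often in one class than the other, every
per-token ratio sits near 1 and the score stays near zero, which would
match the "no good indicator words" result described above.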

Which, I think, all boils down to: CC doesn't work so well (at least in
the US) because nowadays many (most?) commercials have captions, too.
I'm guessing the implementation in comskip was done at a time when
assuming the show had CC and the commercials didn't was a valid
assumption, so it uses that simplistic approach.
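
To make that concrete, a presence-only heuristic might look something
like the sketch below. This is only a guess at the general idea, not
comskip's actual code; the function name, the 60-second threshold, and
the input format (caption timestamps in seconds) are assumptions.

```python
def caption_gaps(caption_times, duration, min_gap=60.0):
    """Return (start, end) spans with no captions for at least min_gap seconds.

    caption_times: timestamps (in seconds) at which a caption appears.
    duration: total length of the recording in seconds.
    """
    gaps = []
    prev = 0.0
    # Append the end of the recording so a trailing gap is also caught.
    for t in sorted(caption_times) + [duration]:
        if t - prev >= min_gap:
            gaps.append((prev, t))
        prev = t
    return gaps
```

With captions only during the show, caption_gaps([5, 10, 20, 200, 210],
300) reports the uncaptioned stretches as candidate breaks. Once the
commercials carry captions too, the same call finds no gaps at all,
which is the failure mode I mean.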

Mike