[mythtv-users] commercial flagging questions
Michael T. Dean
mtdean at thirdcontact.com
Thu Mar 27 04:09:41 UTC 2008
On 03/26/2008 11:35 PM, Paul wrote:
> On Wed, Mar 26, 2008 at 8:56 PM, Chris Pinkham <cpinkham at bc2va.org> wrote:
>
>> Since people keep suggesting things to work on
>> for the flagger in Myth, I'll say that given the current code design and
>> where the most bang for the buck is, I'd recommend looking at the closed
>> captions and the silence detection. Just thought I'd throw that in
>> since this thread may inevitably turn into how to improve Myth's
>> flagger.
>>
> I've actually experimented with trying to use closed captions to help
> commercial flagging, though I did it with the experimental flagger,
> not the classical one. It was more of a proof-of-concept thing, so to
> start with I used ccextractor to extract the closed captions and then
> attempted to match the closed captions up to frames based on the
> timestamps. Then I tried to apply the same Bayesian-type filtering
> used in spam detection. It worked horribly, so I tried different ways
> of tokenizing the captions. Still horrible. Now, there are many things
> I could have been doing wrong, possibly not getting the captions lined
> up to the correct frames, or not tokenizing well, but as I looked over
> my data it seemed that there were no words that were good indicators
> of commercials. Now, it is possible someone else could use them to
> help, but I thought I would share my experience.
Which, I think, all boils down to: CC doesn't work so well (at least in
the US) because these days many (most?) commercials have captions,
too. I'm guessing the implementation in comskip was done at a time when
assuming the show had CC and the commercials didn't was a valid
assumption, so it uses that simplistic approach.
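
To make that concrete, here's a rough sketch of the "captions mean show,
no captions mean commercial" idea. This is not comskip's actual code; the
function name, the list of caption timestamps (in seconds), and the
60-second threshold are all made up for illustration.

```python
def caption_gaps(caption_times, duration, min_gap=60.0):
    """Return (start, end) spans with no caption activity for at least
    min_gap seconds. Under the naive assumption, these spans are the
    commercial-break candidates.

    caption_times: timestamps (seconds) at which a caption appeared.
    duration: total length of the recording in seconds.
    min_gap: hypothetical threshold; shorter silences are ignored.
    """
    gaps = []
    prev = 0.0  # time of the most recent caption seen so far
    for t in sorted(caption_times):
        if t - prev >= min_gap:
            gaps.append((prev, t))
        prev = max(prev, t)
    # Also check the stretch between the last caption and the end.
    if duration - prev >= min_gap:
        gaps.append((prev, duration))
    return gaps


# Show has captions, breaks don't: the breaks show up as gaps.
uncaptioned_breaks = caption_gaps([5, 10, 20, 200, 210, 400], duration=420)

# Commercials are captioned too: no gaps, so nothing gets flagged.
captioned_breaks = caption_gaps(list(range(0, 420, 5)), duration=420)
```

The second call is the situation described above: once the commercials
carry captions as well, caption presence alone has nothing to key on.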
Mike