[mythtv] draft logo detection algorithm

Sun Oct 9 13:04:52 UTC 2005

Heya,

So I'm doing some more work on logo detection, and figured I'd bounce 
some ideas on this list, wondering if people see any gaping holes in 
what I'm thinking of implementing, or see things that could be done smarter.

Logo detection can be split up into two subproblems:
1) Finding a logo
2) Testing if that logo is present in some frame.

In my tests, problem #1 turns out to be a lot easier than problem #2.
A simple Sobel filter finds where a logo is in arbitrary 
video quite easily. It performs very poorly at problem #2, however.

Here's my draft algorithm to solve problem #2. I'm assuming every logo 
gets keyed on top of the content (finalpixel = logopixel * alpha + 
contentpixel * (1-alpha)). Every logo I've ever seen works like this. I 
have yet to see stations do additive, darken, lighten (fill in any 
photoshop/gimp layer effect here) logos.
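
To make the keying assumption concrete, here is a minimal sketch of that blend for a single color component. The name key_logo is made up for illustration; it is not anything from mythcommflag.

```python
def key_logo(logo, alpha, content):
    """Blend one logo component over one content component (both 0-255).

    This is the keying formula from the text:
    finalpixel = logopixel * alpha + contentpixel * (1 - alpha)
    """
    return logo * alpha + content * (1.0 - alpha)


# On a black frame the content term vanishes, leaving only logo * alpha.
on_black = key_logo(255, 0.5, 0)
# On a white frame the content contributes its full (1 - alpha) share.
on_white = key_logo(255, 0.5, 255)
```

The black-frame case is the one the rest of the algorithm leans on: whatever is visible there comes from the logo alone.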

After having found a logo I want to determine the logopixels, and their 
alpha channels. This isn't straightforward, but I think we could get close.

In order for the algorithm to work, it's going to hold off on finding a 
logo until there is a black frame. If there is no black frame during 
the recording with the logo on it, this algorithm will fail. (I've 
checked my recordings, and couldn't find any that didn't have a quick 
fade to black somewhere, where the logo was still onscreen).

So we'll just have mythcommflag play the entire show, until there is a 
black frame. In a black frame, we'll look for a logo. (Normally a 
Sobel filter works great for this, but since the frame is black anyway, 
doing a Sobel filter is nothing more than just checking pixel colors).
If we find it, write down the rectangle (I'll call this logo-region from 
now).
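
A rough sketch of that "just check pixel colors" step, assuming the frame is a list of rows of (r, g, b) tuples. The function name and the brightness threshold of 16 are assumptions for illustration, not tuned values.

```python
def find_logo_region(frame, threshold=16):
    """Return (left, top, right, bottom) around all non-black pixels.

    frame is a list of rows, each row a list of (r, g, b) tuples.
    A pixel counts as non-black when any component exceeds threshold
    (an assumed cutoff). Returns None if the whole frame is black.
    """
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if max(pixel) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

Note that this takes one bounding box around everything bright in the frame, so it only makes sense once the frame has already been judged black apart from the logo.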

Having found this logo, knowing it is currently being keyed onto a black 
frame, the algorithm analyzes the logopixels and alphachannels as follows.

For every pixel in the logo-region:
   (For the example, the pixel found will be rgb(200,60,110). We know 
this color must come from the logo, since the frame is black.)

   * Find the pixel-component with the highest value. (red)
   * From this determine the range this pixel's alphachannel could have.
     (extremes: alpha=1.0, logopixelred=200 or alpha=.78, logopixelred=255)

   * Knowing the possible range of the alphachannel, calculate possible 
ranges for this logopixel's color components:
     (red: 200 <-> 255
      green: 60 <-> 76     (60/.78 = 76)
      blue: 110 <-> 141)   (110/.78=141)

   * Now we have a rough idea of all logo pixels, and their 
alphachannels. If we were lucky, the pixel was rather opaque, and the 
ranges are narrow. If we were unlucky, the pixel was very transparent, 
and the ranges are very wide.

   * From this rough idea, we can determine the extremes of final colors
they would produce in the image. The extremes can be obtained by 'rendering' 
our rough logo onto full white, and onto full black.

(Taking green as the example:
FinalPixel = LogoPixel * alpha + ContentPixel * (1-alpha)
FinalPixel = 60 * 1.0 + 255  * 0 = 60        //if rendered on white
FinalPixel = 60 * 1.0 + 0    * 0 = 60        //if rendered on black
FinalPixel = 76 * .78 + 255  * (1-.78) = 115 //if rendered on white
FinalPixel = 76 * .78 + 0    * (1-.78) = 60  //if rendered on black)

So looking at all extremes, this logopixel can never produce a 
framepixel-green outside of the 60 <-> 115 range. If the framepixel does 
have a green value outside of that range, the logo cannot be present.

Repeat this for all color components of all logoregion pixels.
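
The per-pixel steps above can be sketched roughly as follows. All function names are made up for illustration. The sketch relies on one observation that falls out of the keying formula: logopixel * alpha is exactly the value seen on the black frame, so it is the same for every (logopixel, alpha) pair that fits the observation, and only the content term varies.

```python
def alpha_range(pixel):
    """Lowest and highest alpha that could yield this pixel on black.

    The brightest component cannot come from a logo value above 255,
    so alpha is at least brightest / 255.
    """
    return max(pixel) / 255.0, 1.0


def logo_color_range(observed, alpha_min):
    """Possible true logo values for one component seen on black."""
    if alpha_min == 0:
        return 0.0, 255.0  # fully black pixel: logo color is unknown
    return float(observed), observed / alpha_min


def final_range(observed, alpha_min):
    """Range one component of the frame can take over any content.

    Lowest: rendered on black. Highest: most transparent alpha on white.
    """
    return float(observed), observed + 255.0 * (1.0 - alpha_min)


def pixel_ranges(pixel):
    """(low, high) frame range for each of r, g, b of one logo pixel."""
    alpha_min, _ = alpha_range(pixel)
    return [final_range(c, alpha_min) for c in pixel]


# The example pixel from the text, as seen on the black frame.
ranges = pixel_ranges((200, 60, 110))
```

A frame pixel whose component falls outside its (low, high) pair rules the logo out at that position; a nearly transparent logo pixel gives a range so wide that it rules out almost nothing.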

I'm still unsure how to best deal with noise / slightly animated logos.
I'm considering either adding some padding to the calculated ranges, or 
having a threshold of what percentage of pixels in the logoregion are 
allowed to be a false match.
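
For illustration only, here is one way the two ideas could look when combined. The function name and the default values (padding of 8, at most 10% mismatching pixels) are placeholder assumptions, not measured numbers.

```python
def logo_matches(frame_pixels, ranges, padding=8, max_miss_fraction=0.1):
    """Decide whether a logo could be present in a frame's logo-region.

    frame_pixels: list of (r, g, b) tuples from the logo-region.
    ranges: per pixel, a list of three (low, high) pairs, one per component.
    padding widens every range to absorb noise; max_miss_fraction is the
    share of pixels allowed to fall outside their ranges anyway.
    """
    misses = 0
    for pixel, component_ranges in zip(frame_pixels, ranges):
        for value, (low, high) in zip(pixel, component_ranges):
            if value < low - padding or value > high + padding:
                misses += 1
                break  # one bad component is enough to reject this pixel
    return misses <= max_miss_fraction * len(frame_pixels)
```

Setting padding=0 or max_miss_fraction=0 switches off one of the two tolerances, which makes it easy to try each idea on its own.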

   * Once we've found this logo, proceed with the rest of the recording, 
looking for more logos on each black frame found. When we're done, do 
another pass on the recording, in which we'll test all found logos 
against each frame (or one frame per scenechange). The logo that 
matches most often will be considered the channel logo we were 
looking for.
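
The second pass amounts to a simple vote. A minimal sketch, where pick_channel_logo and the is_present callback are hypothetical names standing in for whatever per-frame test ends up being used:

```python
from collections import Counter


def pick_channel_logo(frames, candidates, is_present):
    """Return the id of the candidate logo matched in the most frames.

    frames: iterable of frames (or one frame per scene change).
    candidates: dict mapping a logo id to that logo's data.
    is_present: callable(logo, frame) -> bool, the per-frame test.
    Returns None when no candidate matches any frame.
    """
    hits = Counter()
    for frame in frames:
        for logo_id, logo in candidates.items():
            if is_present(logo, frame):
                hits[logo_id] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```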

Downsides I see to all of this:
- lot of work to implement :)
- If the content does not have a black frame containing the logo it will 
fail.
- Heavily animated logos will be found, but won't be matched.

Upsides I see:
- Scans the entire recording for a logo, instead of just the beginning 
and hoping it's there.
- Hopefully high accuracy.

Bye, Lucas