[mythtv-users] Concert footage and video artifacts

Simon Hobson linux at thehobsons.co.uk
Fri Nov 18 10:37:22 UTC 2016


Jay Foster <jayf0ster at roadrunner.com> wrote:

> Those are compression artifacts.  Usually due to insufficient bandwidth to encode the video.  This is typically introduced by the local broadcaster, most of which choose quantity (i.e., more subchannels) over quality (i.e., a higher bit rate).

It may be worth expanding a bit on why that happens, for those who aren't familiar with the mechanics. The below is "somewhat simplified", but should give an idea of what the process is and why these artifacts happen. Feel free to point out any errors - I'm no expert, and I certainly can't follow the maths!


The video is encoded using what's called a discrete cosine transform (DCT) - https://en.wikipedia.org/wiki/Discrete_cosine_transform - along with other tricks like motion compensation and inter-frame differences.

To start with, consider video of a largely static scene: nothing much changes from frame to frame, so taking the differences between frames gives a massive reduction in the information that has to be sent. For example, if a fixed camera is pointing at a football/rugby field, there are large areas of green that barely change from frame to frame, with smaller areas where players are running around and creating changes. Similarly, at the awards ceremony there's a lot of static scenery with just one or two people moving around a bit.
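As a toy illustration (my own sketch in Python/numpy, not anything a real encoder literally does), here's the inter-frame delta idea - the delta is zero almost everywhere, so it takes far less information to describe than the frame itself:

    import numpy as np

    # Two consecutive greyscale frames (tiny 8x8 toy; real frames are
    # e.g. 1920x1080 with separate luma and chroma planes).
    prev_frame = np.full((8, 8), 120, dtype=np.int16)  # flat green field
    curr_frame = prev_frame.copy()
    curr_frame[2:4, 3:5] = 200                         # a player moves in

    # The delta is zero everywhere except where something changed.
    delta = curr_frame - prev_frame
    print(np.count_nonzero(delta), "of", delta.size, "samples changed")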

If the camera pans, the whole frame moves, but the right process can detect this global shift, re-use the previous frame's pixels at an offset, and only has to encode the new strip that's come in at one edge. A similar process can be applied to individual areas of the image - effectively detecting small regions that are moving independently of the rest (such as a player running around on the field) and processing those separately.
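For the motion part, one common approach is block matching: for each block of the new frame, hunt for where it came from in the previous frame and send just the offset plus a small residual. This is my deliberately naive full-search version - real encoders use far cleverer hierarchical searches:

    import numpy as np

    def best_motion_vector(prev, curr, y, x, bs=8, search=4):
        """Find where the bs x bs block at (y, x) in curr best matches
        in prev, within +/-search pixels, by sum of absolute differences."""
        block = curr[y:y+bs, x:x+bs].astype(np.int32)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if yy < 0 or xx < 0 or yy + bs > prev.shape[0] \
                        or xx + bs > prev.shape[1]:
                    continue  # candidate block falls off the frame edge
                cand = prev[yy:yy+bs, xx:xx+bs].astype(np.int32)
                sad = int(np.abs(block - cand).sum())
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        return best_mv, best_sad

If the match is good, the encoder sends the motion vector (two small numbers) instead of re-encoding the whole block.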

So by taking inter-frame deltas and applying motion compensation, you massively reduce the amount of video information you need to compress.

What's left is then compressed, block by block, typically with the DCT or some variation of it. There's a lot of maths on that Wikipedia page, but down at the bottom there's an animation showing how a letter A is encoded by successively finer cosine functions - starting with a flat grey block equal to the average brightness of the block, then one- and two-dimensional gradients, and so on, adding detail.
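In Python the transform itself is one scipy call; this is just my example to show the shape of the result, using a smooth gradient block:

    import numpy as np
    from scipy.fft import dctn

    # An 8x8 block containing a smooth horizontal gradient.
    block = np.tile(np.arange(0, 80, 10, dtype=float), (8, 1))

    coeffs = dctn(block, norm='ortho')
    # coeffs[0, 0] is the "flat grey" average; entries further from the
    # top-left corner represent successively finer detail. For a smooth
    # gradient almost everything outside the first row is near zero.
    print(np.round(coeffs, 1))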

The key thing here is that you can keep adding more and more detail - but you need to transmit more values to do it. If bandwidth is constrained, you omit the finer detail and accept that the decoded result will be less faithful to the original. As above, broadcasters have to trade quality against bandwidth - using less bandwidth per program lets them fit more logical channels into one carrier, and thus get more programs in front of more eyes.
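You can see the effect of that trade-off by throwing coefficients away and transforming back (again a toy of mine - real encoders quantise coefficients rather than simply zeroing them):

    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(0)
    block = rng.integers(0, 256, (8, 8)).astype(float)  # a very detailed block

    coeffs = dctn(block, norm='ortho')

    # Simulate a bandwidth squeeze: keep only the 3x3 lowest-frequency
    # coefficients (9 values instead of 64) and discard the rest.
    kept = np.zeros_like(coeffs)
    kept[:3, :3] = coeffs[:3, :3]

    approx = idctn(kept, norm='ortho')
    print("mean abs error per sample:",
          round(float(np.abs(block - approx).mean()), 1))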

So let's go back to the original situation mentioned - flashing lights, rapid scene lighting changes and so on.
There is the encoder, happily doing its bandwidth-reduction thing - and then the lighting on the whole scene changes. Instead of most of the image staying much the same from frame to frame, the whole thing has changed massively, so there is suddenly a very large quantity of change to encode. Given that the encoder is bandwidth constrained, it can only do its best - and that means that for the first few frames after the change it cannot include much detail in each DCT block. The result is that the viewer sees an image made up of blocks of fairly "flat" video which quickly regain their detail. Assuming the lighting change is a one-off (so the inter-frame deltas are close to zero again), the encoder can catch up over the next few frames, adding more and more detail with each one.
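Some back-of-the-envelope numbers (mine, purely illustrative) show why the whole image goes flat for a moment. Suppose the bit rate pays for roughly 1,200 coefficients per frame:

    # A 1280x720 frame split into 8x8 blocks is 160 x 90 = 14400 blocks.
    # With only a handful of blocks changing, each gets plenty of detail;
    # when a lighting change touches every block, each one gets almost
    # nothing beyond its flat average.
    budget = 1200  # hypothetical coefficients-per-frame the bit rate allows
    for changed_blocks in (20, 14400):
        print(f"{changed_blocks:5d} changed blocks -> "
              f"{budget / changed_blocks:.2f} coefficients per block")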

This also explains why, in something like the ball game example, running players can appear to be surrounded by a "halo" of blockiness. There's a lot of change around the player - on one side the image is changing from green to (say) white, and on the other it's changing back to green. So in a bandwidth-constrained system, the moving player leaves a trail of incompletely rendered blocks until the encoder finds the bandwidth to catch up - or a key frame arrives (see below).

I assume the encoder keeps an internal representation of what a decoder should have, so it can encode the difference between what's already gone out and what should be on screen. In other words, it's encoding not the raw inter-frame differences it's been given, but the difference between the current frame and what the decoded stream will actually be showing. Thus it's adaptive: on fairly static video it will deliver high detail, but on rapidly changing material it degrades the detail to stay within the bandwidth limit set.
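A minimal sketch of that closed loop (my own toy model - coarse quantisation stands in for the whole DCT-and-truncate step): the encoder keeps a mirror of the decoder's image and always codes the residual against that, so errors get corrected rather than accumulating:

    import numpy as np

    def quantise(residual, step=16.0):
        """Lossy stand-in for DCT + coarse quantisation."""
        return np.round(residual / step) * step

    # Toy frames: flat, then a sudden lighting change, then static again.
    frames = [np.full((8, 8), 100.0), np.full((8, 8), 180.0),
              np.full((8, 8), 180.0)]

    decoder_state = np.zeros((8, 8))  # mirror of what the viewer sees
    for f in frames:
        residual = f - decoder_state         # current frame vs decoded image
        decoder_state += quantise(residual)  # decoder applies what was sent
        print("mean error after frame:",
              float(np.abs(f - decoder_state).mean()))

The residual the encoder has to send shrinks frame by frame (100, then 84, then just 4), and the viewer's error stays bounded by the quantisation step instead of drifting.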

Key detail - the higher the bandwidth given to the encoder, the faster it can track gross changes.

For good measure, "key frames" are sent periodically in the stream. If all that were transmitted were changes, no receiver would ever be able to start - it wouldn't have a base image to apply the changes to. Also, any interference would leave a permanent "stain" on the decoded image. So periodically a full frame is transmitted (again using DCT compression), which gives decoders a known base image to start applying changes to.
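Schematically (the interval of 15 is a made-up number for illustration - real broadcast streams typically put a key frame every half-second to a couple of seconds):

    KEYFRAME_INTERVAL = 15  # hypothetical; i.e. every 15 frames

    for n in range(40):     # stand-ins for 40 video frames
        if n % KEYFRAME_INTERVAL == 0:
            kind = "I (key frame, decodable on its own)"
        else:
            kind = "P (delta against the decoder's current image)"
        print(f"frame {n:2d}: {kind}")

A receiver tuning in mid-stream just waits for the next I frame, and any corruption gets flushed out the same way.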


