[mythtv] HDTV patches

Fri May 21 17:37:41 EDT 2004

I've uploaded the latest set of patches:
  http://www.mrl.nyu.edu/~danielk/mythtv/all-patches-v2.tbz

Some of them are actually empty because they've been applied to CVS. I
generate these automatically and will have to manually remove the empty
ones. The interesting one is the xvmc-efficiency patch. Isaac had
trouble with the last one using a GeForce 4 without IDCT acceleration. I
may have fixed that bug, but I can't test it myself. If someone has a
GeForce 4 Ti and wants to help, just try playing an MPEG-2 video with
this patch and XvMC enabled and tell me how it looks. There are several
different paths through the code depending on the hardware support
available. I can only test a subset of these with my GeForceFX 5200
card. If this patch doesn't fix the problem, I may try splitting it up
further. I've also attached a detailed description of this patch to
this e-mail...

-- Daniel

On Fri, 21 May 2004, Daniel Thor Kristjansson wrote:

]
]Thanks. I'm just working on documenting xvmc-efficiency, I think I may
]know what your problem is. I think I botched handling cards that want
]the idct information in unsigned format, I think the MC vs IDCT is a
]Red Herring, I can disable IDCT and everything works fine. I can't
]disable the signed idct format, so that part is untested by me.
]
]The hdtv-recording patch is not near prime-time, I didn't mean for
]that to be applied. I'm telling people to use the hdtv-recording-v6
]patch instead of the more recent ones. But I don't think that's good
]enough for CVS either. hdtv-recording is basically my attempt to parse
]all the data being transmitted, there are several bugs I know about and
]sure ones I don't in there.
]
]-- Daniel
]
]On Thu, 20 May 2004, Isaac Richards wrote:
]
]]On Wednesday 19 May 2004 06:38 pm, Daniel Thor Kristjansson wrote:
]]>  The whole set of the latest patches is available here:
]]>   http://www.mrl.nyu.edu/~danielk/mythtv/all-patches-v1.tbz
]]>
]]> When I apply all of them, I do it in this order: xvmc-deinterlace,
]]> xvmc-idct, hdtv-signalcheck, hdtv-recorder, xvmc-efficiency. Of the
]]> non-patch files, XvMCSurfaceTypes.h is required for xvmc-idct, and the
]]> rest are used by the recorder patch; they all go in libs/libmythtv.
]]
]]I've applied all but hdtv-recorder and xvmc-efficiency.
]]
]]hdtv-recorder seems, well, messy.  There's quite a lot of completely unchecked
]]char * access going on in the new files, and I don't really want to have to
]]fix bugs in that later on.. =)
]]
]]xvmc-efficiency is the cause of the horrible video quality I mentioned
]]earlier.  Without the patch, xvmc decoded video is full of corruption.  I do
]]have a card that doesn't do IDCT accel, so it seems like you've broken the MC
]]only case.
]]
]]Isaac
]
-------------- next part --------------

The xvmc-efficiency patch is made up of modifications to two
files: videoout_xvmc.cpp and xvmcvideo.c. 

There are two fairly simple changes to VideoOutputXvMC. The first
simply increases kNumBuffers from 7 to 8. This gives us one more 
XvMC buffer, which is generally good. I also don't know of any
XvMC implementation that supports fewer than 8 buffers. The second
change is to replace some XvMCSyncSurface calls with
XvMCFlushSurface calls. This lets the hardware decoder work in
parallel with the ffmpeg portion of the decoding. A better
solution would separate passing buffers to the video card and
creating those buffers into independent threads. But that
would introduce much more complexity than this patch does.

The changes to xvmcvideo.c are a little more extensive. First
I should explain that this file must implement five functions:
XVMC_field_start, XVMC_init_block, XVMC_decode_mb, 
XVMC_pack_pblocks, and XVMC_field_end. I'll describe each of
these in turn and what changes I made.

XVMC_field_start simply fills in some required information in
the render state. Most of the changes I've made to this are 
already in CVS. But I added a couple of asserts that should 
always pass and simplified the setting of flags. The 
simplification of flag setting isn't really important here as
this only gets called once per field, but I did it throughout
the whole file as it does provide speedup especially in 
XVMC_decode_mb.

XVMC_init_block gets the XvMC render state from the
ffmpeg context private data, and fetches the buffer that
we will be filling with XVMC_decode_mb. I simplified this
a little by creating a separate function getRenderState() to
fetch the render state and verify that it is valid. I also use
this function whenever I need the render state in the other
XVMC functions.
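
To make the idea concrete, here is a minimal sketch of what a helper
like getRenderState() might look like. It is not the code from the
patch: the struct layouts, the SKETCH_RENDER_MAGIC marker and the field
names are all hypothetical stand-ins, and the real render state and
codec context defined by ffmpeg carry many more fields.

```c
#include <stddef.h>

/* Hypothetical sanity marker; the value is illustrative only. */
#define SKETCH_RENDER_MAGIC 0x52454E44u

/* Hypothetical, cut-down render state. */
typedef struct {
    unsigned magic;        /* written once when the state is created */
    short   *data_blocks;  /* buffer that XVMC_decode_mb would fill */
    int      next_free_data_block_num;
} SketchRenderState;

/* Hypothetical, cut-down codec context. */
typedef struct {
    void *priv_data;       /* where the render state pointer is kept */
} SketchContext;

/* Fetch the render state and verify it in one place.
 * Returns NULL if the state is missing or fails the sanity check. */
static SketchRenderState *getRenderState(const SketchContext *ctx)
{
    SketchRenderState *render;

    if (ctx == NULL || ctx->priv_data == NULL)
        return NULL;
    render = (SketchRenderState *)ctx->priv_data;
    if (render->magic != SKETCH_RENDER_MAGIC)
        return NULL;
    return render;
}
```

Callers then only need a single NULL check instead of repeating the
validation in every XVMC function.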

XVMC_decode_mb is at the heart of the MPEG decoding process. It
converts ffmpeg's mpeg format into xvmc's mpeg format, and performs
the IDCT if you do not have IDCT acceleration. The version in
CVS does this in one massive function with a long chain of ifs
and switch statements. I took the common parts and placed them in
two functions, setupContext() and getMacroblock(). I also split
out handleIBlock() and handlePBBlock(), along with
setupPMVforFramePrediction() and setupPMVforFieldPrediction().
These splits don't correspond to divisions in the old
XVMC_decode_mb, but they are the ones that made the most sense.

Of the two common functions, setupContext() checks to see if the
context is valid. If it is, it also fixes the context if it has
been accidentally set to skip blocks, which XvMC doesn't like.
It also sets the quantization table, which we don't strictly
need, but it isn't expensive and will avoid one hard-to-find bug if
XvMC is extended to allow post processing. (The new nvidia cards
supposedly can do this and MPEG-4, but this isn't exposed yet in
XvMC.) The other function, getMacroBlock(), fetches a macroblock
from the XvMC render state and sets it up with basics like the
location and prediction type (field or frame). Note: field or
frame denotes the encoding, not whether the image is truly
interlaced. If it's interlaced you want to use field encoding
for the best results, but MPEG-1 encoders do not support it and
I've seen broadcasts that used frame prediction types for
interlaced data.

On the block specific path, intra blocks are simpler. The old code
calculated the block count, set the block pattern, and rearranged the
blocks for output, with multiple tests throughout the one monster
function to see if it was an intra block. By separating
intra blocks from the P and B blocks we only need to special
case grey images and whether we have IDCT acceleration.
Greyscale images have no chroma (color), but they do have a
chroma block we need to clear. We then need to pack the blocks
if the card requires packed blocks, and we need to convert the
block if the video card requires signed blocks. If the card
has IDCT acceleration we use rearrangeBlock for this, and if it
doesn't we use rearrangeBlockIDCT. The first one is the more
optimized of the two: when we have to do the IDCT ourselves, any
optimization here isn't going to have a noticeable effect, so for
rearrangeBlockIDCT I opted for better readability.

Note: Intra blocks are like little 8x8 JPEG images. The IDCT is
the inverse discrete cosine transform, which is like the inverse
Fourier transform except without the imaginary numbers.

Prediction blocks just move around existing data on the screen.
Both B and P frames contain these, but they include different 
subsets of them and P frames also contain Intra blocks. 
Different prediction blocks are also needed depending on 
whether the current frame is field or frame predicted. 
XVMC_field_start told us which in the picture_structure,
so we use that to tell us whether to call
setupPMVforFramePrediction() or setupPMVforFieldPrediction().
These set up the motion vector block and set the macroblock
type for the hardware decoding. I've made a few changes to
the code here. One is to separate out these two instead
of just having an if (picture_structure==..) within each
case block. Unfortunately, this makes it a little harder to
see what exactly I changed. I also found the common parts
for both field and frame prediction types and put them in
their own function, setupPMV(). Finally, I handle two error
conditions: a 16x8 field motion vector in a frame encoded frame
and an MPEG-4 8x8 motion vector. With the first, it tries to
do the right thing and also prints out a warning. With the 8x8
vector, it just prints out a warning. The first error condition
I've seen once in broadcast data, and handling the second will
theoretically allow you to play an MPEG-4 stream, with some
lost data of course, with XvMC acceleration. All of the
prediction blocks must be either counted or copied like
I blocks, and this is basically the same as the I block
procedure I explained, except that there are never any unsigned
blocks.

The important thing is that I've rewritten the prediction block
handling so that it produces better code, especially if you
compile for a CPU with the cmov instruction at -O2 or better.
This will eliminate a lot of branching to the point where
CPU usage becomes very low for playback. You should also know
that libffmpeg.pro sets its own compile flags based on
whether you compile for release or debug in settings.pro.
So to get the benefit you either need to edit libffmpeg.pro
or compile for release. I didn't change libffmpeg.pro since
end users will probably be using packages compiled for release
anyway and the developers will assume that compiling for debug
gives a fully debuggable executable.
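
As an illustration of the kind of rewrite meant here (not the actual
code from the patch), the sketch below sets macroblock type flags in
two ways. The SKETCH_MB_* values and both function names are
hypothetical; the real flag values come from the XvMC headers. Whether
a compiler really emits cmov for the second form depends on the
compiler, the optimization level and the target CPU, so treat this as
a sketch of the idea only.

```c
/* Illustrative flag values, assumed for this sketch. */
enum {
    SKETCH_MB_FORWARD  = 0x02,
    SKETCH_MB_BACKWARD = 0x04,
    SKETCH_MB_PATTERN  = 0x08
};

/* Branch-heavy form: each test is a potential conditional jump. */
static int mbTypeBranchy(int forward, int backward, int cbp)
{
    int type = 0;
    if (forward)
        type |= SKETCH_MB_FORWARD;
    if (backward)
        type |= SKETCH_MB_BACKWARD;
    if (cbp)
        type |= SKETCH_MB_PATTERN;
    return type;
}

/* Select-style form: every term is a plain value selection, which an
 * optimizing compiler is free to lower to cmov or to arithmetic. */
static int mbTypeSelect(int forward, int backward, int cbp)
{
    return (forward  ? SKETCH_MB_FORWARD  : 0)
         | (backward ? SKETCH_MB_BACKWARD : 0)
         | (cbp      ? SKETCH_MB_PATTERN  : 0);
}
```

Both functions return the same value for every input; only the shape
handed to the optimizer differs.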

XVMC_pack_pblocks is used to set the pblocks pointers in ffmpeg's
context to the data blocks. This is done based on a bitmap
created by calc_cbp. I've made this slightly more efficient.
This is used by cards that allow reordering, such as the
nvidia fx, and is then called as often as XVMC_decode_mb.
I also optimized calc_cbp, which is called just as often. With
broadcast streams the chroma is never 422 or 444; these are
professional and studio formats respectively. But we do continue to
support these in xvmcvideo. However, MythTV would need some
minor changes elsewhere to support these with XvMC. These
may be needed if we start to use MPEG streams from other
sources, like archive.org, or want to use mythtv as our MPEG-2
player for MythVideo.
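
The bitmap-then-pointers scheme can be sketched roughly as follows.
This is a simplified illustration under stated assumptions, not the
patched code: SketchMbContext, calcCbp() and packBlocks() are
hypothetical names, and the context is a cut-down stand-in for
ffmpeg's much larger one. The block count per macroblock follows from
the MPEG-2 chroma_format value (1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4).

```c
#include <stddef.h>

#define SKETCH_MAX_BLOCKS 12   /* 4 luma + up to 8 chroma (4:4:4) */

/* Hypothetical, cut-down context. */
typedef struct {
    int    chroma_format;                        /* 1, 2 or 3 */
    int    block_last_index[SKETCH_MAX_BLOCKS];  /* < 0: block not coded */
    short  block[SKETCH_MAX_BLOCKS][64];         /* coefficient storage */
    short *pblocks[SKETCH_MAX_BLOCKS];           /* set by packBlocks() */
} SketchMbContext;

static int blocksPerMacroblock(const SketchMbContext *s)
{
    return 4 + (1 << s->chroma_format);          /* 6, 8 or 12 */
}

/* Build the coded block bitmap; block 0 lands in the highest used bit. */
static int calcCbp(const SketchMbContext *s)
{
    int i, cbp = 0, n = blocksPerMacroblock(s);

    for (i = 0; i < n; i++)
        cbp = (cbp << 1) | (s->block_last_index[i] >= 0);
    return cbp;
}

/* Point pblocks[i] at consecutive storage slots for coded blocks only,
 * and at NULL for blocks that are not coded. Returns the slots used. */
static int packBlocks(SketchMbContext *s, int cbp)
{
    int i, used = 0, n = blocksPerMacroblock(s);

    for (i = 0; i < n; i++) {
        int coded = (cbp >> (n - 1 - i)) & 1;
        s->pblocks[i] = coded ? s->block[used++] : NULL;
    }
    return used;
}
```

With 4:2:0 data and blocks 0, 3 and 5 coded, calcCbp() returns 0x25
(binary 100101) and packBlocks() uses three storage slots.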

Finally, XVMC_field_end is pretty much the same; it's only
slightly simplified by using getRenderState() as described
under XVMC_init_block.

So to reiterate, XVMC_pack_pblocks, calc_cbp and XVMC_decode_mb
are the most performance critical and have been rewritten with
that in mind. The code has also been simplified and modularized
with functions such as getRenderState(), setupContext(), etc.
A few more asserts have been added based on my reading of
the XvMC v1.0 spec, and error reporting support has been added
for two types of illegal motion vectors, a 16x8 in a frame 
encoded frame and MPEG-4 8x8 vectors.