[mythtv] Playback next steps

Peter Bennett pb.mythtv at gmail.com
Fri Dec 14 22:09:47 UTC 2018


Mark

I wish I understood everything you are saying here. I have been making 
small changes to the OpenGL code and keeping what works, without fully 
understanding what is going on. I will add some notes below on specific 
things that I have done or found out.

On 12/14/18 10:16 AM, MARK KENDALL wrote:
> Peter/David,
>
> I've been digging around and playing with the OpenGL, VAAPI and OpenMax code.
>
> Focusing on OpenGL for now...
>
> I forked master last week - just to keep track of patches etc. You can see what I've been doing at:
>
> https://github.com/mark-kendall/mythtv/commits/master
>
> In summary so far for OpenGL:
>
> - fixed UYVY kernel deinterlacer (I see that's already in master)
> - fixed YV12 kernel deinterlacer (pretty sure linear blend is broken as well, it looks terrible)
Linear blend looks fine to me with YV12; that is what I use. Maybe I am 
missing something.
One thing to note: using the kernel deinterlacer on some Android devices 
has a performance impact and it starts dropping frames. That does not 
happen with linear blend.
> - patched mythavtest to add double rate deinterlacing support (mythavtest is really useful for performance testing if you haven't used it before)
> - some openglbicubic fixes
> - minor improvement to the UYVY kernel deinterlacer
> - fix for desktop OpenGL ES2
>
> In the pipeline already:
>
> - add support for NPOT textures on GLES2/2.0 - should save a lot of video memory on Pi/Android etc
> - optimisations for UYVY and desktop GL2.0
> - fix use of glTexImage1D - just use 2D instead (1D not available on ES2.0)
>
> I've also started some extensive debugging/logging code for OpenGLVideo to show exactly what is happening under the hood - it's fairly invasive though.
> Does that sound useful?
It sounds useful to me. At one point I temporarily added some debug code 
to print the OpenGL shader code being used. The MythTV source applies 
many dynamic changes to the shader code before sending it to OpenGL, and 
it was difficult to know what the code that actually ran looked like.
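
For anyone wanting to do the same, the final source can be read back 
from the driver after compilation. A minimal sketch (GL headers and a 
current context are assumed, and the helper name is made up):

    #include <cstdio>
    #include <vector>

    // Hypothetical helper: dump the shader source GL actually compiled,
    // after all of the dynamic substitutions have been applied.
    static void DumpShaderSource(GLuint shader)
    {
        GLint length = 0;
        glGetShaderiv(shader, GL_SHADER_SOURCE_LENGTH, &length);
        if (length <= 0)
            return;
        std::vector<GLchar> source(length);
        glGetShaderSource(shader, length, nullptr, source.data());
        fprintf(stderr, "%s\n", source.data());
    }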
> While digging around and trying to get EGL and OpenGLES2.0 working properly on my system, I noticed the comment about ES2.0 and OpenMax playback - and all the subsequent ifdeffery required to disable QT5 opengl support...
>
> Not tested the theory yet, but I think the reason OpenMax fails with QT5 OpenGL/EGL is because Lawrence creates his own EGL render device for the OSD. If using eglfs, this will interfere with the existing Qt screen (I don't think you can create 2 EGL devices). The simple solution I think is to check the Qt QPA platform and disallow the EGL OSD in VideoOutputOMX if the platform is eglfs. This should allow you to remove the whole OPENGL_QT5 ifdef stuff - which would really clean things up and ensure as many people as possible actually use the ES2.0 renderer (with or without EGL).
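
For what it's worth, the platform check itself is a one-liner; something 
like this (the helper and its placement in VideoOutputOMX are 
hypothetical, but QGuiApplication::platformName() is the standard Qt5 
call):

    #include <QGuiApplication>

    // With the eglfs QPA plugin, Qt owns the one and only EGL display,
    // so creating a second EGL render device for the OSD must be avoided.
    static bool AllowEGLOSD(void)
    {
        return QGuiApplication::platformName() != "eglfs";
    }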
>
> The more involved solution is to fix VideoOutputOMX. At the moment Lawrence's code effectively assumes an X11 desktop. He uses the OMXVideoRender component to put images on screen (does that even work with eglfs?) and because of the approach has to handle all sorts of windowing issues/masks etc. He then doesn't like the softblend osd:) so creates an additional render device to display on top of the video.
Some of this I did. Lawrence left the project abruptly, and I got the 
Raspberry Pi code to where it would compile and run under X11. His code 
was originally for full-screen OpenGL ES and required a customized Qt 
build. Some people did not like the softblend OSD, so I did some strange 
stuff with the OpenGL ES OSD. What we have now is X11-based Qt 
displaying the GUI in an X11 window, OpenMAX displaying the video 
playback via the full-screen OpenMAX API, and OpenGL ES displaying the 
OSD via the full-screen OpenGL ES API. For OpenMAX and OpenGL ES it does 
some calculations to position the video and the OSD correctly over the 
Qt window, to give the illusion that it is all actually in one window.

In Raspbian releases for the Raspberry Pi from 2018 onwards, using the 
OpenGL ES OSD causes severe slowdowns in video playback, even if 
nothing is visible in the OSD, so it has become useless and we reverted 
to the softblend OSD.

Piotr O. has his own build of the Raspberry Pi mythfrontend which uses a 
full-screen Qt specially built for the purpose. I don't know how that 
all works.
> A relatively simple solution is:
> - for egl/fs, create VideoOutputOMXEGL (prob a sub-class of VideoOutputOpenGL) and replace the OMXVideoRender component with the Broadcom specific egl_render. EGL images transferred direct to screen and regular OpenGL OSD thrown in for free.
> - for X11/desktop, I would actually remove the MythRenderEGL code and if they don't like the softblend osd, encourage them to use EGL...
We had thought of switching to the new "experimental" OpenGL driver for 
the Raspberry Pi, which is a GPU-accelerated OpenGL implementation under 
X11. I compiled MythTV with suitable config settings to work this way, 
and it was able to play video through OpenGL. Then it suddenly stopped 
working and would segfault every time I started it up. I never figured 
out what was happening; the segfault was in Qt event handling, right 
after initializing OpenGL.
> There is also some broadcom specific code that is not properly ifdef'd out.
>
> If I get the chance, I'm going to have a play with QT5/eglfs/OpenMax over Christmas.
>
> Back to OpenGL proper, having got my head around the code again, I have a better idea of what is happening in the YV12 code - and can compare it to the other options.
>
> Remember the aim of the game is to take a planar YUV420P/YV12 image in main memory and display it as a packed RGBA image on screen.
> So there are three significant operations - repacking from planar to packed, transferring to video memory and YUV to RGB conversion - and just like skinning cats, there are multiple ways of doing it.
> And remember that a YV12 image is 12bpp and full RGBA is 32bpp.
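
To make those sizes concrete: a 1920x1080 YV12 frame is 1920*1080*1.5, 
roughly 3.1 MB, while the same frame as RGBA is 1920*1080*4, roughly 
8.3 MB, so converting before upload nearly triples the per-frame 
transfer. A quick sketch of the YV12 layout (illustrative only):

    #include <cstddef>

    // YV12/YUV420P: a full-resolution Y plane followed by two chroma
    // planes subsampled 2x2, i.e. 1.5 bytes per pixel, or 12bpp.
    static size_t YV12FrameSize(size_t width, size_t height)
    {
        size_t luma   = width * height;             // 1 byte per pixel
        size_t chroma = (width / 2) * (height / 2); // per U or V plane
        return luma + 2 * chroma;                   // 12bpp total
    }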
>
> The simplest fallback route is to do the entire conversion in memory - repacking and colourspace conversion (note this should never actually happen with the current code):
> CPU Load: High
> GPU Load: Low
> Memory transfer: High - 32bpp image transferred.
> Colourspace control: None (using FFmpeg)
> Availability: Always
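
For reference, this all-software path boils down to a single swscale 
call, roughly as below (error handling omitted; sws_getContext and 
sws_scale are the real FFmpeg APIs, the surrounding variables are 
assumptions):

    extern "C" {
    #include <libswscale/swscale.h>
    }

    // Repack and colourspace-convert entirely on the CPU, then upload
    // the full 32bpp RGBA image. 'frame' is the decoded AVFrame; 'rgba'
    // points at a width*height*4 destination buffer.
    SwsContext *ctx = sws_getContext(width, height, AV_PIX_FMT_YUV420P,
                                     width, height, AV_PIX_FMT_RGBA,
                                     SWS_FAST_BILINEAR,
                                     nullptr, nullptr, nullptr);
    uint8_t *dst[4]   = { rgba, nullptr, nullptr, nullptr };
    int dstStride[4]  = { width * 4, 0, 0, 0 };
    sws_scale(ctx, frame->data, frame->linesize, 0, height, dst, dstStride);
    sws_freeContext(ctx);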
>
> The default option is to repack the frame into a full 32bit, packed format and perform colourspace conversion in the GPU. Repacking requires some custom code - interlaced material needs special handling.
> CPU Load: Moderate with MMX support - all other platforms fall back to 'plain c'
> GPU Load: Lowish - simple 1 texture sampling and colourspace control
> Memory transfer: High - 32bpp
> Colourspace control: Full
> Availability: Always
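
The repack itself is conceptually just this (a hypothetical 'plain c' 
inner loop; per the above, the real code also has MMX paths and special 
handling for interlaced frames):

    #include <cstdint>

    // Pack one Y/U/V triple into each 32-bit texel so the shader needs
    // only a single texture sample per pixel. 12bpp becomes 32bpp.
    static void PackYUV420PTo32bpp(const uint8_t *Y, const uint8_t *U,
                                   const uint8_t *V, int ypitch, int cpitch,
                                   uint32_t *dst, int width, int height)
    {
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                dst[y * width + x] = (0xFFu << 24)
                    | (V[(y / 2) * cpitch + (x / 2)] << 16)
                    | (U[(y / 2) * cpitch + (x / 2)] << 8)
                    |  Y[y * ypitch + x];
    }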
>
> The OpenGL 'Lite' route uses custom extensions in the GPU. Taking this route the video frame is repacked into a packed UYVY422 video frame, transferred to video memory and 'magically' converted to RGBA.
> CPU Load: Moderate - repack from planar to packed.
> GPU Load: ??
> Memory transfer: Medium - image is 16bpp
> Colourspace control: None
> Availability: Variable
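
The UYVY repack that this route (and the custom one below) relies on 
interleaves two pixels into four bytes, along these lines (an 
illustrative sketch, not the MythTV code):

    #include <cstdint>

    // Planar YUV420P -> packed UYVY422: each 4-byte group (U0 Y0 V0 Y1)
    // carries two horizontal pixels, giving 16bpp instead of 32bpp.
    static void PackYUV420PToUYVY(const uint8_t *Y, const uint8_t *U,
                                  const uint8_t *V, int ypitch, int cpitch,
                                  uint8_t *out, int width, int height)
    {
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x += 2)
            {
                *out++ = U[(y / 2) * cpitch + (x / 2)]; // shared chroma
                *out++ = Y[y * ypitch + x];             // first luma
                *out++ = V[(y / 2) * cpitch + (x / 2)];
                *out++ = Y[y * ypitch + x + 1];         // second luma
            }
        }
    }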
>
> The custom UYVY code uses the same UYVY422 packed frame format and uses a custom texture format and shaders to convert to RGBA.
> CPU Load: 'moderate' CPU load - repack
> GPU Load: Medium - the packed frame only requires 1 texture sample per pixel (no deint) but does require an extra filter stage to ensure exact 1 to 1 mapping between input and output. Any horizontal interpolation breaks sampling (because 2 pixels are encapsulated in one RGBA sample). Video memory usage is lower as frame is half width.
> Memory transfer: Medium - 16bpp
> Colourspace control: Full
> Availability: Always
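
The "exact 1 to 1 mapping" requirement comes from the fragment shader 
having to work out which of the two packed pixels it is shading. The 
core looks roughly like this (a sketch only; u_width is a hypothetical 
uniform holding the output width):

    // One RGBA sample holds (U, Y0, V, Y1) for two adjacent pixels; the
    // parity of the output pixel index selects Y0 (green) or Y1 (alpha).
    static const char *kUYVYLumaSelect =
        "vec4 uyvy = texture2D(s_texture0, v_texcoord);\n"
        // v_texcoord.x * u_width * 0.5 needs full float precision: at
        // mediump it rounds the wrong way for large x, which matches the
        // alternate-pixel colour error described in the note below.
        "float parity = fract(v_texcoord.x * u_width * 0.5);\n"
        "float luma = (parity < 0.5) ? uyvy.g : uyvy.a;\n";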
Note there is an Android problem with UYVY. On some devices (e.g. the 
Fire TV Stick gen 2), OpenGL ES does not support highp float precision 
and defaults to mediump. The OpenGL code that applies the color then 
suffers a rounding error: instead of each pixel getting its correct 
color, on the right-hand half of the screen each alternate pixel gets 
its neighbor's color instead. See https://imgur.com/dLoMUau and 
https://imgur.com/lbfyEWQ . I don't know why YV12 does not suffer from 
that problem.
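
A standard GLES 2.0 technique, for what it's worth, is to prepend an 
explicit precision preamble to every fragment shader. A sketch (this 
cannot create highp where the hardware has none, but it makes the 
fallback explicit instead of silent):

    // Request highp where the GPU supports it; otherwise fall back to
    // mediump explicitly.
    static const char *kPrecisionGuard =
        "#ifdef GL_FRAGMENT_PRECISION_HIGH\n"
        "precision highp float;\n"
        "#else\n"
        "precision mediump float;\n"
        "#endif\n";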

> The YV12 code is actually where I started about 10 years ago:) There is no repacking in main memory - the planar frame is transferred to video memory and repacked and converted to RGBA in the GPU. Sounds nice but...
> CPU Load: Low to very low..
> GPU Load: High to very high. Each output pixel requires 3 texture samples, 2 of which are non-contiguous - as the video data is still planar. For progressive content this is not too bad but deinterlacing gets ugly really quickly:) see below. Also the GLSL shader cannot use rectangular textures so requires more GPU memory - but I have a fix for that coming.
> Memory transfer: Low - 12bpp
> Colourspace control: Full
> Availability: Always
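
The three samples come from Y, U and V living in separate planes, so 
two of the lookups land away from the luma data. A rough shape of the 
fragment code (identifiers are illustrative, not MythTV's):

    // Each output pixel needs one sample per plane; the chroma samples
    // sit in a different region of texture memory than the luma.
    static const char *kYV12Sample =
        "float y = texture2D(s_texture0, v_texcoordY).r;\n"
        "float u = texture2D(s_texture0, v_texcoordU).r;\n" // quarter-res
        "float v = texture2D(s_texture0, v_texcoordV).r;\n" // quarter-res
        "gl_FragColor = u_colourMatrix * vec4(y, u, v, 1.0);\n";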
>
> Texture sampling is the most expensive operation in a GLSL shader - and accessing memory away from the current sample is usually more expensive. So it is best to minimise texture sampling and not to access texture memory 'randomly'.
Something in the OpenGL code is slow enough to impact playback on the 
Fire TV Stick gen 2. Many frames are dropped because the OpenGL work 
seems to be taking longer than one frame interval to execute, even with 
progressive frames.
> With the software fallback, default, OpenGL lite and UYVY approach - there is only one, coherent texture sample for progressive content. For OpenGL deinterlacers this increases depending on the deinterlacer: linear blend makes 3 (2 non-contiguous) and kernel 8 (7 non-contiguous) - which is why it is slower.
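
As an illustration of why the sample counts climb, a linear-blend pass 
for the packed formats looks something like this (a sketch; 
u_lineHeight is a hypothetical uniform holding one line's height in 
texture coordinates):

    // Linear blend averages the lines above and below the current one,
    // so even this cheap deinterlacer triples the sample count, and two
    // of the three samples are away from the current texel.
    static const char *kLinearBlend =
        "vec4 cur   = texture2D(s_texture0, v_texcoord);\n"
        "vec4 above = texture2D(s_texture0, v_texcoord - vec2(0.0, u_lineHeight));\n"
        "vec4 below = texture2D(s_texture0, v_texcoord + vec2(0.0, u_lineHeight));\n"
        "gl_FragColor = mix(cur, (above + below) * 0.5, 0.5);\n";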
>
> With YV12 you start with 3 texture samples for progressive - which in my testing offsets the gain from very low CPU usage and memory transfer - but for the kernel deinterlacer that increases to 24 texture samples (21 non-contiguous).
>
> ... and that is why I tried to find an alternative. It's fine for progressive content but deinterlacing performance just gets worse and worse.
>
> I settled on the UYVY code - it balances its 'performance' between CPU, memory transfer and GPU.
>
> In summary:
> software fallback - why bother unless you have a modern CPU and a 15 year old GPU.
Some reasons for using software decoding:
- VDPAU has a bug with decoding MPEG2 that results in pixellation on 
many USA stations.
- On the Fire TV Stick gen 2, mediacodec has a bug where deinterlaced 
content causes the decoder to hang.
- Subtitles are not working with some decoders. They work with software 
decoding, for those people who need them.
> default - custom packing code may not be efficient on non X86 architecture and large memory transfer
> opengl-lite - nice if available but colour rendition not great.
> UYVY - simple repacking, smaller memory transfer and lower GPU texturing.
> YV12 - low CPU (straight copy), smallest memory transfer but worse to terrible GPU texturing.
Except on the Fire TV Stick gen 2 and the like, where UYVY is terrible 
and YV12 seems fine.
> The code could probably try and make some assumptions about the best route to take depending on reported driver/hardware and compile type. e.g. Intel desktop and Pi have shared CPU/GPU memory so memory transfers probably aren't a bottleneck. A more powerful dedicated video card probably won't blink at the sampling required for YV12. At the end of the day, however, there is no right or wrong solution - as long as it works!
>
> Again, hopefully this is helpful. Any questions, just ask.
>
> Regards
> Mark
>
> P.S. Probably worth mentioning that I don't really think the code needs both UYVY and YV12 - and unsurprisingly I would suggest ditching YV12. At the same time the OpenGL code could be simplified greatly by removing OpenGL1 support - I'd be amazed if anyone is actually still using it.
>
>

Peter

