[mythtv-users] Incorrect characters in the EPG and program name

R. G. Newbury newbury at mandamus.org
Mon Jan 14 18:05:11 UTC 2013


On 01/14/2013 03:27 AM, Karl Dietz wrote:
> Hi Steve,
>
> replying to myself with an example of what we need to fix the guide
> until the broadcaster fixes their SI generator. (While creating the
> example I noticed that the broadcaster has fixed their guide, so there
> is still hope :)
>
> ------------------------------------------------------------
> SECT-Packet: 00000032   PID: 18 (0x0012), Length: 475 (0x01db)
> Time received: Mon 2013-01-14  09:15:31.707
> ------------------------------------------------------------
>    0000:  4e f1 d8 40 2a dd 00 01  22 03 21 14 01 4e d1 75
>           N..@*...".!..N.u
>    0010:  db f2 07 55 00 00 50 00  81 bd 4d 2f 44 45 55 15
>           ...U..P...M/DEU.
>    0020:  05 44 65 72 20 6d 61 73  6b 69 65 72 74 65 20 52
>           .Der maskierte R
>
> Notice the 05 that starts the last line: this is the first byte of
> the string, and it signals Latin Alphabet 5, aka ISO 8859-9.
>
>
>    0030:  e4 75 62 65 72 15 05 44  65 72 20 6d 61 73 6b 69
>           .uber..Der maski
>    0040:  65 72 74 65 20 52 e4 75  62 65 72 4e fe 01 44 45
>           erte R.uberN..DE
>
> Before the encoding was signaled I had to guess from e4 between 'R' and
> 'uber' that this must be an a-umlaut, as that's how you spell Räuber.
> Looking up the various candidate encodings that map a-umlaut to e4 I
> could narrow the encoding down. By looking up more characters (and using
> other hints, like the encoding of other channels on the same multiplex)
> I could narrow it down to only one remaining candidate and added a fixup
> for that channel.
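
The narrowing step described above can be sketched in a few lines of Python. This is only an illustration, not what mythtv does internally: the helper name candidates_matching and the list of candidate encodings are my own assumptions, chosen for the example.

```python
# Hypothetical helper: which candidate encodings turn the raw bytes into
# the spelling we expect? The candidate list is an arbitrary assumption.
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15",
              "cp1252", "cp437", "utf_8"]


def candidates_matching(raw, expected, encodings=CANDIDATES):
    """Return the encodings under which raw decodes exactly to expected."""
    matches = []
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        if text == expected:
            matches.append(enc)
    return matches


# 'R', 0xe4, 'uber' as seen in the dump; we expect "Räuber".
remaining = candidates_matching(b"R\xe4uber", "R\u00e4uber")
print(remaining)
```

Several Latin tables agree on 0xe4, so a single umlaut leaves more than one candidate. That is why more characters are needed: only bytes on which the tables disagree (for ISO 8859-9 versus ISO 8859-1, the Turkish letters) actually eliminate anything.
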
>
> Guess table from table id...
> EIT-decoding....
> Table_ID: 78 (0x4e)  [= Event Information Table (EIT) - actual
> Service_ID: xxxxx (0xxxxx)  [=  --> refers to PMT program_number]
> Transport_stream_ID: 8707 (0x2203)
> Original_network_ID: 8468 (0x2114)  [= German Digital Terrestrial
>
> These three values are used to identify the service. With a DVB-C
> provider that inserts its own guide it's possible that keying off just
> the Original_network_ID is enough.
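
For anyone who wants to pull those identifiers out of a raw section themselves, here is a rough Python sketch. It assumes the usual EIT section header layout (table_id, 12-bit section_length, service_id, version byte, two section numbers, transport_stream_id, original_network_id); the function name parse_eit_header is made up for the example, and only the first 12 bytes of the dump above are used.

```python
import struct


def parse_eit_header(section):
    """Extract the identifying fields from the start of an EIT section.

    Hypothetical helper; assumes 'section' starts at the table_id byte.
    """
    if len(section) < 12:
        raise ValueError("need at least 12 bytes of the section header")
    (table_id, length_field, service_id, _version,
     _section_no, _last_section_no, ts_id, on_id) = struct.unpack(
        ">BHHBBBHH", section[:12])
    return {
        "table_id": table_id,
        # The low 12 bits hold the length of what follows this field.
        "section_length": length_field & 0x0FFF,
        "service_id": service_id,
        "transport_stream_id": ts_id,
        "original_network_id": on_id,
    }


# First 12 bytes of the section shown in the dump above.
header = parse_eit_header(bytes.fromhex("4e f1 d8 40 2a dd 00 01 22 03 21 14"))
print(hex(header["transport_stream_id"]), hex(header["original_network_id"]))
```

The section_length field plus the 3 bytes before its end gives the 475 bytes reported for the packet, which is a handy sanity check that the offsets are right.
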
>
>
>      ISO639_2_language_code:  DEU
>    event_name_length: 21 (0x15)
>    event_name: "Der maskierte R?uber" -- Charset: Latin alphabet no. 5
>    text_length: 21 (0x15)
>    text_char: "Der maskierte R?uber"  -- Charset: Latin alphabet no. 5
>
> Here it was saying just "Latin alphabet" before they fixed it.
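
The effect of that leading selector byte can be shown with a small Python sketch. It is a simplification under stated assumptions: it handles only the single-byte selectors 0x01 to 0x0B that map to ISO 8859 parts, and when no selector is present it uses a caller-supplied fallback codec (standing in for a per-channel fixup) rather than the real DVB default table, which Python does not ship. The names decode_dvb_text and DVB_SINGLE_BYTE_TABLES are hypothetical.

```python
# Selector byte -> Python codec, for the ISO 8859 parts only (0x08 is
# left out on purpose; multi-byte selectors are not handled here).
DVB_SINGLE_BYTE_TABLES = {
    0x01: "iso8859_5",
    0x02: "iso8859_6",
    0x03: "iso8859_7",
    0x04: "iso8859_8",
    0x05: "iso8859_9",   # "Latin alphabet no. 5", the case in the dump
    0x06: "iso8859_10",
    0x07: "iso8859_11",
    0x09: "iso8859_13",
    0x0A: "iso8859_14",
    0x0B: "iso8859_15",
}


def decode_dvb_text(raw, fallback="iso8859_9"):
    """Decode a DVB text field; 'fallback' is an assumed per-channel fixup."""
    if not raw:
        return ""
    codec = DVB_SINGLE_BYTE_TABLES.get(raw[0])
    if codec is not None:
        return raw[1:].decode(codec)   # drop the selector, decode the rest
    if raw[0] >= 0x20:
        return raw.decode(fallback)    # no selector byte was sent
    raise ValueError("selector 0x%02x not handled in this sketch" % raw[0])


# The 21-byte event_name from the dump: 05, then "Der maskierte R", e4, "uber".
event_name = bytes.fromhex(
    "05 44 65 72 20 6d 61 73 6b 69 65 72 74 65 20 52 e4 75 62 65 72")
print(decode_dvb_text(event_name))
```

With the selector present the fallback is never consulted; before the broadcaster fixed the stream, the same bytes without the leading 05 would only come out right if the fallback happened to be the correct guess.
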

Locale settings can go wrong in a couple of places.
The environment can differ from one user to another, *including between 
the backend and the user*. It should be the same throughout.

For mythtv the locale setting should be 'global'. Ensure that the locale 
is properly set in /etc/profile and repeated in /home/mythtv/.bashrc or 
~/.bashrc.

A quick way to discover the coding of an unknown text is to force the 
locale for a console window, and use vim (or nano etc) to review the text.

Add something like the following, choosing a locale you *think* might be 
the coding of the text file,
export LANG=en_US.UTF-8
export LANGUAGE=en_US
# note: while LC_ALL is set it takes precedence over the other LC_* values
export LC_ALL=en_US.UTF-8
export LC_COLLATE=C
export LC_CTYPE=en_US.UTF-8

to your ~/.bashrc, close all console windows, then open one and run 'vim 
funny-text'. If it is displayed properly, then you know what the 
broadcaster is sending. Adding the proper locale to /etc/profile and 
~/.bashrc and rebooting *may* fix this problem.
The program iconv will convert files from one charset to another, but is 
probably of no help within mythtv.

Geoff