[mythtv-users] Incorrect characters in the EPG and program name
R. G. Newbury
newbury at mandamus.org
Mon Jan 14 18:05:11 UTC 2013
On 01/14/2013 03:27 AM, Karl Dietz wrote:
> Hi Steve,
>
> replying to myself with an example of what we need to fix the guide
> until the broadcaster fixes their SI generator. (While creating the
> example I noticed that the broadcaster has fixed their guide, so there
> is still hope :)
>
> ------------------------------------------------------------
> SECT-Packet: 00000032 PID: 18 (0x0012), Length: 475 (0x01db)
> Time received: Mon 2013-01-14 09:15:31.707
> ------------------------------------------------------------
> 0000: 4e f1 d8 40 2a dd 00 01 22 03 21 14 01 4e d1 75  N..@*...".!..N.u
> 0010: db f2 07 55 00 00 50 00 81 bd 4d 2f 44 45 55 15  ...U..P...M/DEU.
> 0020: 05 44 65 72 20 6d 61 73 6b 69 65 72 74 65 20 52  .Der maskierte R
>
> Notice the 05 that starts the last line; this is the first byte of
> the string and signals Latin Alphabet 5, aka ISO 8859-9.
>
>
> 0030: e4 75 62 65 72 15 05 44 65 72 20 6d 61 73 6b 69  .uber..Der maski
> 0040: 65 72 74 65 20 52 e4 75 62 65 72 4e fe 01 44 45  erte R.uberN..DE
>
> Before the encoding was signaled I had to guess from e4 between 'R' and
> 'uber' that this must be an a-umlaut, as that's how you spell Räuber.
> Looking up the various candidate encodings that map a-umlaut to e4 I
> could narrow the encoding down. By looking up more characters (and using
> other hints, like the encoding of other channels on the same multiplex)
> I could narrow it down to only one remaining candidate and added a fixup
> for that channel.
>
> Guess table from table id...
> EIT-decoding....
> Table_ID: 78 (0x4e) [= Event Information Table (EIT) - actual
> Service_ID: xxxxx (0xxxxx) [= --> refers to PMT program_number]
> Transport_stream_ID: 8707 (0x2203)
> Original_network_ID: 8468 (0x2114) [= German Digital Terrestrial
>
> These three values are used to identify the service. With a DVB-C
> provider that inserts its own guide it's possible that keying off just
> the Original_network_ID is enough.
>
>
> ISO639_2_language_code: DEU
> event_name_length: 21 (0x15)
> event_name: "Der maskierte R?uber" -- Charset: Latin alphabet no. 5
> text_length: 21 (0x15)
> text_char: "Der maskierte R?uber" -- Charset: Latin alphabet no. 5
>
> Here it was saying just "Latin alphabet" before they fixed it.
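The guessing step described in the quoted message can be tried by hand.
What follows is only a minimal sketch, assuming an iconv that knows these
charset names (as glibc's does); the list of candidates is an illustration,
not the set that was actually checked, and sample and try_candidates are
made-up helper names.

```shell
# Bytes of the event name from the dump, without the leading 0x05
# selector byte. \344 is the octal form of 0xe4.
sample() {
    printf 'Der maskierte R\344uber\n'
}

# Decode the sample under each candidate encoding and print it as UTF-8.
try_candidates() {
    for enc in ISO-8859-1 ISO-8859-2 ISO-8859-5 ISO-8859-7 ISO-8859-9 ISO-8859-15; do
        printf '%-12s ' "$enc"
        sample | iconv -f "$enc" -t UTF-8
    done
}

try_candidates
```

The Latin candidates all render 0xe4 as an a-umlaut, which is why a single
character is not enough to settle the question; the Cyrillic and Greek
candidates render it as a different letter and can be ruled out at once.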
Locale settings can go wrong in a couple of places.
The environment can differ from one user to another, *including the
backend* vis-a-vis the user who logs in. It should be the same throughout.
For mythtv the locale setting should be 'global': ensure that the locale
is set properly in /etc/profile and repeated in /home/mythtv/.bashrc or
~/.bashrc.
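One way to compare environments is to print the character map each one
resolves to. This is a rough sketch, assuming the standard locale utility
is available; is_utf8 and report_charmap are made-up helper names.

```shell
# Succeed only if the given charmap name denotes UTF-8.
is_utf8() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the charmap that the current environment resolves to.
report_charmap() {
    cm=$(locale charmap 2>/dev/null)
    if is_utf8 "$cm"; then
        echo "charmap: $cm (UTF-8)"
    else
        echo "charmap: ${cm:-unknown} (not UTF-8)"
    fi
}

report_charmap
```

Running the same lines under the account the backend uses (assumed here to
be mythtv, as the home directory above suggests) shows whether the two
environments agree.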
A quick way to discover the encoding of an unknown text is to force the
locale for a console window and use vim (or nano, etc.) to review the text.
Add something like the following, choosing a locale you *think* might
match the encoding of the text file:
# Note: when LC_ALL is set, it overrides LANG and every individual LC_* value.
export LANG=en_US.UTF-8
export LANGUAGE=en_US
export LC_ALL=en_US.UTF-8
export LC_COLLATE=C
export LC_CTYPE=en_US.UTF-8
to your ~/.bashrc, close all console windows, then open a new one and run
'vim funny-text'. If the text is displayed properly, then you know what the
broadcaster is sending. Adding the proper locale to /etc/profile and
~/.bashrc and rebooting *may* fix this problem.
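A locale name that is not installed on the system generally will not take
effect, so it can be worth checking the name before relying on it. A small
sketch, assuming `locale -a` lists the installed locales; locale_installed
is a made-up helper, and it compares names in a normalized form because
the listing often spells them like en_US.utf8.

```shell
# Lowercase a locale name and drop hyphens, so that en_US.UTF-8 and
# en_US.utf8 compare equal.
normalize() {
    tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if the named locale appears in the output of `locale -a`.
locale_installed() {
    want=$(printf '%s\n' "$1" | normalize)
    locale -a 2>/dev/null | normalize | grep -qxF "$want"
}

# Example: check the name before exporting it.
if locale_installed en_US.UTF-8; then
    echo "en_US.UTF-8 is installed"
else
    echo "en_US.UTF-8 is not installed"
fi
```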
The program iconv will convert files from one charset to another, but is
probably of no help within mythtv.
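For a text file outside mythtv, a conversion with iconv might look like
the sketch below. The helper names to_utf8 and is_valid_utf8 are made up,
and ISO-8859-9 is simply the encoding from the example earlier in this
thread.

```shell
# Convert FILE ($2) from the encoding named in $1 to UTF-8 on stdout.
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2"
}

# Succeed if FILE ($1) is already valid UTF-8; iconv exits non-zero
# when it meets a byte sequence that is invalid in the source encoding.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Usage would be along the lines of
`to_utf8 ISO-8859-9 funny-text > funny-text.utf8`, after is_valid_utf8 has
shown that the file is not UTF-8 to begin with.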
Geoff