[mythtv-users] Issue with accents in titles and descriptions: Myth or Schedules Direct?

Bill Meek keemllib at gmail.com
Mon Apr 6 19:25:16 UTC 2015


On 04/06/2015 02:04 PM, Nick Morrott wrote:
...
> I've seen similar issues in the past in the upstream listings data for
> the XMLTV/tv_grab_uk_rt grabber. I did a lot of work to both find and
> then correct such errors whenever they were spotted in the source data
> so that end-users were not affected.
>
> In my uk_rt experience, bad upstream listings data itself was the culprit.

Here's what I captured with Wireshark as the data arrived at my
backend (BE):

Good: Con Sabor a Per\303\272         which is: 0x c3 ba       or: ú
Bad:  Con Sabor a Per\303\203\302\272 which is: 0x c3 83 c2 ba or: Ã º
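
For anyone who wants to reproduce those byte sequences, here is a minimal sketch in Python. It assumes the bad form comes from UTF-8 bytes being misread as Latin-1 and then encoded to UTF-8 a second time. That is only my guess at the mechanism; the capture doesn't show which step upstream does it. The variable names are purely illustrative.

```python
# Correct UTF-8 for the title: the u-acute (U+00FA) becomes the two bytes c3 ba.
good = "Con Sabor a Per\u00fa".encode("utf-8")

# Assumption: something upstream treats each UTF-8 byte as a Latin-1
# character and encodes the result to UTF-8 a second time.
bad = good.decode("latin-1").encode("utf-8")

good_tail = good[-2:].hex(" ")  # last two bytes of the good title
bad_tail = bad[-4:].hex(" ")    # last four bytes of the bad title

print("Good:", good_tail)
print("Bad: ", bad_tail)
```

Under that assumption the output matches the capture: c3 ba for the good title and c3 83 c2 ba for the bad one.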

(In answer to RobertK's question: this is with the SD-DD feed, not JSON. It's
  the 7th bullet here: http://forums.schedulesdirect.org/viewtopic.php?f=3&p=8335&sid=c27b784bbf5571a1017e3e5fe30e984d#p8335 )

> Instead of having the relevant 'extended ASCII' character (or higher
> Unicode codepoint) correctly encoded as 2 (or more) bytes in the UTF-8
> output, each of the bytes of the UTF-8 representation of the character
> had been encoded into UTF-8 again. Reversing this 'double UTF-8
> encoding' of the bad characters produced the intended output.

> The end result of this double-encoding issue was that the UTF-8 data
> contained some quite 'regular' characters (in the UK data's case,
> mostly French, German and Scandinavian characters) encoded with 4 (or
> sometimes 6) bytes.

Is that the double encoding you're referring to?
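
If it is, reversing it might look something like the sketch below (Python again). undo_double_utf8 is a hypothetical helper of mine, not anything in MythTV, XMLTV or the Schedules Direct feed. It undoes one extra round of encoding and keeps the result only if it is still valid UTF-8, so correctly encoded titles pass through unchanged. It is a heuristic: a title that genuinely contains a sequence like "Ãº" would be wrongly "repaired".

```python
def undo_double_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: reverse one extra round of UTF-8 encoding.

    Returns the input unchanged if it does not look double-encoded.
    """
    try:
        # Decode once, then map each character back to a single byte.
        candidate = raw.decode("utf-8").encode("latin-1")
        # Keep the result only if it is itself valid UTF-8.
        candidate.decode("utf-8")
    except UnicodeError:
        return raw
    return candidate


bad_title = b"Con Sabor a Per\xc3\x83\xc2\xba"   # bytes from the bad capture
good_title = b"Con Sabor a Per\xc3\xba"          # bytes from the good capture

fixed = undo_double_utf8(bad_title)
untouched = undo_double_utf8(good_title)

print(fixed.decode("utf-8"))
print(untouched.decode("utf-8"))
```

Both lines should print the title with the correct accent. The validity check is what stops the already-good title from being mangled into a lone 0xfa byte.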

-- 
Bill

