[mythtv-users] Browsing UPnP "By title" + international characters = bug [PATCH]

David Kubicek foceni at gmail.com
Sun Nov 7 17:30:48 UTC 2010

On 7.11.2010 14:32, Michael T. Dean wrote:
>  On 11/06/2010 02:45 PM, David Kubicek wrote:
>> On 6.11.2010 18:24, David Kubicek wrote:
>>> On 19.6.2010 01:29, Svend Høst wrote:
>>>> A recording has been made, and the title from the EPG contains the Danish
>>>> letters "æ/ae, ø/oe and å/aa".
>>>> If I try to locate and play the recording via UPnP by browsing the
>>>> UPnP tree By Date, the recording is shown properly.
>>>> If I choose By Title, no content is streamed to my player. The
>>>> recording is there and properly recorded, since I can see it
>>>> when choosing By Date.
>>>> From the log:
>>>> 2010-06-19 00:30:16.119 HTTPRequest::ProcessSOAPPayload : "urn:schemas-upnp-org:service:ContentDirectory:1#Browse" :
>>>> 2010-06-19 00:30:16.413 UPnpCDS::HandleBrowse ObjectID=RecTv/1, ContainerId=
>>>> 2010-06-19 00:30:16.431 UPNP Browse : Searching for : RecTv  / ObjectID : RecTv/1
>>>> 2010-06-19 00:30:16.449 UPnpCDSTv::IsBrowseRequestForUs - Not sure... Calling base class.
>>>> 2010-06-19 00:30:16.548 HTTPRequest::SendResponse(xml/html) () :200 OK -><>: 1
>>>> 2010-06-19 00:30:31.478 HTTPRequest::ProcessSOAPPayload : "urn:schemas-upnp-org:service:ContentDirectory:1#Browse" :
>>>> 2010-06-19 00:30:31.566 UPnpCDS::HandleBrowse ObjectID=RecTv/1/key=Bonder�ven retro, ContainerId=
>>>> 2010-06-19 00:30:31.586 UPNP Browse : Searching for : RecTv  / ObjectID : RecTv/1/key=Bonder�ven retro
>>>> 2010-06-19 00:30:31.605 UPnpCDSTv::IsBrowseRequestForUs - Not sure... Calling base class.
>>>> 2010-06-19 00:30:31.699 HTTPRequest::SendResponse(xml/html) () :200 OK -><>: 1
>>> I've had the same issue. As I live in the Czech Republic, many of my
>>> shows and movies have Czech diacritics (CP: iso-8859-2 / win1250) in
>>> them. Until now, I had to use "By Date", which was a bit of a pain in
>>> the ass, to be honest.
>>> I had never looked into it beyond a bit of googling, and I hadn't
>>> found anybody with the same issue, so I thought it was a local problem
>>> in my setup. Probably nobody had ever reported it, except you, that is.
>>> :) About an hour ago, I noticed your email by pure chance, and it
>>> helped me see that my situation was the same: the empty-folder bug
>>> via UPnP appeared only for shows with Czech characters. I had never
>>> noticed that and thought the issue was "random".
>>> For example: "*By Title*" browsing displayed a folder "*Černá zmije
>>> (18)*", but when I opened it on the PS3 or in Totem on the desktop,
>>> it was *empty*. Locating the show via "*By Date*" played it without
>>> fail.
>>> So, I checked the source in my local MythTV 0.23-fixes repo, and
>>> indeed, there was a bug in handling UTF8 requests. US coders didn't
>>> expect the search filters to contain UTF8 characters, so they used
>>> .toLatin1() conversions all over the place.
> Actually, they wrote code assuming that QUrl worked as described in
> the Qt documentation.  In Qt's API, URIs are UTF-8 encoded characters
> percent-encoded to ASCII (as they should be, per RFCs 3986 and 3987).
>>>   I'm attaching a simple
>>> fix: switching from .toLatin1() to .toUtf8() fixes the whole issue.
>>> Applying it upstream is as simple as a "search and replace".
>>> Some of the 11 conversions don't actually need .toUtf8(), but it
>>> doesn't hurt either. Developers can apply just the parts that are
>>> required.
>> New ticket for this issue: http://svn.mythtv.org/trac/ticket/9188
> Out of curiosity, are you using a properly-configured environment, as
> described in this post?
> http://www.gossamer-threads.com/lists/mythtv/dev/439348#439348
> Svend, please try the configuration in that post, too.  It should
> allow everything to work without code changes/requiring a recompile
> and redeploy.
> Please post the output of locale /in the environment that's running
> mythbackend/.  And please try again, without your patch, with a
> properly-configured UTF-8 locale (the only configuration in which QUrl
> works as documented; see the bug mentioned in the post above), i.e.
> starting mythbackend from a shell where you can verify the environment.
> Most distros have start scripts that don't properly configure the
> environment.
> I'm pretty sure the changes in your patch will have bad effects on
> some configurations.
> If existing code works with a UTF-8 locale (and I'm almost positive it
> will), please say so on your ticket.  Your changes will definitely
> need to be tested in multiple different environments running with
> multiple different Latin-compatible and non-Latin-compatible encodings
> specified.
> If you're really interested in tracking down the issue I think may
> exist in Qt (which I haven't made time to do since I'm way behind on
> my list of deliverables for MythTV), I'd be very appreciative.  I can
> give you some information that should be a good start (and will
> require a bit of code-sleuthing in QUrl, QTextCodec, QString, and some
> related classes).
> Thanks,
> Mike

My environment is OK: locale gives "cs_CZ.UTF-8" for all variables, as
it should. Without it, date formatting, sorting and even keyboard input
don't work properly, so this setting is one of the first orders of
business after a Linux installation. BUT, it doesn't matter in this case. :)

You see, Myth always sends UPnP XML replies in UTF8, without consulting
the locale. That's good: it's universal, and it's what UPnP/DLNA clients
expect. You're talking about how Myth interfaces with the system-wide CP
(code page) settings, along with the DB's CP, etc. That's not the issue
here; that mechanism works perfectly and is not touched by the patch.
The patch only changes CP handling in UPnP's HTTP lib when talking to
remote clients, not how Myth works with CPs and conversions "on localhost".

Thing is, Myth UPnP uses UTF8 for all communication (whatever CP you
have system-wide), so my point is just that it should expect UTF8 from
clients **by default** too. Of course, the **proper** way to handle it
is to consult the encoding specified in the clients' HTTP headers and
XML bodies and use that to decode the requests, but until such a
mechanism is in place, expecting UTF8 back is far more appropriate than
expecting Latin1, which we never speak and which isn't used by any
UPnP/DLNA client I've tested.
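The "proper" approach described above could look roughly like the following sketch. It is written in Python rather than MythTV's Qt/C++ purely for illustration, and request_charset and decode_request_body are hypothetical names, not MythTV functions. It reads the charset parameter of the client's Content-Type header and falls back to UTF8 when none is declared or the name is unknown. It deliberately ignores any encoding declared inside the XML body itself, which a real implementation would also have to consider.

```python
# Hypothetical sketch, not MythTV code: pick the codec for an incoming UPnP
# request from the charset parameter of its Content-Type header, falling
# back to UTF-8 when the client does not declare one (or declares an
# unknown one). Encoding declarations inside the XML body are ignored here.
import codecs
from email.message import Message


def request_charset(content_type, default="utf-8"):
    """Return the canonical codec name declared in a Content-Type value."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset is None:
        return default
    try:
        # Normalise aliases, e.g. "utf8" -> "utf-8".
        return codecs.lookup(charset).name
    except LookupError:
        # Unknown charset name: fall back instead of failing the request.
        return default


def decode_request_body(body, content_type):
    """Decode raw request bytes using the client's declared charset."""
    return body.decode(request_charset(content_type))
```

With this, a request carrying Content-Type: text/xml; charset="utf-8", or no charset at all, is decoded as UTF8, while one that explicitly declares ISO-8859-1 is decoded as Latin1. Latin1 is then used only when a client actually asks for it.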

The tests showed that all common clients (WMP11+, PS3 and Totem with the
UPnP plugin on Linux) also send UPnP requests in UTF8. They don't do so
because Myth talks UTF8; they use it for all UPnP communication simply
because it's universal, just like Myth. But once you speak UTF8, you
should default to reading UTF8 back, not anything else, and certainly
not Latin1.
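To illustrate why the default matters, here is a hypothetical Python sketch (not MythTV code; RECORDINGS and browse_by_title are made-up names, and the episode counts are illustrative) of a server that resolves a "By Title" folder from the title bytes a client sends back in UTF8. Decoding those bytes as Latin1 never raises an error. It silently produces a different string, so the lookup finds nothing and the folder appears empty. In this sketch the Danish title fails too, even though ø exists in Latin1, because the problem is UTF8 bytes being read as Latin1.

```python
# Hypothetical sketch, not MythTV code: a "By Title" lookup keyed on the
# title string a UPnP client sends back in a Browse request.

# Illustrative title -> episode-count table (counts are made up).
RECORDINGS = {"Černá zmije": 18, "Bonderøven retro": 3}


def browse_by_title(request_bytes, codec):
    """Decode the requested title with `codec` and return its episode count.

    Returns 0 when no recording matches, i.e. the "empty folder" symptom.
    """
    # Decoding as "latin-1" always succeeds (every byte maps to a code
    # point), so a wrong codec yields a wrong string rather than an error.
    title = request_bytes.decode(codec)
    return RECORDINGS.get(title, 0)
```

For example, browse_by_title("Černá zmije".encode("utf-8"), "latin-1") returns 0, while the same call with "utf-8" returns 18.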

Until the default handling switches from Latin1 to UTF8, UPnP won't work
properly for characters not present in Latin1.

*In short:* the current code simply ignores whatever CP the client says
it uses and forces the conversion into Latin1. Since we know everything,
including us, speaks UTF8, this is wrong. Ideally, we should honor the
CP indicated by the client, but until that is done, expecting UTF8 works
much, much better.

David Kubicek
