[Mythtv-translators] Themestrings has been updated for 0.25 - It's time to start translating! :)
knight at teksavvy.com
Fri Mar 23 06:55:23 UTC 2012
On 3/22/2012 2:24 PM, Kenni Lund wrote:
> 2012/3/22 Nicolas Riendeau<knight at teksavvy.com>
>> On 3/22/2012 12:57 PM, Nick Morrott wrote:
>>> I noticed some UTF-8 weirdness today after updating for the en_gb
>>> translations. The XML element generated by lupdate containing the
>>> description text for the Steppes theme (it contains "Français") was
>>> not generated with valid UTF-8, but rather each of the two bytes
>>> representing the "ç" character (should be C3 A7) was further
>>> re-encoded into UTF-8 so that 4 bytes in total (C3 83 C2 A7) were
>>> output for the character in the file.
> Heh...that bug just won't die :) First the encoding issue appeared in
> the theme downloader generation script, then in the themestrings tool
> and now in lupdate...
(Kenni I know you most likely know a good deal of this but since I'm
posting this to the mailing list I might as well document the problem we
had with this...)
I can only assume that all of these scripts/programs took for granted
that the strings were all in US-ASCII, or that when they were written
there wasn't anything to test them with to make sure they produced the
expected results.
In the case of the themestrings tool, what was most likely happening is
that the output, which was already being written as UTF-8, was later
re-encoded into our local character set (which is most likely UTF-8 for
many if not all of us) by the QTextStream.
This time the problem is slightly different... By default lupdate
assumes that the source files are in ISO-8859-1** (when what we are
trying to make it process is in UTF-8), so it assumes the "ç", which is
encoded using two bytes in UTF-8 (C3 A7), is actually two characters in
ISO-8859-1 and proceeds to re-encode each of them into UTF-8 to store
them in the translation file (C3 83 C2 A7), with catastrophic results.
** ISO-8859-1, also known as Latin1, is lupdate's default encoding.
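To make the mechanism concrete, here is a minimal Python sketch of the
double encoding described above. It is only an illustration, not what
lupdate actually runs, and the function names are made up for this
example. It also shows that the damage can be undone by walking the same
steps backwards.

```python
def misread_as_latin1(text: str) -> bytes:
    """Reproduce the bug: UTF-8 bytes are misread as ISO-8859-1,
    then the resulting characters are encoded as UTF-8 again."""
    utf8_bytes = text.encode("utf-8")          # "ç" -> C3 A7
    misread = utf8_bytes.decode("iso-8859-1")  # two characters: "Ã" and "§"
    return misread.encode("utf-8")             # -> C3 83 C2 A7


def repair(double_encoded: bytes) -> str:
    """Undo the double encoding by reversing the steps above."""
    return double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")


broken = misread_as_latin1("Français")
print(broken)          # the "ç" now takes four bytes instead of two
print(repair(broken))  # back to the original text
```

The repair only works on text that really was double encoded; running it
on correct UTF-8 that contains non-ASCII characters will either fail or
produce different garbage.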
The reason why this never caused problems before is that our strings are
normally in US-ASCII and both ISO-8859-1 and UTF-8 are supersets of
US-ASCII. What this means is that as long as the original text only
contains US-ASCII characters its encoding is *identical* in both
ISO-8859-1 and UTF-8.
While both are supersets of US-ASCII, they do not encode non-US-ASCII
characters in the same way (even when the character values match).
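A quick way to see this, sketched in Python (the helper name is made up
for illustration): pure US-ASCII text gives the same bytes in both
encodings, while "ç" has the same character value (E7) in both but is
stored as one byte in ISO-8859-1 and two bytes in UTF-8.

```python
def encodings_agree(text: str) -> bool:
    """True if the text has the same bytes in ISO-8859-1 and UTF-8."""
    return text.encode("iso-8859-1") == text.encode("utf-8")


print(encodings_agree("Steppes"))   # US-ASCII only
print(encodings_agree("Français"))  # contains a non-US-ASCII character

print(hex(ord("ç")))                # character value
print("ç".encode("iso-8859-1"))     # one byte
print("ç".encode("utf-8"))          # two bytes
```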
So as long as everything was in US-ASCII none of these encoding problems
showed up.
>> I think I have an idea how to fix it (assuming lupdate is actually able
>> to extract UTF-8 correctly)
> Ok, good, I haven't looked at it yet.
I'll do a few more spot checks tomorrow (it's pretty late here now), but
unless I find a problem with the fix I found (I'm not expecting to find
any though), I'll commit it in every file except the one for the
programs under mythtv/. We don't want to fix that one right now since
doing so would actually add a new string.
(The fix will be added at a later time...)
The resulting translation file is encoded correctly after applying the
fix and it will display correctly in the main translation window but Qt
Linguist will still be unable to display the source file correctly (a
bug in Qt Linguist).
>> freeze. If we get reports from any of the translators that some strings
>> are untranslatable we *might* temporarily break the string freeze in
>> order to correct these issues and fix this at the same time..
> Yep, if it's the only string that needs fixing, let's just fix it
> through the translations. If we want to, we can always fix the source
> string as well as the single character in all of the translations, on
> the day before the release of 0.25.
Yep, the problem is quite harmless and doesn't justify adding a new
string at this time...
Have a nice day!