[mythtv-users] Database encodings

Michael T. Dean mtdean at thirdcontact.com
Mon Mar 30 14:50:11 UTC 2009


On 03/30/2009 04:08 AM, Glenn Sommer wrote:
> I saw at http://www.mythtv.org/wiki/Fixing_Corrupt_Database_Encoding 
> that mythtv 0.22 can only handle latin1 connections to the MySQL 
> database - but uses UTF8 internally (Actually it writes UTF8 into the 
> database).

You've got your versions wrong.

MythTV 0.21-fixes and below uses UTF-8.  MythTV 0.21-fixes and below 
stores UTF-8 in the database.  MythTV 0.21-fixes and below tells MySQL 
that the text columns are actually latin1.  MythTV 0.21-fixes and below 
does /not/ use latin1.

MythTV trunk uses UTF-8.  MythTV trunk stores UTF-8 in the database.  
MythTV trunk tells MySQL that the text columns are actually UTF-8.  
MythTV trunk does /not/ use latin1.

In other words, the /only/ difference is that MythTV 0.21-fixes and 
below doesn't tell MySQL what encoding is actually in use.
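
To illustrate the mismatch, here is a minimal sketch in Python (not 
MythTV code; the sample title is made up).  It shows what happens when 
UTF-8 bytes sit in a column that is declared latin1 and a client trusts 
that declaration.  Python's "latin-1" codec is used as a stand-in for 
MySQL's latin1, which is an approximation.

```python
# Hypothetical sample value; any non-ASCII text behaves the same way.
title = "Café"

# The bytes that actually end up in the column: UTF-8.
stored = title.encode("utf-8")

# What a client sees if it believes the column really holds latin1.
misread = stored.decode("latin-1")

print(stored)   # b'Caf\xc3\xa9'
print(misread)  # CafÃ©
```

The bytes themselves are fine; only the label on the column is wrong.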

> I don't understand why MythTV doesn't use UTF8 all the way - so no 
> encoding/decoding is required when talking to the database?
> Also, putting UTF8 text in a latin1 database is in my opinion wrong...

It does.  It used to store UTF-8 data in MySQL without /allowing/ MySQL 
to know that the data inside was UTF-8.  That significantly reduces the 
size of the database columns and indices, compared to a database where 
MySQL knows the data is UTF-8, when most of the data is actually latin1 
(as it is for a /large/ number of users).  Also, MythTV had to wait 
until MySQL had sufficient support for sufficiently-long columns and 
indices, and we've only recently started /requiring/ versions of MySQL 
that do.
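
A rough sketch of the size argument, in Python.  It assumes MySQL 
reserves 1 byte per character for latin1 and up to 3 bytes per 
character for its utf8 charset; the 255-character column width and the 
sample title are hypothetical, not taken from the MythTV schema.

```python
# Assumed worst-case bytes per character for each declared charset.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}

def reserved_bytes(declared_chars: int, charset: str) -> int:
    """Worst-case bytes needed for a column of the given width."""
    return declared_chars * BYTES_PER_CHAR[charset]

# Hypothetical 255-character text column.
latin1_width = reserved_bytes(255, "latin1")  # 255
utf8_width = reserved_bytes(255, "utf8")      # 765

# Mostly-Western text encoded as UTF-8 needs far less than the worst case.
sample = "Café"
actual_utf8_bytes = len(sample.encode("utf-8"))  # 5 bytes for 4 characters

print(latin1_width, utf8_width, actual_utf8_bytes)
```

So the declared width triples even though the data itself barely grows.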

> Other clients will be unable to read the data correctly (like 
> phpMyAdmin for example).

Well, the /only/ client that should be using the MythTV database is really 
MythTV or other clients designed for use with MythTV (and, therefore, 
aware of the encoding).  And, that being said, if you knew what you were 
doing, you could actually make it work rather easily even in "other" 
clients that didn't realize what was going on.
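
As one possible approach (a sketch only, and an assumption about what 
such a client would do): if the client hands you text that it decoded 
as latin1, you can reverse that step and decode the original bytes as 
UTF-8.  The function name is hypothetical, and Python's "latin-1" codec 
is again only an approximation of MySQL's latin1.

```python
def recover_utf8(text_read_as_latin1: str) -> str:
    """Undo a latin1 misreading of bytes that were really UTF-8."""
    # Re-encoding as latin-1 restores the original bytes exactly,
    # because latin-1 maps every byte value to one code point.
    original_bytes = text_read_as_latin1.encode("latin-1")
    return original_bytes.decode("utf-8")

# Simulate what a latin1-believing client would hand back.
seen_by_client = "Café".encode("utf-8").decode("latin-1")

print(recover_utf8(seen_by_client))  # Café
```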

> In my opinion latin1 text is for latin1 databases - and UTF8 text is 
> for UTF8 databases...
>
> Surely I must be missing something here?

Yes.  You're missing an understanding of what that page actually said.  :)

> What is the reason for breaking the database - instead of fixing MythTV?

Again, re-read that page.  We're simply telling people who have 
completely broken data (because they had configurations where they told 
MySQL to ignore the database schema's defined charset, so MySQL did 
character-set conversions it should /not/ have done) that they cannot 
successfully upgrade their databases until they fix the data.
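
For illustration, here is a Python sketch of how that kind of damage 
can look.  It assumes the unwanted conversion was latin1 to UTF-8 
applied to bytes that were already UTF-8 (double encoding); other 
misconfigurations may damage data differently.  The function names are 
hypothetical, and this is not the procedure from the wiki page.

```python
def double_encode(text: str) -> bytes:
    """Simulate a latin1 -> UTF-8 conversion applied to UTF-8 bytes."""
    utf8_bytes = text.encode("utf-8")
    # The server wrongly treats each byte as one latin1 character...
    misread = utf8_bytes.decode("latin-1")
    # ...and converts those characters to UTF-8 a second time.
    return misread.encode("utf-8")

def repair(stored: bytes) -> str:
    """Reverse exactly one round of the double encoding above."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")

broken = double_encode("Café")

print(broken)          # b'Caf\xc3\x83\xc2\xa9'
print(repair(broken))  # Café
```

Data in this state no longer holds the bytes MythTV wrote, which is why 
it has to be fixed before an upgrade can treat the columns as UTF-8.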

Mike

