[mythtv-commits] Ticket #5946: Fix (unimportant) data corruption in database character set conversion
MythTV
mythtv at cvs.mythtv.org
Mon Dec 1 22:59:34 UTC 2008
----------------------------------------------+-----------------------------
Reporter: sphery <mtdean at thirdcontact.com> | Owner: ijr
Type: patch | Status: new
Priority: minor | Milestone: unknown
Component: mythtv | Version: head
Severity: medium | Resolution:
Mlocked: 0 |
----------------------------------------------+-----------------------------
Comment(by sphery <mtdean at thirdcontact.com>):
mythtv-5946-fix_database_utf8_conversion_corruption.patch fixes the
corruption caused by the initial implementation of the UTF-8 conversion in
DB update 1216 (the implementation that predates
mythtv-5946-fix_database_utf8_conversion.patch).
Because the credits and recordedcredits tables refer to persons (by ID) in
the people table, the fix involves:
1. Query all corrupt records in people.
2. Find all records that are duplicates once the null-padding is removed.
   a. If there are duplicates:
      1. update references in {,recorded}credits to refer to the original
         (corrupt) person;
      2. delete the duplicate (not-null-padded) record from people;
      3. update the corrupt (null-padded) name on the original person record.
   b. If there are no duplicates:
      1. update the corrupt (null-padded) name on the original person record.
Although the corrupt (null-padded) name on the original person record is
corrected whether or not there are duplicates, we cannot simply move that
code outside the duplicate-checking conditional. When there are duplicates,
it is only safe to correct the name if we were able to update the
references (and we delete the duplicate only in that case).
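The ordering constraint can be seen in a small control-flow sketch. The callbacks repoint, delete and rename are hypothetical stand-ins for the SQL that touches {,recorded}credits and people; the point is that in branch a, rename() is only reached after repoint() has succeeded, so it cannot be hoisted out of the conditional.

```python
from typing import Callable, List


def repair_person(keeper: int, dups: List[int],
                  repoint: Callable[[List[int], int], bool],
                  delete: Callable[[List[int]], None],
                  rename: Callable[[int], None]) -> bool:
    """Apply branch a or b for one corrupt person; return True if the name was fixed."""
    if dups:
        # Branch a: move {,recorded}credits references onto the corrupt original first.
        if not repoint(dups, keeper):
            # References still point at the duplicates, so neither delete
            # them nor correct the original's name.
            return False
        delete(dups)
        rename(keeper)
    else:
        # Branch b: no duplicates exist, so only the name needs correcting.
        rename(keeper)
    return True
```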
Also, it may seem inefficient to update the {,recorded}credits tables to
use the corrupt person, delete the non-corrupt person, and then correct the
corrupt person's name, but this approach is needed to ensure the data
remains valid even when there are multiple duplicates. For example, users
who have edited the database directly may have created additional
duplicates with varying numbers of null-pad characters.
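The following sqlite3-based sketch applies that order to one name with several null-padded variants. The simplified three-table schema, the helper name merge_padded_duplicates, and keeping the lowest-numbered corrupt row are assumptions (MythTV itself runs against MySQL). Everything happens in one transaction, so a failure while repointing credits leaves people untouched.

```python
import sqlite3
from typing import Optional


def merge_padded_duplicates(db: sqlite3.Connection, clean_name: str) -> Optional[int]:
    """Collapse every people row named clean_name (with or without trailing NULs).

    Returns the surviving person ID, or None if no row was null-padded.
    """
    rows = db.execute("SELECT person, name FROM people ORDER BY person").fetchall()
    # Compare in Python so the NUL padding never has to appear in SQL text.
    group = [(p, n) for p, n in rows if n.rstrip("\x00") == clean_name]
    corrupt = [p for p, n in group if n != clean_name]
    if not corrupt:
        return None
    keeper = corrupt[0]  # assumption: lowest-numbered corrupt row is the original
    dups = [p for p, _ in group if p != keeper]

    with db:  # single transaction: an exception rolls back every step below
        # 1. Point credits and recordedcredits at the corrupt original.
        for table in ("credits", "recordedcredits"):  # fixed names, not user input
            db.executemany(f"UPDATE {table} SET person = ? WHERE person = ?",
                           [(keeper, d) for d in dups])
        # 2. Delete every duplicate, whatever its amount of padding.
        db.executemany("DELETE FROM people WHERE person = ?", [(d,) for d in dups])
        # 3. Only now correct the original's name.
        db.execute("UPDATE people SET name = ? WHERE person = ?", (clean_name, keeper))
    return keeper
```

Run against people rows 1 ("Ann\x00"), 2 ("Ann") and 3 ("Ann\x00\x00\x00"), every credits and recordedcredits row for persons 2 and 3 ends up pointing at person 1, rows 2 and 3 are deleted, and person 1 is renamed to "Ann".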
The update is rather long-running. For various test cases on my Athlon X2
5000+ dev system, it took:
* 80229 records + 80229 dups = 58s
* 80229 records + 39771 dups (40458 corrupt, not dup) = 43s
* 80229 records + 0 dups (39771 corrupt records) = 11s
* 80229 records + 0 dups (0 corrupt records) = 0.5s (because of the fixes
for programgenres, programrating, and recordedrating)
* 72650 records with mythtv-fix_database_utf8_conversion.patch applied to
prevent corruption = 0.5s
--
Ticket URL: <http://svn.mythtv.org/trac/ticket/5946#comment:1>
MythTV <http://www.mythtv.org/>