[mythtv-users] How big is your database?

Fri Dec 2 14:06:19 EST 2005

Ant Daniel wrote:

>On 01/12/05, chris at cpr.homelinux.net <chris at cpr.homelinux.net> wrote:
>
>>No doubt most of that is taken up by the "previously recorded" data.
>>That could probably be reduced to just the series # and epsiode # if it
>>wasn't for the "forget old" feature.  The other thing that a good
>>clean-up would do is check the relationships so that there aren't
>>orphans in large tables like the markup table.  Another nice thing to do
>>would be to have an ability to remove old frontend data from the system.
>>
>Once again us poor non-US not considered ;)
>
On the bright side, I don't think this idea would be considered for 
inclusion--even for US users.

>In the UK our listing data doesn't appear to give a decent series # /
>epsiode #, I know myth's supposed to generate something when not
>provided, but I haven't worked out if that works. (I guess with almost
>2yrs data I should go and have a look).
>
>However when first reading your post (again we don't tend to use # to
>symbolise number), I read it as series hash, epsiode hash. Now that's
>a good idea.
>
>Why not hash the episode description rather than store the full length
>text, wouldn't that make a significant impact on previously recorded?
>
OK, so why shouldn't this be considered?

My database:
Total number of records (SELECT COUNT(*) FROM ...):
    oldrecorded: 3787
    recorded: 189
    recordedmarkup: 1746031 (Yes, that is 1.75M records.)

SELECT MIN(starttime) FROM oldrecorded;  ->  2004-05-03 03:00:00
SELECT NOW();  -> 2005-12-02 13:07:4

So, my database is approximately 19 months old.  In more than a year and 
a half, I've got 3787 records in oldrecorded.

SELECT MAX(LENGTH(description)) FROM oldrecorded;  ->  251

(and this is with DataDirect, which typically has much longer, more 
detailed descriptions than other grabbers).

So, let's assume that every description is this same length.  That means 
I have 950537 bytes of description after 1 1/2 years.

Let's add in title and subtitle:

SELECT MAX(LENGTH(title)) FROM recorded;  ->  39
SELECT MAX(LENGTH(subtitle)) FROM recorded;  ->  58

Once again, assume all titles/subtitles are the as long as the longest, 
and we have an additional 147693 and 219646 bytes.  Adding these up, we 
have 1317876 bytes.  That's 1.26MiB.  OK, so what about the actual 
storage requirements (length + 1 for the two VARCHAR types and length + 
2 for the TEXT type).  If we don't assume that's taken care of by the 
fact that most values are smaller than the max current value, we need to 
add an additional 15148 bytes, So, we're up to 1.27MiB...

Looking at the actual files:
# ls -al oldrecorded.*
-rw-rw----  1 mysql mysql 1004848 Dec  2 12:59 oldrecorded.MYD
-rw-rw----  1 mysql mysql  462848 Dec  2 12:59 oldrecorded.MYI
-rw-rw----  1 mysql mysql    9040 Nov  2 16:46 oldrecorded.frm

So, including other fields and the index, we have 1.41MiB (and notice 
that we've overestimated the table size by assuming the max length).  
Now, I could go down the road of comparing sizes when using the length 
of episode ID (12) or the length of a hash, but it all comes down 
to--even if we save 1.4MiB, have we accomplished anything?

To answer this question, let's look at recordedmarkup.  As mentioned 
above, I have 189 recordings which give me 1.75 million records in 
recordedmarkup.  Since recordedmarkup has a relatively simple structure 
(only one VARCHAR), we can figure the row size using column type storage 
requirements:

Field          Type                 Storage Required
-----          ----                 ----------------
chanid         int(10) unsigned      4
starttime      datetime              8
mark           bigint(20)            8
offset         varchar(32)           l+1 (we'll call it 11)
type           int(11)               4
-------        ----                 ----------------
Total                                35

and we get about 100 of these records per minute of an MPEG-2 recording.

 From inside the directory containing the mythconverg binary files:
# du -sh
192M    .

# ls -al recordedmarkup.*
-rw-rw----  1 mysql mysql 106883020 Dec  2 13:45 recordedmarkup.MYD
-rw-rw----  1 mysql mysql  69468160 Dec  2 13:45 recordedmarkup.MYI
-rw-rw----  1 mysql mysql      8668 Jul 19  2004 recordedmarkup.frm

So, as Gregorio said, most of the storage (168MiB of 192MiB) required is 
in the recordedmarkup table.  And, there are two ways for you to 
minimize the size of your recordedmarkup table:
    a) delete your recordings, or
    b) transcode all your recordings to NUV's with MPEG-4

Note that recordedmarkup will be small for users with framegrabbers 
because they use RTJPEG or MPEG-4 instead of MPEG-2, so they don't have 
to work to get a small database.  However, IMHO, it's not worth the 
effort of transcoding even for MPEG-2 users.  I have one transcoded 
recording out of 189 total and my 192MiB database is the equivalent of 
less than 10 min of video even with the terribly low bitrate (2200kbps 
average) I use.

Mike