[mythtv-users] Log output of user jobs via MythLog class?

Thu May 15 18:25:50 UTC 2014

On 08.05.2014 at 00:07, Sam Jacobs <samlists at ijacobs.co.uk> wrote:

> On 7 May 2014 at 09:31:32, Karl Dietz (dekarl at spaetfruehstuecken.org) wrote:
>> On 07.05.2014 10:10, Sam Jacobs wrote:
>> > You’re right—it was the “Sitcom” at the beginning of the subtitle that was throwing
>> > me off. I wonder why myth decided to remove *that* from the description but not the
>> > rest of the detected subtitle (for me the detected subtitle always gets removed
>> > from the description).
>> MythTV didn't decide anything, the broadcaster sends a short and a long
>> text.
>
> That’s interesting, thanks. I thought that short text simply referred to the programme title. Probably long text isn’t used in the UK, then.
>
>
>> Some broadcasters do send an episode title (or some nonsense for
>> movies, not the tagline, but "Movie 2014 DE/AU" written differently
>> every time) but a shorter description instead.
>> One common thing with this broadcaster is to put a "genreword" *or* a
>> "genre sentence related to the series" in front of the short description.
>> Then start the long description with a generic series description
>> followed by a long episode description.
>
> Then it’s perhaps inappropriate for MythTV to use the short text for the subtitle field for data from these broadcasters. Based on what you’ve written, I would personally suggest removing the part of the beginning of the long text that matches the end of the short text, then concatenating the two strings and using them for the description. In the examples Nicolas has provided, there is clearly nothing in the short text or the long that could reasonably be the subtitle of the programme.
>
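
A minimal sketch of the merge suggested above, for illustration only: it drops the words at the start of the long text that repeat the end of the short text, then joins the two. The function name and the sample strings are hypothetical, and this is not how MythTV's EIT code currently works. Comparing whole words rather than characters is an assumption made here to avoid accidental one-letter overlaps.

```python
def merge_short_and_long(short_text: str, long_text: str) -> str:
    """Join short and long EIT text, dropping the overlapping words.

    Hypothetical helper: the overlap is the longest run of words that
    ends the short text and also begins the long text.
    """
    short_words = short_text.split()
    long_words = long_text.split()

    overlap = 0
    # Try the largest possible overlap first and stop at the first match.
    for size in range(min(len(short_words), len(long_words)), 0, -1):
        if short_words[-size:] == long_words[:size]:
            overlap = size
            break

    # Keep the short text whole and append only the unmatched remainder.
    return " ".join(short_words + long_words[overlap:])
```

With the made-up inputs "Sitcom. Alan meets" and "Alan meets his match at work." this yields "Sitcom. Alan meets his match at work.", which would then go into the description while the subtitle stays empty.
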
> At the least, somebody should look at the distribution of word counts in the data[1][2] and implement a sensible word limit—automatically using a string of 22 words doesn’t seem sensible, to me.
>
> Sam
>
>
> [1] I’ve been using this SQL to look at the distribution of word counts in my program table, where video sources 1 and 3 are fed from EIT:
>
> SELECT IF(LENGTH(subtitle) > 0, LENGTH(subtitle) - LENGTH(REPLACE(subtitle, ' ', ''))+1, 0) subtitle_len, count(*) FROM program WHERE chanid IN (select chanid from channel where (sourceid=1 or sourceid=3)) GROUP BY subtitle_len;
>
> If *all* of one’s video sources are EIT fed, one can remove the WHERE clause:
>
> SELECT IF(LENGTH(subtitle) > 0, LENGTH(subtitle) - LENGTH(REPLACE(subtitle, ' ', ''))+1, 0) subtitle_len, count(*) FROM program GROUP BY subtitle_len;
>
>
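
For anyone puzzled by the expression in those queries: it counts the spaces in the subtitle and adds one, treating an empty subtitle as zero words. The same logic as a small Python sketch (the function name is made up here, and it assumes the subtitle is a plain string, not NULL):

```python
def subtitle_word_count(subtitle: str) -> int:
    """Mirror the SQL expression: number of spaces plus one, or 0 if empty."""
    if len(subtitle) == 0:
        return 0
    # LENGTH(s) - LENGTH(REPLACE(s, ' ', '')) is simply the number of spaces.
    return subtitle.count(" ") + 1
```

Note that, like the SQL, this over-counts when a subtitle contains double, leading, or trailing spaces, so the figures are an approximation of the true word count.
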
> [2] My data, from a UK EIT fed program table:
>
> subtitle	programme
> length		count
> --------	---------
> 0		55630
> 1		4198
> 2		7853
> 3		8718
> 4		4639
> 5		2721
> 6		1919
> 7		2090
> 8		1040
> 9		1358
>
>

I was unsure whether you were asking me or someone else to run the SQL statements on our MythTV boxes. I have noticed, though, that for recordings that are not TV series the subtitle field does contain correct, or nearly correct, information. I also see the same things Karl mentioned (genre word, generic series description).

Having started to investigate this a little, I ran your second SQL statement against my mythconverg database, with this result:

# subtitle_len, count(*)
0, 90223
1, 4000
2, 3059
3, 7812
4, 4617
5, 2871
6, 2057
7, 1807
8, 1705
9, 1075
10, 2609
11, 1344
12, 671
13, 423
14, 744
15, 846
16, 2245
17, 1895
18, 2266
19, 2350
20, 2037
21, 1043
22, 587
23, 185
24, 65
25, 24
26, 15
27, 159
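
To put a number on how many subtitles a word limit would affect, the table above can be tallied with a short Python sketch. The counts are copied from the result above; the function name is hypothetical, and a limit of 9 words is only an arbitrary cut-off chosen for illustration, not a recommendation.

```python
# Word count -> number of program rows, copied from the query result above.
SUBTITLE_COUNTS = {
    0: 90223, 1: 4000, 2: 3059, 3: 7812, 4: 4617, 5: 2871, 6: 2057,
    7: 1807, 8: 1705, 9: 1075, 10: 2609, 11: 1344, 12: 671, 13: 423,
    14: 744, 15: 846, 16: 2245, 17: 1895, 18: 2266, 19: 2350, 20: 2037,
    21: 1043, 22: 587, 23: 185, 24: 65, 25: 24, 26: 15, 27: 159,
}


def count_longer_than(counts: dict, limit: int) -> tuple:
    """Return (rows with more than `limit` words, rows with any subtitle)."""
    non_empty = sum(rows for words, rows in counts.items() if words > 0)
    longer = sum(rows for words, rows in counts.items() if words > limit)
    return longer, non_empty
```

Here count_longer_than(SUBTITLE_COUNTS, 9) gives (19508, 48511), so roughly 40% of the non-empty subtitles in this table are longer than nine words.
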

Nicolas

-- 
www.nskcomputing.de