[mythtv-users] Problems in load-balancing commflagging

f-myth-users at media.mit.edu f-myth-users at media.mit.edu
Wed Dec 7 23:27:11 EST 2005


[Should I move this to -dev and/or open a ticket?  I think I found
a bug.  Is there any point to bug-reporting 0.18.1, since SVN has
diverged so far from it at this point?]

I have new data.  But I'd still -really- love it if somebody, anybody
could answer any subset of the questions at the very bottom of this
message, since I'm still mostly buffaloed by this whole thing.
Thanks!  (And I've annotated some of 'em with my new understanding,
so really there are only a couple I'm really confused about...)

First off, I realized that I never ran mythtv-setup on my SBE to
change the job queue limit -there-; it was still at 1.  I've since
changed it to 5, the current max unless I recompile, and I've also
reset the job check frequency there to 10s to match the MBE.

Next, I ran both backends (slave and master), and the frontend, with
"-v all" and am dumping the output into three different files.  I then
tried another run recording channels 2,3,4,5,6,7 simultaneously.

It appears that the reason (or at least -one- reason) mythtranscode is
erroring is that most of the jobs running it are being attempted on
the SBE (probably because the MBE is busy with 5 commflagging jobs and
1 more queued---which won't run 'cause of the 5-job limit), -and- that
mythtranscode incorrectly believes it can't run on an SBE!

Instead, I see lines like this in the backend log on the SBE (I didn't
see them last time because I was only running the MBE backend with -v):

2005-12-07 22:15:45.360 Attempted to transcode myth://192.168.0.20:6543/1002_20051207221000_20051207221500.nuv. Mythtranscode is currently unable to transcode remote files.

Yet this is (probably!) bogus because the slave has the master's
recordings NFS-mounted anyway, and mythcommflag certainly has no
trouble looking at that filesystem.  It looks like mythtranscode
is asking the database for the filename, getting back something
with this myth:// access path instead, and punting.  What it
-should- be getting back is just a filename that points into /myth/tv,
which is NFS-mounted and corresponds to the same directory with the
same pathname on the MBE.
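
[A minimal sketch of the mapping described above, assuming the
recordings directory really is /myth/tv on both backends.  The helper
name local_path_for is hypothetical; this is not how mythtranscode
or the database lookup is actually implemented, just an illustration
of turning the myth:// form back into the NFS-visible path.]

```python
import posixpath
from urllib.parse import urlparse

# Assumption: the same recordings directory is mounted on MBE and SBE.
RECORDING_DIR = "/myth/tv"


def local_path_for(url, recording_dir=RECORDING_DIR):
    """Map myth://host:port/file.nuv to recording_dir/file.nuv.

    Hypothetical helper for illustration only; anything that is not a
    myth:// URL is returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme != "myth":
        return url
    # Keep only the file name; host and port are irrelevant over NFS.
    return posixpath.join(recording_dir, posixpath.basename(parsed.path))


if __name__ == "__main__":
    print(local_path_for(
        "myth://192.168.0.20:6543/1002_20051207221000_20051207221500.nuv"))
```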

Curiously, I see a different number of these failures for each
channel; this might be related to when they started (since they
started at least 10s apart from each other), but all 6 recordings
started and ended simultaneously.  Here's the complete log of just
those lines:

2005-12-07 22:15:45.360 Attempted to transcode myth://192.168.0.20:6543/1002_20051207221000_20051207221500.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:15:45.480 Attempted to transcode myth://192.168.0.20:6543/1002_20051207221000_20051207221500.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:15:45.598 Attempted to transcode myth://192.168.0.20:6543/1002_20051207221000_20051207221500.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:15:45.719 Attempted to transcode myth://192.168.0.20:6543/1002_20051207221000_20051207221500.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:30:45.341 Attempted to transcode myth://192.168.0.20:6543/1002_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:30:45.461 Attempted to transcode myth://192.168.0.20:6543/1002_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:30:45.581 Attempted to transcode myth://192.168.0.20:6543/1002_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:30:45.699 Attempted to transcode myth://192.168.0.20:6543/1002_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:31:45.339 Attempted to transcode myth://192.168.0.20:6543/1003_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:31:45.459 Attempted to transcode myth://192.168.0.20:6543/1003_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:31:45.579 Attempted to transcode myth://192.168.0.20:6543/1003_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:31:45.699 Attempted to transcode myth://192.168.0.20:6543/1003_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:32:45.371 Attempted to transcode myth://192.168.0.20:6543/1004_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:32:45.491 Attempted to transcode myth://192.168.0.20:6543/1004_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:32:45.610 Attempted to transcode myth://192.168.0.20:6543/1004_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:32:45.729 Attempted to transcode myth://192.168.0.20:6543/1004_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:33:45.389 Attempted to transcode myth://192.168.0.20:6543/1005_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:33:45.509 Attempted to transcode myth://192.168.0.20:6543/1005_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:33:45.630 Attempted to transcode myth://192.168.0.20:6543/1005_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:33:45.751 Attempted to transcode myth://192.168.0.20:6543/1005_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:34:45.397 Attempted to transcode myth://192.168.0.20:6543/1006_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:34:45.519 Attempted to transcode myth://192.168.0.20:6543/1006_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:34:45.639 Attempted to transcode myth://192.168.0.20:6543/1006_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
2005-12-07 22:34:45.759 Attempted to transcode myth://192.168.0.20:6543/1006_20051207222500_20051207223000.nuv. Mythtranscode is currently unable to transcode remote files.
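
[A small sketch for tallying these lines per recording, assuming the
log format is exactly as quoted above.  The names LINE_RE and
count_failures are made up for this example.  Grouping by file name
rather than by channel keeps the two 1002 recordings separate.]

```python
import re
from collections import Counter

# Matches the "unable to transcode remote files" lines quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"Attempted to transcode (?P<url>myth://\S+?)\. "
    r"Mythtranscode is currently unable to transcode remote files\.$"
)


def count_failures(lines):
    """Count matching failure lines, keyed by recording file name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            filename = match.group("url").rsplit("/", 1)[-1]
            counts[filename] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python count_failures.py < sbe-backend.log
    for name, n in sorted(count_failures(sys.stdin).items()):
        print(n, name)
```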

    Date: Tue, 6 Dec 2005 23:42:00 -0500 (EST)
    From: f-myth-users at media.mit.edu

	Date: Tue, 6 Dec 2005 19:02:09 -0500 (EST)
	From: "Chris Pinkham" <cpinkham at bc2va.org>

	> (b) Possibly because of (a), if I do a test recording of 6 things
	>     simultaneously, commflagging only happens on 5 of them at once.
	>     The 6th waits until the others are done, and then runs.  They
	>     all run on the MBE, even though I don't have "run jobs only on
	>     original recording host" set, but OTOH, there's that cap of 5
	>     jobs total, so would I ever see that sixth job on the SBE?

	One of them should have run on the SBE as long as you have the SBE
	configured to allow running flagging jobs.  Turn on JobQueue debugging
	with "-v jobqueue" on the backend to see if it tells you why it isn't
	firing off the 6th job on the SBE.  With "-v jobqueue" enabled, it
	prints out information about every job every time through the loop
	so you can see what's running, what's queued, what's finished, etc.

    Things are now broken in a different way.  I have many questions.

    I tried running "mythbackend -d -v jobqueue" (as the mythtv user) on
    the MBE, and got -very- different results than I was getting before;
    I suspect that this is because I hadn't restarted mythbackend since
    trying to turn on transcoding.  (It's been booted -many- times since
    setting up commflagging, but that was in previous weeks.)  Do changes
    to transcoding and/or recording profiles only take effect on restart
    of the backend?  I didn't -think- so (and it'd be pretty inconvenient
    if it always took a backend restart to change this sort of thing) but
    I haven't done anything else to the box.

    My test environment was to start recording on channels 2,3,4,5,6,7
    simultaneously for 5 minutes via manual scheduling.

    The -previous- behavior (before I restarted mythbackend this evening
    with "-v jobqueue") was to do all commflagging on the MBE, with the
    first 5 jobs running in parallel, and the sixth running (I think!) on
    the MBE as well (coulda been the SBE; I might not have noticed if it
    was), but definitely after the first five.  Transcoding errored out
    and didn't run at all.

    -Now- what happens is the following:
    (a) The instant recording was due to commence, the backend logged a
	bunch of "Skipping "Flag Commercials" job for chanid 1002 @
	20051206213500, should be run on 'sbe' instead"; it logged one of
	these per channel I'd scheduled (with appropriate chanids, of
	course).  I have no idea why the MBE (which has 5 of the 6 tuners)
	is suddenly claiming that the SBE should be running commflagging
	instead of the MBE.  No commflagging jobs ever ran on the MBE, as
	far as I could tell by running "ps -elf | grep comm" a lot.
    (b) One commflagging job started up on the SBE.  When it finished,
	another started, and so forth---no parallelism.
[I think I fixed (b), as above, by upping the job limit in
mythtv-setup on the SBE.]
    (c) Five transcoding jobs started up on the MBE, in parallel, as soon
	as recording finished.  (Before, -no- transcoding jobs were starting.)
[Leaves a question about whether this is kosher or not.  Separate message.]
    (d) The sixth transcoding job claimed an "errored" state instead of
	doing anything.  I can't -guarantee- that was the job that
	corresponded to the recording on the SBE's tuner, but I'm
	suspicious that it might have been, since it was channel 7 and
	they might have been allocated to tuners in the order in which I
	created the schedules.  ["select * from mythlog" isn't telling me
	what tuner recorded what; is there some better way to find out?]

    So, my still-unanswered questions:
    (a) How do I debug this better?  If a job claimed "errored", how do I
	grab ahold of its diagnostic output so I can see -why- it errored?
[I'm -hoping- that "-v all" in starting mythbackend is sufficient to
get mythtranscode to dump all its diagnostics at me---though there's
also figuring how to change its -V arg to dump more, since the 4099
it's being called with looks like a bitmask & I don't know if there's
more info I can get from it with a different one.]
    (b) What's going on w/the job queues here?
    (c) Transcoding was supposed to start -after- commflagging, but it
	didn't.  Why not?
[Does it matter if transcoding runs before or after commflagging?
Don't know.  Separate message.]
    (d) Why did 1 out of 6 transcoding jobs get an error, and what exactly
	-was- the error?
[I think it was the "no remote" error, but am not sure.]
    (e) There are at least two places to turn on transcoding---one is in
	the various profiles for Default, LiveTV, High, and Low, and one
	is in the Transcoding->MPEG2 slot one menu page away.  Which of
	these -should- I turn on, and which -shouldn't- I?  (Right now,
	they're -both- on, because turning on the Default one didn't do
	anything when I tried it; see previous message about that.)
[Misunderstanding---I think the profiles are -what I'm getting from
the card-, and the Transcoding thing is -how to transcode it-.  But
it's -really- confusing that the question about whether to transcode
-at all- is on the UI page with the -card- and not the -transcoders-.]
    (f) Is MPEG2-PS the right thing in that menu, or should it be TS, or
	is it something else entirely?  Where are these choices
	documented?
[Still no idea what this means.]
    (g) Why is the diagnostic output from mythbackend mentioning that it's
	studiously ignoring a bunch of commflagging jobs from two days
	ago?  (That's the last time I tried to record anything.)  -Those-
	jobs ran okay, incidentally, and on the MBE.  It's also mentioning
	the two "Errored" transcoding jobs I tried to run that day when I
	was trying to debug transcoding.  Why are they still hanging
	around in the job queue?  What makes them go away?
[Still no idea how to make these go away.]
    (h) What other options does mythbackend take, and what do they do?
	(I note that it has no manpage.)
[I've been reading the source code, which at least mentions most of
the options in its usage string, but IIRC missed a couple I found in
the code itself.]

    Thanks!
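
[Regarding the 4099 passed as mythtranscode's -V argument, mentioned
under question (a): a quick sketch of splitting it into its set bits,
assuming it really is a bitmask as guessed.  Which verbose category
each bit stands for is not assumed here; that has to be looked up in
the source for the version in use.  The function name set_bits is
made up for this example.]

```python
def set_bits(mask):
    """Return the individual power-of-two values that are set in mask."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]


if __name__ == "__main__":
    bits = set_bits(4099)
    print(bits)                    # [1, 2, 4096]
    print([hex(b) for b in bits])  # ['0x1', '0x2', '0x1000']
```

So 4099 is three flags OR'd together, and a more verbose mask would be
built by OR-ing in further bits.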

