[mythtv-users] find_orphans.py sort of working, sort of timing out

Matt Emmott memmott at gmail.com
Thu Mar 8 21:15:19 UTC 2012


On Tue, Mar 6, 2012 at 2:36 PM, Matt Emmott <memmott at gmail.com> wrote:

>
>
> On Tue, Mar 6, 2012 at 11:22 AM, Raymond Wagner <raymond at wagnerrp.com> wrote:
>
>>  On 3/6/2012 10:50, Matt Emmott wrote:
>>
>>
>>
>> On Mon, Mar 5, 2012 at 8:48 PM, Matt Emmott <memmott at gmail.com> wrote:
>>
>>>
>>> On Mar 5, 2012 7:23 PM, "Thomas Boehm" <mythtv-users at lists.boehmi.net>
>>> wrote:
>>> >
>>> > Raymond Wagner wrote:
>>> > > Matt Emmott wrote:
>>> > >> My slave backend has now lost a second 750GB disk, ...
>>> >
>>> > > Did you recently lose a hard drive, or have a corrupt database?
>>> >
>>> > Looks like it to me ;-)
>>> >
>>>  Yup! Two Seagates went boom. My MBE is hooked to a RAID 5 fiber array,
>>> so hopefully that won't be an issue in the future.
>>
>>
>> This has been a very slow crawl. I have some ssh screens open on another
>> monitor at work and have been running a cleanup every few minutes. I'm down
>> to 1026 recordings. Every time I tell it to delete orphaned entries, I get
>> a ton of the following on my MBE:
>>
>> 2012-03-06 10:43:56.183 ProgramInfo(4746_20120128013000.mpg), Error: GetPlaybackURL: '4746_20120128013000.mpg' should be local, but it can not be found.
>>
>> Now, I assume this is normal because it's parsing through the orphaned
>> entries. But I also see a few of the following:
>>
>> 2012-03-06 10:43:24.182 Error deleting 'GetPlaybackURL/UNABLE/TO/FIND/LOCAL/FILE/ON/magicbe/4067_20120209180000.mpg' could not open
>>                         eno: No such file or directory (2)
>> 2012-03-06 10:43:24.193 Delete Error 'GetPlaybackURL/UNABLE/TO/FIND/LOCAL/FILE/ON/magicbe/4067_20120209180000.mpg'
>>                         eno: No such file or directory (2)
>>
>> "Magic" is my SBE and "MagicBE" is my MBE. I've been having ongoing
>> issues with my HDHR prime failing on some recordings. That's on the list to
>> fix later.
>>
>> Is there anything I can do to stop the race condition, as Raymond called
>> it? It's a bit frustrating to have to keep cycling through the delete
>> function over and over (although I do appreciate the fact that such a tool
>> exists).
>>
>>
>> The race condition is a consequence of how the bindings are set up to
>> delete.  The list of recordings is pulled from the database, but the
>> bindings are not allowed to delete from the database (so they don't have to
>> keep track of all the sundry linked tables and files).  Instead, they use
>> the slightly inefficient mechanism of requesting a ProgramInfo from the
>> backend, and then passing it right back to the backend to delete.
>>
>> Now, after you've run find_orphans.py the first pass through, it should
>> have queued up all 1056 entries for deletion.  When it refreshed, a number
>> of those entries still existed in the database; however, when you tell it
>> to delete a second time, it tries to pull each ProgramInfo from the
>> backend, but since some have been deleted in the meantime, there will be
>> no program to pull, and that method will raise an error.  At that point,
>> it flushes back through and repopulates the list.
>>
>> In theory, that should mean all 1026 recordings are queued up in the
>> backend for deletion, and the backend should get to them in relatively
>> short order.  Since it's been at least a day now and it's only managed to
>> delete 30 of those entries, it sounds like there is something wonky going
>> on in the backend, outside the control of find_orphans.py.  You should be
>> seeing logs in the backend where it is working through all of those files.
>> The backend currently starts a new thread for each new file it intends to
>> delete.  Those threads sit idle for roughly three seconds, and then run
>> synchronously, one at a time, through a shared lock.  Perhaps there are
>> some problems dealing with that volume of threads in that short a time.
>>
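For my own understanding, the round trip described above boils down to
something like the sketch below. The backend_get_programinfo(),
backend_delete_recording() and refresh_orphans() helpers are made-up
stand-ins for whatever the bindings actually call, so treat it as
pseudo-Python rather than the real find_orphans.py code:

    # Rough sketch of the delete round trip: rows come from the database,
    # but the actual delete is requested from the backend.  All three
    # helper functions here are hypothetical placeholders.
    def delete_pass(orphans):
        for rec in orphans:
            try:
                # ask the backend for a ProgramInfo matching this row ...
                prog = backend_get_programinfo(rec.chanid, rec.starttime)
            except RuntimeError:
                # ... if the backend already deleted it since the last
                # refresh, there is nothing to pull; flush and repopulate
                return delete_pass(refresh_orphans())
            # ... then hand it straight back to the backend to delete
            backend_delete_recording(prog)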
>>
>
> Thanks for the info. I assume when you say that things should show up in
> the logs, you mean mythbackend.log. When I run the script I get the errors
> I quoted above... The "file cannot be found" error seems to occur once for
> every orphaned recording, but I haven't gone back and counted. After it
> runs and I'm back at the menu prompt for find_orphans, I don't see anything
> else happening in mythbackend.log. I've had tail -f running all day and all
> I see are the standard job queue entries, etc. After the 100% spike in
> mysqld (I have 2 CPUs but it never goes over 100% for that process), I
> don't see any real load on the server. Mythbackend happily chugs along at
> very low CPU usage, and my load averages tend to stay around 0.3. Anything
> else I can try?
>

I've been plugging away with the console open on another monitor. I run the
cleanup, watch the mythbackend log and mysqld CPU usage, and repeat my
delete when everything dies down. I'm down to 523 orphaned files after two
days.
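I still haven't counted those errors by hand, but something like this should
do it against the backend log. The pattern strings are taken from the lines
quoted above; the log path is just where my packages put it, so adjust to
your setup:

    # Tally the backend's complaint lines, to compare against the number of
    # orphans still listed by find_orphans.py.  The log path below is an
    # assumption (the usual packaged default on my box).
    patterns = ('should be local', 'Error deleting', 'Delete Error')
    counts = dict((p, 0) for p in patterns)
    with open('/var/log/mythtv/mythbackend.log') as log:
        for line in log:
            for p in patterns:
                if p in line:
                    counts[p] += 1
    for p in patterns:
        print('%-16s %d' % (p, counts[p]))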

One thing I notice... As it churns through the "should be local, but it can
not be found" errors, it stops at the following:

2012-03-08 16:09:26.425 Scheduled 2643 items in 18.6 = 0.01 match + 18.54 place
2012-03-08 16:09:26.440 Reschedule requested for id 0.
2012-03-08 16:09:45.127 Scheduled 2643 items in 18.7 = 0.02 match + 18.64 place

This is always the last line in the log after a delete run. I have a lot of
schedules set up with "max recordings" limits, and my feeling is that the
script chokes when it deletes an episode of something with a max-recordings
limit and the reschedule kicks in. Does the "2643 items" figure mean that I
actually have that many schedules set up? Seems like a lot.
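As far as I can tell, the "items" count in that line is the number of
candidate showings the scheduler considered in that pass, not the number of
recording rules, but I haven't confirmed that. The rules themselves live in
the record table, so counting them directly is easy enough. This assumes the
bindings' MythDB.cursor() hands back an ordinary DB-API cursor; running
SELECT COUNT(*) FROM record at a mysql prompt gives the same answer:

    # Count recording rules straight from the database: the 'record' table
    # holds one row per rule.
    from MythTV import MythDB

    db = MythDB()
    cursor = db.cursor()
    try:
        cursor.execute("SELECT COUNT(*) FROM record")
        print('recording rules: %d' % cursor.fetchone()[0])
    finally:
        cursor.close()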