[mythtv-users] Backend machine seems to crash every few days

Damian myth at surr.co.uk
Fri Sep 30 07:32:17 UTC 2016


On 30/09/2016 06:04, Mark Wedel wrote:
> On 09/29/16 08:35 AM, Keith Pyle wrote:
>> On 09/29/16 07:00, Steve Goodey wrote:
>>> I don't know if this is a Myth backend issue, or something bigger, but
>>> my backend/server seems to crash/hang every few days.
>>>
>>> The computer seems to still be running (the light is on), but I can't
>>> access the machine via vnc or ssh or anything. I can't even ping it.
>>> It's a headless server, so I just have to hold the power button in and
>>> then start it up again.
>>>
>>> Can anyone help me to get to the bottom of this? I have no idea what
>>> I'm looking at if I view a log file.
>>>
>>> Any help would be much appreciated.
>>>
>>> Thanks,
>>> Damian
>>> _______________________________________________
>>> Damian,
>>>
>>> Unfortunately there's not a lot to go on with the info you've given.
>>>
>>> I'm no expert on this, but I think the first thing I'd do is see if you
>>> can get a monitor, keyboard and mouse hooked up so that you can see what
>>> state the machine is in when it goes wrong.
>>>
>> It's probably not a mythbackend issue, per se, since a non-privileged
>> application should not be able to crash a system.  It is possible that
>> Myth or another program is doing something that the kernel doesn't
>> handle properly, but that's still a kernel bug IMO.  A hardware issue or
>> kernel bug would be my leading candidates.
>>
>> Steve's suggestion is a good one - put a monitor and keyboard on the
>> system if possible.  You might see a message displayed on the console.
>> If your kernel has CONFIG_MAGIC_SYSRQ, you might be able to use the
>> magic SysRq keys to determine if the kernel trapped a problem (SysRq
>> keys work) or there was a kernel panic (SysRq keys don't work), sync
>> file systems, do a cleaner reboot, etc.
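
Whether the magic SysRq keys do anything depends on the value in
/proc/sys/kernel/sysrq, which many distributions restrict by default. As a
rough sketch (the function names are made up for illustration, and the bit
meanings are taken from the kernel's sysrq documentation), this decodes that
value so you can check it before the next hang rather than during it:

```python
from pathlib import Path
from typing import List, Optional

# Bit meanings as listed in the kernel's sysrq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value: int) -> List[str]:
    """Turn the numeric sysrq setting into a list of enabled functions."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> Optional[int]:
    """Return the current setting, or None if the file is not present
    (for example, a kernel built without CONFIG_MAGIC_SYSRQ)."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


if __name__ == "__main__":
    current = read_sysrq()
    if current is None:
        print("no sysrq setting found")
    else:
        print(current, describe_sysrq(current))
```

If "sync command" and "reboot/poweroff" are not in the list, the sync and
reboot key combinations will be ignored even when the kernel is still alive.
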
>>
>> You should definitely look at your system log files, typically in
>> /var/log.  The specific files will vary by Linux distribution and your
>> settings for syslog (kern.log, syslog, and messages are common names).
>> Look specifically for any messages that were logged just prior to the
>> hang time and that seem unusual compared to log entries during a time of
>> normal operation, e.g., kernel panic, out of memory killer messages,
>> device not responding, etc.
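
For anyone who finds the logs unreadable, one way to start is to throw away
everything except lines containing the kinds of phrases listed above. This is
only a sketch: the pattern list is a guess at common kernel message wording,
not an exhaustive or authoritative set, and the log file names differ between
distributions.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumed phrases worth a second look; extend to taste.
PATTERNS = [
    r"kernel panic",
    r"out of memory",
    r"oom-killer",
    r"\boops\b",
    r"call trace",
    r"i/o error",
    r"blocked for more than",
    r"not responding",
    r"machine check",
]
SUSPICIOUS = re.compile("|".join(PATTERNS), re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that match one of the patterns."""
    for line in lines:
        if SUSPICIOUS.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Pass one or more log files on the command line,
    # e.g. /var/log/syslog or /var/log/kern.log.
    for path in sys.argv[1:]:
        try:
            with open(path, errors="replace") as fh:
                for hit in suspicious_lines(fh):
                    print(f"{path}: {hit}")
        except OSError as exc:
            print(f"{path}: cannot read ({exc})")
```

The timestamps on the matching lines are the useful part: compare them with
the time the machine stopped responding, then read the unfiltered log around
that time.
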
>>
>> Consider *any* changes you made to the system shortly before this began
>> happening.  Have you added any devices, updated the kernel, installed
>> new programs, etc.?
>>
>> Do you have any system monitoring, particularly for temperature?  If
>> not, see if you have environmental monitoring configured and if the
>> lm-sensors command "sensors" works.  If it does, start looking at the
>> output periodically and see if the core temperatures climb.  You could
>> put the "sensors" command into a crontab entry and save the output every
>> 5 minutes or so to create a crude, temporary monitor.  Inspect your CPU
>> fan and make sure it is running.  Make sure the fan and the CPU cooler
>> fins are not obstructed by dust, fur, etc.
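
The crude crontab monitor described above could look something like the
following sketch. It assumes the lm-sensors "sensors" command is installed
and on the PATH; the function names and the log path are made up for
illustration. Each snapshot is flushed to disk straight away, since a hard
hang would otherwise lose the last (and most interesting) entry.

```python
import os
import subprocess
import sys
from datetime import datetime
from typing import Callable, Optional


def run_sensors() -> str:
    """Run the lm-sensors 'sensors' command and return its text output."""
    try:
        result = subprocess.run(
            ["sensors"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return "sensors command not found"
    return result.stdout or result.stderr


def append_snapshot(
    log_path: str,
    read: Callable[[], str] = run_sensors,
    now: Optional[datetime] = None,
) -> str:
    """Append one timestamped reading to log_path and return the entry."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    entry = f"=== {stamp} ===\n{read().rstrip()}\n\n"
    with open(log_path, "a") as fh:
        fh.write(entry)
        fh.flush()
        os.fsync(fh.fileno())  # force the entry onto disk before a possible hang
    return entry


if __name__ == "__main__":
    # The log file path is given as the first argument.
    if len(sys.argv) > 1:
        append_snapshot(sys.argv[1])
```

A crontab line such as the following (both paths are hypothetical) would then
take a reading every 5 minutes:

    */5 * * * * /usr/bin/python3 /home/damian/sensors_log.py /home/damian/sensors.log
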
>
>  In addition to all of the above, verify that the disk is working 
> OK.  If the disk stops responding, that will pretty much hang up the 
> system (but the BIOS reset then gets it working again).  Using 
> smartmontools, you can check some of the status info of the disk, e.g., 
> 'smartctl -a /dev/sda'
>
>  Many disks will report temperature, so will also give you an idea if 
> your system is running hot.
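
The smartctl output is long, so here is a rough sketch that pulls out the
temperature and a few attributes that commonly indicate a failing disk. It
assumes an ATA/SATA disk whose attribute table has the usual ten-column
layout (as printed by 'smartctl -A'), and the attribute names in WATCH are
the common ones, which not every drive reports. The function names are made
up for illustration.

```python
import subprocess
import sys
from typing import Dict, List, Optional

# Attributes whose raw value should normally stay at 0 (assumed names).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text: str) -> Dict[str, str]:
    """Map attribute name -> raw value from a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Table rows start with a numeric ID and have a hex FLAG in column 3.
        if len(parts) >= 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            attrs[parts[1]] = " ".join(parts[9:])
    return attrs


def disk_warnings(attrs: Dict[str, str]) -> List[str]:
    """Return the watched attributes that have a non-zero raw value."""
    found = []
    for name in WATCH:
        raw = attrs.get(name)
        if raw is not None and raw.split()[0] != "0":
            found.append(f"{name} = {raw}")
    return found


def temperature(attrs: Dict[str, str]) -> Optional[int]:
    """Return the reported temperature in Celsius, if the disk reports one."""
    raw = attrs.get("Temperature_Celsius")
    if not raw:
        return None
    first = raw.split()[0]
    return int(first) if first.isdigit() else None


if __name__ == "__main__":
    # Usually needs root; the device is given as an argument, e.g. /dev/sda.
    if len(sys.argv) > 1:
        out = subprocess.run(
            ["smartctl", "-A", sys.argv[1]], capture_output=True, text=True
        ).stdout
        attrs = parse_attributes(out)
        print("temperature:", temperature(attrs))
        print("warnings:", disk_warnings(attrs) or "none")
```

An empty warnings list does not prove the disk is healthy, but non-zero
values there are a good reason to suspect it.
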
>
>  Bad memory or bad cpu would be other likely culprits.  I had a bad 
> cpu (amd x4) where one of the cores would occasionally not execute 
> things right.  Most of the time, it would be running a user process, so 
> the web browser or something might die (which, with firefox, is not that 
> unexpected), but if it was running the kernel, that might result in a 
> panic/hang.  This happened infrequently enough under normal usage that 
> it was tough to pinpoint, but when doing compiles (where it was 
> running a lot of stuff), I'd see random crashes of the compiler.
>
> Given that this is a back end with no monitor, it could even be 
> something as simple as the network interface/driver crapping out.  
> Though in that case, I'd expect you to be able to see something in 
> the logs.
>
>  But trying to track this down without a monitor is going to be really 
> difficult - you are much better off connecting a monitor to see its 
> state when it actually does hang vs trying to randomly diagnose what 
> the problem may be.

Thanks everyone,

I'll get a monitor and keyboard hooked up and run some tests.

If it passes the tests, I'll get it running with normal use, but with 
the monitor and keyboard still plugged in.

Are there any commands or anything that I should run in terminals to be 
'looking out' for any errors that hang the machine or network?

And if I need to go to the logs, how the hell do you guys get useful 
information out of them? I may as well be looking at the code from the 
Matrix for all of the sense they make to me!

Thanks again for the tips,
Damian

