[mythtv-users] Track down unstable hardware?

Steven Adeff adeffs.mythtv at gmail.com
Thu Feb 12 04:36:00 UTC 2009


On Wed, Feb 11, 2009 at 9:10 PM,  <jarpublic at gmail.com> wrote:
> On Wed, Feb 11, 2009 at 8:16 PM, Brian Wood <beww at beww.org> wrote:
>> On Wednesday 11 February 2009 17:42:37 jarpublic at gmail.com wrote:
>> > At this point I am getting off topic for this list. It is certainly some
>> > hardware failure. When it fails I can't get it to reboot. When I try to
>> > boot from a live CD I get the same kernel panic. However, I would hate to
>> > get rid of the whole system just because I am too ignorant to track down
>> > exactly which piece of hardware is failing. Does anybody know a good Linux
>> > list that may be able to help me track down which bit of hardware is going
>> > bad? It is especially challenging because if I let the system sit for a
>> > while it will boot up and work fine for some indeterminate amount of
>> > time. I have used lm-sensors to track temps and nothing seems to be hot,
>> > all of the fans are running, and I have checked all of the drives for bad
>> > blocks. I don't know what else to do at this point. I don't want to bother
>> > the list anymore, but does somebody know the right group to ask about
>> > troubleshooting Linux hardware?
>>
>> A machine that always works after being off for a while probably has some
>> sort of thermal problem. Sensors are seldom helpful, as this could be on
>> just about anything: chips, resistors, or even solder connections.
>>
>> You might try cooling various components with freeze-spray, that sometimes
>> helps identify this sort of trouble. Remember that if the problem is on a
>> chip die or the like it will take several seconds at least before things
>> start to work after you spray it. Don't be impatient, or you will have
>> sprayed lots of components and not know which one it was if it starts
>> working.
>>
>> Otherwise, unless you have a lab full of test gear, the only practical
>> troubleshooting method is substitution: replace things one by one with
>> known-good replacements until you find the problem.
>>
>> I'd suspect the PSU first, but YMMV.
>
>
> A thermal problem seemed the most likely cause to me, but I wasn't sure how
> to narrow it down. I didn't really consider the power supply because the
> machine doesn't completely crash. It just freezes on the current screen, and
> I lose all input and network. Even if I had spare hardware to swap in, the
> problem is complicated by the fact that the bad hardware works some of the
> time. So it would be hard to say whether swapping out a component helps
> because of that component or because the failing component happens to be
> working at that moment. The kernel panic comes up immediately after GRUB,
> before anything else happens, so I was hoping it would be simple to narrow
> it down to a drive, or that there was some way to get more verbose error
> messages.
>

I'm only peripherally following this thread, but I have to agree with Brian
that the first thing I would check is the power supply. I've seen similar
issues arise from power supplies on their last legs. Other than that, without
one of those PCI-slot-based hardware testers it could be very hard to figure
out without swapping out hardware piece by piece.
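Because the fault is intermittent, a single good boot after a swap proves little. One way to make piece-by-piece substitution less ambiguous is to record, over several cold boots per configuration, how many minutes pass before the freeze, and only credit a swap if the new runs clearly outlast the old ones. Here is a minimal sketch of that bookkeeping in Python. The function name `compare_runs`, the sample numbers, and the "every swapped run beats the longest baseline run" rule are illustrative assumptions, not anything established in this thread.

```python
from statistics import median


def compare_runs(baseline, swapped):
    """Compare minutes-until-freeze for two hardware configurations.

    Hypothetical helper: each list holds one entry per cold boot. A run that
    never froze is recorded as the length of the test (a capped value).
    """
    base_med = median(baseline)
    swap_med = median(swapped)
    # Conservative rule of thumb (an assumption): credit the swap only if every
    # swapped run outlasted the longest baseline run, so a lucky dormant
    # period of the original fault is less likely to explain the improvement.
    clearly_better = min(swapped) > max(baseline)
    return base_med, swap_med, clearly_better


if __name__ == "__main__":
    # Made-up example data: the original PSU froze after 35-61 minutes; with a
    # replacement PSU, three 4-hour (240-minute) tests never froze.
    print(compare_runs([35, 50, 42, 61], [240, 240, 240]))
```

With the made-up data above it prints `(46.0, 240, True)`. The strictness of the rule is a judgment call; more runs per configuration make any such comparison more trustworthy.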

-- 
Steve
http://www.mythtv.org/wiki/index.php/User:Steveadeff
Before you ask, read the FAQ!
http://www.mythtv.org/wiki/index.php/Frequently_Asked_Questions
then search the Wiki, and this list,
http://www.gossamer-threads.com/lists/mythtv/
Mailinglist etiquette -
http://www.mythtv.org/wiki/index.php/Mailing_List_etiquette