2000-11-30 15:43:53

by Buddha Buck

Subject: HOW-DO-I: Diagnosing hardware problems

Hi,

I'm having trouble with my system, which has been locking up erratically
(usually once or twice a day, sometimes more). While the problem may be
kernel-related, it is starting to look like a hardware problem.

The lockups happen at seemingly random times, unrelated to system load,
particular usage, etc. As near as I can tell, the lockups disable
interrupts. Among other things, the system stops responding to the keyboard
(numlock and capslock no longer toggle the keyboard lights, Ctrl-Alt-Del is
ineffective, and the console does not unblank). In past lockups, all of
these would still work, even after a kernel panic.

When this first started (under 2.4.0pre10), I was getting oopses, showing
the system was dying in wake_up, while trying to schedule during an
interrupt (I think that's what the oops said). Some oopses would be
logged, and not kill the system, others would kill the system, and not be
logged. When I downgraded to 2.2.17+ide, I stopped getting oopses, and the
lockups stopped for a while. Now the system (under both 2.2.18 and
2.4.0pre11) locks up but doesn't oops, not even to the console.

I also sometimes experience disk freezes, where the system blocks on
reading the disk, preventing new programs from loading and files from being
read or written, but not freezing memory-resident applications or halting
interrupts. I have -once- been able to wait out a freeze, and got a
message that the system was disabling DMA on the IDE drive. The system
usually gets unstable after that, and locks up soon afterwards. However,
not all lockups are preceded by freezes.
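
To see whether oopses or DMA messages tend to show up before a lockup,
the saved kernel log could be tallied by message type. The following is
only a minimal sketch: the function name and the patterns are
hypothetical guesses for illustration, not exact kernel message formats,
and real messages vary by kernel version and driver.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; adjust them to match
# whatever the kernel actually logs on the machine in question.
PATTERNS = {
    "oops": re.compile(r"\bOops\b", re.IGNORECASE),
    "panic": re.compile(r"kernel panic", re.IGNORECASE),
    "dma_disabled": re.compile(
        r"disabl\w*\s+DMA|DMA\s+disabl", re.IGNORECASE
    ),
}


def tally_events(lines):
    """Count how many log lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the lines of a saved log (for example, `tally_events(open(path))`
with `path` pointing at wherever the log is kept) gives a rough count per
category, which can then be compared between kernel versions.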

I have run memtest overnight with no reported errors. I replaced the
network card (a Linksys NE2k-PCI card) with a Linksys tulip card because
the old card was giving me errors. I have manually checked the processor
temperature, and it is cool to the touch. This problem cropped up within
the last few months, and seems to be getting worse.

When running 2.4.0pre10-11, I tend to get some disk corruption detected
when I fsck during a reboot. I haven't noticed this as much with the 2.2.x
kernels.

I don't know how to proceed from here in eliminating hardware as an issue
before I report a kernel problem. Any suggestions?

later,
Buddha


2000-12-02 17:11:22

by Scott Prader

Subject: Re: HOW-DO-I: Diagnosing hardware problems


On Thu, 30 Nov 2000 10:12:34 -0500, Buddha Buck blurted forth:

> Hi,
[snip]
> When this first started (under 2.4.0pre10), I was getting oopses, showing
> the system was dying in wake_up, while trying to schedule during an
> interrupt (I think that's what the oops said). Some oopses would be
> logged, and not kill the system, others would kill the system, and not be
> logged. When I downgraded to 2.2.17+ide, I stopped getting oopses, and the
> lockups stopped, for a while. Now the system (under both 2.2.18 and
> 2.4.0pre11) locks up but doesn't oops, not even to the console.
[snip]

sounds like a bad CPU fan. take the cover off of the computer, start it
up, and observe how fast the fan on the CPU is running. if it's not
running at all, that's your problem. turn the computer off and do not
start it again until you've replaced the fan; you can seriously burn
stuff out if it continues to run like that. also, is there a lot of
clutter in the case? wires everywhere? you may want to consider getting
some twist-ties or rubber bands or whatever and using them to clean up
the mess, to allow proper ventilation of the system and get heat out of
there... this sounds like the most likely problem/solution to me, but
of course i could be wrong.. give this a go and see what happens.

--
.oO gnea at rochester dot rr dot com Oo.
.oO url: http://garson.org/~gnea Oo.

"You can tune a filesystem, but you can't tuna fish" -unknown