2005-03-30 20:46:22

by Noah Silverman

Subject: Hangcheck problem

Hi,

I've been experiencing a weird problem.

I get endlessly repeated hangcheck errors in my syslog with no explanation:

Mar 30 12:41:43 db kernel: Hangcheck: hangcheck value past margin!

Eventually, after a few weeks, the box will hang. It is pingable, but I
can't ssh or connect to any service.

I would love to diagnose the problem, but the syslog entries don't give
me much to go on.

Does anybody have any suggestions?

Thanks,

-Noah


2005-03-30 22:29:55

by Noah Silverman

Subject: Re: Hangcheck problem

Sorry

2.6.7


Burton Windle wrote:
> Kernel version?
>

On Wed, 30 Mar 2005, Noah Silverman wrote:

> Hi,
>
> I've been experiencing a weird problem.
>
> I get endlessly repeated hangcheck errors in my syslog with no
> explanation:
>
> Mar 30 12:41:43 db kernel: Hangcheck: hangcheck value past margin!
>

2005-03-31 04:50:27

by Peter Chubb

Subject: Re: Hangcheck problem

>>>>> "Noah" == Noah Silverman <[email protected]> writes:

Noah> Sorry 2.6.7


Noah> Burton Windle wrote:
>> Kernel version?

Are you running on an x86 machine without TSC, e.g., a 486? The
Hangcheck timer then devolves into using jiffies, and a single jiffy
error gives you the printout you mention.
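
One rough way to check for a TSC, assuming a Linux box where
/proc/cpuinfo reports CPU features on a "flags" line (as x86 kernels
do). The helper name has_tsc is made up for illustration; this is a
sketch, not part of any kernel tooling:

```python
def has_tsc(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists 'tsc'.

    Assumes the x86 layout, e.g. "flags\t\t: fpu vme de pse tsc msr".
    Only the exact token 'tsc' counts; 'constant_tsc' and similar do not.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "tsc" in value.split():
            return True
    return False
```

On the affected machine this could be called as
has_tsc(open("/proc/cpuinfo").read()); a False result would be
consistent with the no-TSC case described above.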

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*

2005-04-02 01:57:54

by Joseph Fannin

Subject: Re: Hangcheck problem

On Wed, Mar 30, 2005 at 02:29:45PM -0800, Noah Silverman wrote:
> On Wed, 30 Mar 2005, Noah Silverman wrote:

> > I've been experiencing a weird problem.
> >
> > I get endlessly repeated hangcheck errors in my syslog with no
> > explanation:
> >
> > Mar 30 12:41:43 db kernel: Hangcheck: hangcheck value past margin!

> Burton Windle wrote:
> > Kernel version?
> >
>
> 2.6.7

That's a really old kernel, and I'm sure anyone who could look
into this will ask you to upgrade to something recent and reproduce it
as the first step in tracking it down.

Is this an older box? I've seen the hangcheck warnings on a
486 I was using as a firewall/router -- ultimately I applied a patch
to set HZ to 100 and the problem went away. I *think*, once that patch
bitrotted, that I just turned off the hangcheck timer, but I can't
remember for sure.

If you turn off the hangcheck timer, does the problem go away
(i.e. no more lockups)?
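
Before trying that, it may help to know whether the hangcheck timer
is loaded as a module at all. A minimal sketch, assuming the driver
appears in /proc/modules as hangcheck_timer (or hangcheck-timer) and
that the module name is the first field on each line; the function
name is hypothetical:

```python
def hangcheck_module_loaded(proc_modules_text: str) -> bool:
    """Return True if /proc/modules-style text lists the hangcheck timer.

    Assumes the module name is the first whitespace-separated field and
    treats '-' and '_' in the name as equivalent.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].replace("-", "_") == "hangcheck_timer":
            return True
    return False
```

If this returns False for the contents of /proc/modules while the
messages still appear, the timer is presumably built into the kernel
(CONFIG_HANGCHECK_TIMER=y), and turning it off would mean rebuilding
without it rather than unloading a module.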

--
Joseph Fannin
[email protected]