2003-08-27 23:26:31

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)


Ville,

Which kernel doesnt hang on your box? 2.4 something ? 2.2?


2003-08-28 06:10:21

by Ville Herva

[permalink] [raw]
Subject: Re: 2.4.22pre8 hangs too (Re: 2.4.21-jam1 solid hangs)

On Wed, Aug 27, 2003 at 08:22:07PM -0300, you [Marcelo Tosatti] wrote:
>
> Ville,
>
> Which kernel doesnt hang on your box? 2.4 something ?

2.4.20pre7 ran for over 9 months before it suddenly begun locking up (I
_suppose_ it could just mean the bug/problem is hard to trigger.) Nothing
had been changed: the box had been up for that nine month period, and the
same oracle dump cron job had been running each night.

Earlier 2.4's had too many problems with aic7xxx (crashes and so on), so I
can't comment on them.

After 2.4.20pre7, I tried 2.4.21-jam1 (based on -aa patchset) and
2.4.22-pre8. I also tried compiling 2.4.21-jam1 with gcc-3.2.1 instead of
2.96. All of those locked up eventually, sometimes within a day from
reboot, some times it takes weeks. At one point, 2.4.21-jam1 seemed to
reliably lock up when compiling kernel, but it no longer happens no matter
how hard I try. Usually the lock up happens during nightly oracle backup
dump.

> 2.2?

2.2.22 and 2.2.22-secure1 never locked up, but I didn't run them for more
than a couple of months.

I realize I should try to change all pieces of hardware one at the time and
try various different kernels, but it all seem tedious and shooting in the
dark: the lock up can take weeks to trigger. I would love to get some kind
of lead or theory on what causes the problem before resorting to brute force
search.

I think I'll try 2.4.22-final + a kernel debugger next. Or is there a better
way of getting information on hard lock ups than kdb? (Tried nmi watchdog
and sysrq already).


-- v --

[email protected]