2004-11-03 11:36:56

by Thomas Oulevey

Subject: 2.4 lockup issue (flush_tlb_all)

Hello,

We are experiencing some lockup problems with our SMP configuration.
Here are the details:
- The computers lock up with no relevant logs.
- The kernel still replies to ping, but higher-level services are not
responding.
- After a few hours (5-8), the kernel responds again; the load is around
40 and then decreases.

We managed to get some SysRq showPc output (screenshot:
http://www.elonex.ch/shot/).
According to this basic SysRq debugging, the problem seems to be related
to the function flush_tlb_all, and it is triggered by a write or read
(local, or sometimes on NFS).

I searched LKML and didn't find any known issues.
Maybe it has been fixed upstream but not backported by Red Hat.
I'd appreciate any help.

Thank you in advance.

Detailed configuration:
---------------
Processor : 2 x 2.8GHz Pentium Xeon
Motherboard : Intel SE7501CW2
Memory : 4 x 512MB DDR 266 ECC registered
Kernel : 2.4.20-31 (Red Hat 7.3 with updates)


PLEASE CC the answers/comments

--
Thomas OULEVEY System Engineer
Elonex Switzerland Email: [email protected]
Switzerland


2004-11-07 20:29:02

by Marcelo Tosatti

Subject: Re: 2.4 lockup issue (flush_tlb_all)

On Wed, Nov 03, 2004 at 12:36:47PM +0100, Thomas Oulevey wrote:
> Hello,
>
> We are experiencing some lockup problems with our SMP configuration.
> Here are the details:
> - The computers lock up with no relevant logs.
> - The kernel still replies to ping, but higher-level services are not
> responding.
> - After a few hours (5-8), the kernel responds again; the load is around
> 40 and then decreases.
>
> We managed to get some SysRq showPc output (screenshot:
> http://www.elonex.ch/shot/).
> According to this basic SysRq debugging, the problem seems to be related
> to the function flush_tlb_all, and it is triggered by a write or read
> (local, or sometimes on NFS).
>
> I searched LKML and didn't find any known issues.
> Maybe it has been fixed upstream but not backported by Red Hat.
> I'd appreciate any help.
>
> Thank you in advance.
>
> Detailed configuration:
> ---------------
> Processor : 2 x 2.8GHz Pentium Xeon
> Motherboard : Intel SE7501CW2
> Memory : 4 x 512MB DDR 266 ECC registered
> Kernel : 2.4.20-31 (Red Hat 7.3 with updates)

You should report this one to the RH people, but I think RH 7.3
isn't supported anymore?

Upgrading the kernel is a good idea.