2007-06-27 10:24:42

by Andre Noll

Subject: 2.6.21.x kernel panic (tg3 and nfs related)

Hi

Our NFS server recently panicked under heavy NFS load. The backtrace
indicates that this might be a problem with the tigon3 network driver
which drives the onboard chips of the machine.

The first crash under 2.6.21.1 happened after about 4 days of uptime;
2.6.21.5 crashed after only 15 minutes.

Screenshots of the resulting kernel panics are available at

http://www.systemlinux.org/~maan/shots/huangho-crash-2.6.21.1.png
and
http://www.systemlinux.org/~maan/shots/huangho-crash-2.6.21.5.png

We're now running 2.6.18.6 again, which happens to be rock solid for our
workload. However, this kernel now spits out zillions of messages like

[55122.674290] RPC: bad TCP reclen 0x00010094 (large)

I'm sure it didn't do that half a year ago when it was running for
several months. The 2.6.21.x kernels did not print these messages
either, but from what I understand this is due to a patch which went
in somewhere between 2.6.18 and 2.6.21 and which just ratelimited
the message.
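
For reference, the value in that message can be decoded by hand. The
sketch below assumes the standard ONC RPC record marking used over TCP
(a 4-byte big-endian header whose top bit flags the last fragment and
whose lower 31 bits give the fragment length). The function and constant
names are made up for illustration and are not taken from the kernel
source.

```python
import struct

# Record-marking layout assumed here (ONC RPC over TCP):
# top bit = last-fragment flag, lower 31 bits = fragment length in bytes.
LAST_FRAGMENT_FLAG = 0x80000000
LENGTH_MASK = 0x7FFFFFFF


def decode_record_marker(header: bytes):
    """Split a 4-byte record marker into (is_last_fragment, length)."""
    (word,) = struct.unpack("!I", header)  # network byte order, unsigned
    return bool(word & LAST_FRAGMENT_FLAG), word & LENGTH_MASK


# The value from the log line above, read as a plain length.
logged_reclen = 0x00010094
print(logged_reclen)          # 65684
print(logged_reclen - 65536)  # 148 bytes above 64 KiB
```

Read this way, the "(large)" message is about a record of 65684 bytes,
which would be more than the server was prepared to accept for a single
request.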

So something weird seems to be going on in our network and this might
well be related to the 2.6.21.x crashes we are seeing.

Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe



2007-06-27 15:58:23

by Chuck Ebbert

Subject: Re: 2.6.21.x kernel panic (tg3 and nfs related)

On 06/27/2007 06:16 AM, Andre Noll wrote:
> Hi
>
> Our NFS server recently panicked under heavy NFS load. The backtrace
> indicates that this might be a problem with the tigon3 network driver
> which drives the onboard chips of the machine.
>
> The first crash under 2.6.21.1 happened after about 4 days of uptime;
> 2.6.21.5 crashed after only 15 minutes.
>
> Screenshots of the resulting kernel panics are available at
>
> http://www.systemlinux.org/~maan/shots/huangho-crash-2.6.21.1.png
> and
> http://www.systemlinux.org/~maan/shots/huangho-crash-2.6.21.5.png
>

Looks like the known oops in show_mem(); a patch is queued for 2.6.21.6.

But it is hard to tell when the screenshot doesn't show the whole
message...

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-06-27 16:14:35

by Andre Noll

Subject: Re: 2.6.21.x kernel panic (tg3 and nfs related)

On 11:58, Chuck Ebbert wrote:
> > Screenshots of the resulting kernel panics are available at
> >
> > http://www.systemlinux.org/~maan/shots/huangho-crash-2.6.21.1.png
> > and
> > http://www.systemlinux.org/~maan/shots/huangho-crash-2.6.21.5.png
> >
>
> Looks like the known oops in show_mem(); a patch is queued for 2.6.21.6.

I see. Thanks.

> But it is hard to tell when the screenshot doesn't show the whole
> message...

There's no way to scroll up after a kernel panic. I already changed
the video setting to use a smaller font, so if it happens again,
there will be more info visible.

Andre
--
The only person who always got his work done by Friday was Robinson Crusoe

