2007-08-20 06:39:16

by gshan

Subject: kernel crashes inside MV643xx driver

Hi All,

After I started the NFS server, it crashed:

<3>Badness in local_bh_enable at
/home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
Badness in local_bh_enable at
/home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
Call trace:
[c0005340] check_bug_trap+0xbc/0x11c
[c0005604] ProgramCheckException+0x264/0x2bc
[c0004ac4] ret_from_except_full+0x0/0x4c
[c0022ae4] local_bh_enable+0x18/0x80
[c024648c] skb_copy_bits+0x168/0x3b8
[c024db44] __skb_linearize+0x90/0x150
[c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
[c025c934] qdisc_restart+0xac/0x2bc
[c024de9c] dev_queue_xmit+0x298/0x34c
[c0269814] ip_finish_output+0x140/0x2b8
[c026a3ac] ip_fragment+0x3cc/0x6e0
[c026bac8] ip_push_pending_frames+0x3dc/0x46c
[c0289ec4] udp_push_pending_frames+0x10c/0x1cc
[c028a7c4] udp_sendpage+0x104/0x188
[c0292fc8] inet_sendpage+0x90/0xb8

I searched the web and found similar problems:
http://www.mail-archive.com/[email protected]/msg05199.html
http://oss.sgi.com/archives/netdev/2005-09/msg00025.html

Does anyone know of a fix for this problem?

Thanks,
Gavin


2007-09-05 15:26:30

by Andrew Morton

Subject: Re: kernel crashes inside MV643xx driver

> On Mon, 20 Aug 2007 14:38:57 +0800 gshan <[email protected]> wrote:
> Hi All,
>
> After I started the NFS server, it crashed:
>
> <3>Badness in local_bh_enable at
> /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> Badness in local_bh_enable at
> /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> Call trace:
> [c0005340] check_bug_trap+0xbc/0x11c
> [c0005604] ProgramCheckException+0x264/0x2bc
> [c0004ac4] ret_from_except_full+0x0/0x4c
> [c0022ae4] local_bh_enable+0x18/0x80
> [c024648c] skb_copy_bits+0x168/0x3b8
> [c024db44] __skb_linearize+0x90/0x150
> [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> [c025c934] qdisc_restart+0xac/0x2bc
> [c024de9c] dev_queue_xmit+0x298/0x34c
> [c0269814] ip_finish_output+0x140/0x2b8
> [c026a3ac] ip_fragment+0x3cc/0x6e0
> [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> [c028a7c4] udp_sendpage+0x104/0x188
> [c0292fc8] inet_sendpage+0x90/0xb8
>
> I searched the webs and found the similar problems:
> http://www.mail-archive.com/[email protected]/msg05199.html
> http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
>
> Who knew there are fixes for the problem?
>

Well that got a tremendous response, didn't it?

What do you mean by "crashed"? The above is a warning and the system
should have survived.

Which kernel version is being used?


2007-09-05 16:04:17

by Stephen Hemminger

Subject: Re: kernel crashes inside MV643xx driver

On Wed, 5 Sep 2007 08:24:52 -0700
Andrew Morton <[email protected]> wrote:

> > On Mon, 20 Aug 2007 14:38:57 +0800 gshan <[email protected]> wrote:
> > Hi All,
> >
> > After I started the NFS server, it crashed:
> >
> > <3>Badness in local_bh_enable at
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Badness in local_bh_enable at
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Call trace:
> > [c0005340] check_bug_trap+0xbc/0x11c
> > [c0005604] ProgramCheckException+0x264/0x2bc
> > [c0004ac4] ret_from_except_full+0x0/0x4c
> > [c0022ae4] local_bh_enable+0x18/0x80
> > [c024648c] skb_copy_bits+0x168/0x3b8
> > [c024db44] __skb_linearize+0x90/0x150
> > [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> > [c025c934] qdisc_restart+0xac/0x2bc
> > [c024de9c] dev_queue_xmit+0x298/0x34c
> > [c0269814] ip_finish_output+0x140/0x2b8
> > [c026a3ac] ip_fragment+0x3cc/0x6e0
> > [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> > [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> > [c028a7c4] udp_sendpage+0x104/0x188
> > [c0292fc8] inet_sendpage+0x90/0xb8
> >
> > I searched the webs and found the similar problems:
> > http://www.mail-archive.com/[email protected]/msg05199.html
> > http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> >
> > Who knew there are fixes for the problem?
> >
>
> Well that got a tremendous response, didn't it?
>
> What do you mean by "crashed"? The above is a warning and the system
> should have survived.
>
> Which kernel version is being used?

The transmit start rework looks like it should have already fixed the problem.
The driver used to call spin_lock_irqsave before calling skb_linearize; now
it does the check before taking the lock.

commit c8aaea25e0b069e9572caa74f984e109899c1765
Author: Dale Farnsworth <[email protected]>
Date: Fri Mar 3 10:02:05 2006 -0700

[PATCH] mv643xx_eth: Refactor tx command queuing code

Simplify and remove redundant code for filling transmit descriptors.
No changes in features; it's just a code reorganization/cleanup.

Signed-off-by: Dale Farnsworth <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>
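
To illustrate the ordering issue described above, here is a minimal userspace
sketch in C. It is not driver code: every mock_* helper and both xmit_*
functions are hypothetical stand-ins invented for this example. It only models
the fact that local_bh_enable warns when it is called with interrupts disabled,
which is what the trace shows happening underneath __skb_linearize.

```c
/* Userspace model only. The mock_* helpers are hypothetical stand-ins,
 * not the kernel's spin_lock_irqsave()/local_bh_enable(). */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool irqs_off;   /* models "hard interrupts are disabled" */
static int warnings;    /* counts "Badness"-style warnings */

static void mock_lock_irqsave(void)
{
    irqs_off = true;
}

static void mock_unlock_irqrestore(void)
{
    assert(irqs_off);   /* unlocking without the lock held is a bug */
    irqs_off = false;
}

/* Models local_bh_enable(): it complains if interrupts are disabled. */
static void mock_local_bh_enable(void)
{
    if (irqs_off) {
        warnings++;
        fprintf(stderr, "Badness in mock_local_bh_enable\n");
    }
}

/* Models the linearize path in the trace, which ends in local_bh_enable. */
static void mock_linearize(void)
{
    mock_local_bh_enable();
}

/* Old ordering: take the lock first, then linearize under it. */
int xmit_lock_then_linearize(bool needs_linearize)
{
    warnings = 0;
    mock_lock_irqsave();
    if (needs_linearize)
        mock_linearize();
    mock_unlock_irqrestore();
    return warnings;
}

/* New ordering: check and linearize first, then take the lock. */
int xmit_linearize_then_lock(bool needs_linearize)
{
    warnings = 0;
    if (needs_linearize)
        mock_linearize();
    mock_lock_irqsave();
    mock_unlock_irqrestore();
    return warnings;
}
```

In this model, xmit_lock_then_linearize(true) reports one warning, while
xmit_linearize_then_lock(true) reports none. Neither variant warns when the
packet does not need linearizing, which would fit the warning showing up only
under particular traffic such as fragmented UDP from NFS.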

2007-09-05 17:02:34

by Dale Farnsworth

Subject: Re: kernel crashes inside MV643xx driver

On Wed, Sep 05, 2007 at 08:24:52AM -0700, Andrew Morton wrote:
> > On Mon, 20 Aug 2007 14:38:57 +0800 gshan <[email protected]> wrote:
> > Hi All,
> >
> > After I started the NFS server, it crashed:
> >
> > <3>Badness in local_bh_enable at
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Badness in local_bh_enable at
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Call trace:
> > [c0005340] check_bug_trap+0xbc/0x11c
> > [c0005604] ProgramCheckException+0x264/0x2bc
> > [c0004ac4] ret_from_except_full+0x0/0x4c
> > [c0022ae4] local_bh_enable+0x18/0x80
> > [c024648c] skb_copy_bits+0x168/0x3b8
> > [c024db44] __skb_linearize+0x90/0x150
> > [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> > [c025c934] qdisc_restart+0xac/0x2bc
> > [c024de9c] dev_queue_xmit+0x298/0x34c
> > [c0269814] ip_finish_output+0x140/0x2b8
> > [c026a3ac] ip_fragment+0x3cc/0x6e0
> > [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> > [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> > [c028a7c4] udp_sendpage+0x104/0x188
> > [c0292fc8] inet_sendpage+0x90/0xb8
> >
> > I searched the webs and found the similar problems:
> > http://www.mail-archive.com/[email protected]/msg05199.html
> > http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> >
> > Who knew there are fixes for the problem?
> >
>
> Well that got a tremendous response, didn't it?
>
> What do you mean by "crashed"? The above is a warning and the system
> should have survived.
>
> Which kernel version is being used?

That is the key question. From the pathnames, I suspect that gshan is
using a MontaVista version. I'm still (especially) interested, since
MontaVista pays me.

BTW, I never received the original message on netdev or linux-kernel.
Hmm. Thanks to Andrew for replying.

-Dale

2007-09-05 23:27:17

by Satyam Sharma

Subject: Re: kernel crashes inside MV643xx driver



On Wed, 5 Sep 2007, Dale Farnsworth wrote:

> On Wed, Sep 05, 2007 at 08:24:52AM -0700, Andrew Morton wrote:
> > > On Mon, 20 Aug 2007 14:38:57 +0800 gshan <[email protected]> wrote:
> > > Hi All,
> > >
> > > After I started the NFS server, it crashed:
> > >
> > > <3>Badness in local_bh_enable at
> > > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > > Badness in local_bh_enable at
> > > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > > Call trace:
> > > [c0005340] check_bug_trap+0xbc/0x11c
> > > [c0005604] ProgramCheckException+0x264/0x2bc
> > > [c0004ac4] ret_from_except_full+0x0/0x4c
> > > [c0022ae4] local_bh_enable+0x18/0x80
> > > [c024648c] skb_copy_bits+0x168/0x3b8
> > > [c024db44] __skb_linearize+0x90/0x150
> > > [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> > > [c025c934] qdisc_restart+0xac/0x2bc
> > > [c024de9c] dev_queue_xmit+0x298/0x34c
> > > [c0269814] ip_finish_output+0x140/0x2b8
> > > [c026a3ac] ip_fragment+0x3cc/0x6e0
> > > [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> > > [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> > > [c028a7c4] udp_sendpage+0x104/0x188
> > > [c0292fc8] inet_sendpage+0x90/0xb8
> > >
> > > I searched the webs and found the similar problems:
> > > http://www.mail-archive.com/[email protected]/msg05199.html
> > > http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> > >
> > > Who knew there are fixes for the problem?
> >
> > Well that got a tremendous response, didn't it?

I thought of replying to this post when I saw it a couple of weeks back, but
didn't because (1) that's an ancient kernel and (2) the first link
mentioned in the original post itself points to a patch that
solves it (which I verified has since been applied to mainline too).


> > What do you mean by "crashed"? The above is a warning and the system
> > should have survived.
> >
> > Which kernel version is being used?
>
> That is the key question. From the pathnames, I suspect that gshan is
> using a MontaVista version. I'm still (especially) interested, since
> MontaVista pays me.
>
> BTW, I never received the original message on netdev or linux-kernel.
> Hmm. Thanks to Andrew for replying.