2007-10-24 11:09:23

by Andrew

[permalink] [raw]
Subject: 2.6.24-rc1: NULL pointer dereference using netconsole

Hi,

I booted up 2.6.24-rc1 this morning [Real early over a brew ;-)] and was
having a problems with multiple ~5 second hangs on SATA/drive init
(Something to do with "EH" something-or-other and resets but I'll email
in separately about it later unless its fixed by the time I get the chance).

Anyway, I went to fire up netconsole to get a decent log dump and hit
across the following nasty. Netconsole works fine in 2.6.23.1 with a
similar config and the same kernel parameters.

A shot of the screen is the only method I could come up with to capture
the log, I hope that is OK, it is pretty readable.


The nasty:
http://andotnet.nfshost.com/linux/2.6.24-rc1-netconsole-nullderef.jpg

The config:
http://andotnet.nfshost.com/linux/config-2.6.24-rc1.txt

The 'old' 2.6.23.1 config:
http://andotnet.nfshost.com/linux/config-2.6.23.1.txt

Netconsole log from successfully booting 2.6.23.1:
http://andotnet.nfshost.com/linux/successful-nclog-2.6.23.1.txt

The hardware, etc:
eth0 : Netgear FA311 10/100Mbps ethernet adapter (netsemi driver)
connected directly to another PC using a crossover cable.
cpu : athlon64 3500+
compiler : gcc 4.2.2
kernel params : root=/dev/sda3 vga=0x31B nmi_watchdog=0
[email protected]/eth0,[email protected]/00:11:d8:c0:5e:96


I hope this report is helpful and it's not too early in the release cycle,
I'm still pretty new and not subscribed, so please CC :-)

Btw, is it preferable for non-subscribers to use the kernel bugzilla? Or
post to the list?


Andrew (On his 3rd brew of the day)


2007-10-24 11:59:27

by Ingo Molnar

[permalink] [raw]
Subject: [patch] natsemi: fix oops, link back netdevice from private-struct


* Andrew Nelless <[email protected]> wrote:

> Hi,
>
> I booted up 2.6.24-rc1 this morning [Real early over a brew ;-)] and
> was having a problems with multiple ~5 second hangs on SATA/drive init
> (Something to do with "EH" something-or-other and resets but I'll
> email in separately about it later unless its fixed by the time I get
> the chance).
>
> Anyway, I went to fire up netconsole to get a decent log dump and hit
> across the following nasty. Netconsole works fine in 2.6.23.1 with a
> similar config and the same kernel parameters.
>
> A shot of the screen is the only method I could come up with to
> capture the log, I hope that is OK, it is pretty readable.
>
>
> The nasty:
> http://andotnet.nfshost.com/linux/2.6.24-rc1-netconsole-nullderef.jpg


the NULL dereference is here:

(gdb) list *0xffffffff804a9504
0xffffffff804a9504 is in natsemi_poll (drivers/net/natsemi.c:717).
712 return count;
713 }
714
715 static inline void __iomem *ns_ioaddr(struct net_device *dev)
716 {
717 return (void __iomem *) dev->base_addr;
718 }
719

which is this code from natsemi.c:

2227 struct net_device *dev = np->dev;
2228 void __iomem * ioaddr = ns_ioaddr(dev);
2229 int work_done = 0;

seems like the NAPI changes in -rc1 added an np->dev field but forgot to
initialize it ...

does the patch below fix the oops for you?

Ingo

-------------------->
Subject: natsemi: fix oops, link back netdevice from private-struct
From: Ingo Molnar <[email protected]>

this commit:

commit bea3348eef27e6044b6161fd04c3152215f96411
Author: Stephen Hemminger <[email protected]>
Date: Wed Oct 3 16:41:36 2007 -0700

[NET]: Make NAPI polling independent of struct net_device objects.

added np->dev to drivers/net/natsemi.c's struct netdev_private, but
forgot to initialize that new field upon driver init. The result was
a predictable NULL dereference oops the first time the hardware
generated an interrupt.

Reported-by: Andrew Nelless <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/net/natsemi.c | 1 +
1 file changed, 1 insertion(+)

Index: linux/drivers/net/natsemi.c
===================================================================
--- linux.orig/drivers/net/natsemi.c
+++ linux/drivers/net/natsemi.c
@@ -864,6 +864,7 @@ static int __devinit natsemi_probe1 (str

np = netdev_priv(dev);
netif_napi_add(dev, &np->napi, natsemi_poll, 64);
+ np->dev = dev;

np->pci_dev = pdev;
pci_set_drvdata(pdev, dev);

2007-10-24 17:37:40

by Andrew

[permalink] [raw]
Subject: Re: [patch] natsemi: fix oops, link back netdevice from private-struct

On Wed, October 24, 2007 12:58, Ingo Molnar wrote:
>
> the NULL dereference is here:
>
> (gdb) list *0xffffffff804a9504
> 0xffffffff804a9504 is in natsemi_poll (drivers/net/natsemi.c:717).
> 712 return count;
> 713 }
> 714
> 715 static inline void __iomem *ns_ioaddr(struct net_device *dev)
> 716 {
> 717 return (void __iomem *) dev->base_addr;
> 718 }
> 719
>
>
> which is this code from natsemi.c:
>
> 2227 struct net_device *dev = np->dev;
> 2228 void __iomem * ioaddr = ns_ioaddr(dev);
> 2229 int work_done = 0;
>
>
> seems like the NAPI changes in -rc1 added an np->dev field but forgot to initialize it ...
>
> does the patch below fix the oops for you?
>
> Ingo
>
>

Yep, that got it, thanks.

2007-10-25 07:33:21

by Jeff Garzik

[permalink] [raw]
Subject: Re: [patch] natsemi: fix oops, link back netdevice from private-struct

Ingo Molnar wrote:
> * Andrew Nelless <[email protected]> wrote:
>
>> Hi,
>>
>> I booted up 2.6.24-rc1 this morning [Real early over a brew ;-)] and
>> was having a problems with multiple ~5 second hangs on SATA/drive init
>> (Something to do with "EH" something-or-other and resets but I'll
>> email in separately about it later unless its fixed by the time I get
>> the chance).
>>
>> Anyway, I went to fire up netconsole to get a decent log dump and hit
>> across the following nasty. Netconsole works fine in 2.6.23.1 with a
>> similar config and the same kernel parameters.
>>
>> A shot of the screen is the only method I could come up with to
>> capture the log, I hope that is OK, it is pretty readable.
>>
>>
>> The nasty:
>> http://andotnet.nfshost.com/linux/2.6.24-rc1-netconsole-nullderef.jpg
>
>
> the NULL dereference is here:
>
> (gdb) list *0xffffffff804a9504
> 0xffffffff804a9504 is in natsemi_poll (drivers/net/natsemi.c:717).
> 712 return count;
> 713 }
> 714
> 715 static inline void __iomem *ns_ioaddr(struct net_device *dev)
> 716 {
> 717 return (void __iomem *) dev->base_addr;
> 718 }
> 719
>
> which is this code from natsemi.c:
>
> 2227 struct net_device *dev = np->dev;
> 2228 void __iomem * ioaddr = ns_ioaddr(dev);
> 2229 int work_done = 0;
>
> seems like the NAPI changes in -rc1 added an np->dev field but forgot to
> initialize it ...
>
> does the patch below fix the oops for you?
>
> Ingo
>
> -------------------->
> Subject: natsemi: fix oops, link back netdevice from private-struct
> From: Ingo Molnar <[email protected]>
>
> this commit:
>
> commit bea3348eef27e6044b6161fd04c3152215f96411
> Author: Stephen Hemminger <[email protected]>
> Date: Wed Oct 3 16:41:36 2007 -0700
>
> [NET]: Make NAPI polling independent of struct net_device objects.
>
> added np->dev to drivers/net/natsemi.c's struct netdev_private, but
> forgot to initialize that new field upon driver init. The result was
> a predictable NULL dereference oops the first time the hardware
> generated an interrupt.
>
> Reported-by: Andrew Nelless <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> drivers/net/natsemi.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: linux/drivers/net/natsemi.c
> ===================================================================
> --- linux.orig/drivers/net/natsemi.c
> +++ linux/drivers/net/natsemi.c
> @@ -864,6 +864,7 @@ static int __devinit natsemi_probe1 (str
>
> np = netdev_priv(dev);
> netif_napi_add(dev, &np->napi, natsemi_poll, 64);
> + np->dev = dev;
>
> np->pci_dev = pdev;
> pci_set_drvdata(pdev, dev);

applied