e1000 faults in 2.6.20-git, while 2.6.20 worked fine.
System is a D875PBZ with LOM.
clues?
thanks,
-Len
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
*pde = 3747c001
Oops: 0000 [#1]
SMP
Modules linked in: dm_mod video thermal sbs i2c_ec fan button dock battery asus_acpi ac lp intel_agp agpgart ehci_hcd parport_serial parpt
CPU: 0
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010246 (2.6.20-g724339d7 #32)
EIP is at _stext+0x3fefed10/0x14
eax: c21cd3a0 ebx: f8840000 ecx: 00000000 edx: c22d2e44
esi: c21cd3a0 edi: 00000000 ebp: c21cd564 esp: f755de6c
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process ip (pid: 2975, ti=f755d000 task=f75590b0 task.ti=f755d000)
Stack: c02a2077 fffffff4 f7987404 00000000 c02a1fc3 c0140cb5 00000010 012321b4
c21cd3a0 c21cd000 c21cd000 00000000 c02a1a71 c21cd000 c21cd000 c21cd3a0
00000000 c21cd3a0 00000000 c02a4916 c21cd000 00001003 00001002 c03b3751
Call Trace:
[<c02a2077>] e1000_intr+0xb4/0x107
[<c02a1fc3>] e1000_intr+0x0/0x107
[<c0140cb5>] request_irq+0xa5/0xcc
[<c02a1a71>] e1000_request_irq+0xad/0xe6
[<c02a4916>] e1000_open+0x43/0xbd
[<c03b3751>] dev_open+0x2b/0x62
[<c03b224d>] dev_change_flags+0x47/0xe4
[<c03eb20e>] devinet_ioctl+0x250/0x56d
[<c03b3348>] dev_ifsioc+0x113/0x38d
[<c021f4ee>] copy_to_user+0x2d/0x43
[<c03a9a28>] sock_ioctl+0x19f/0x1be
[<c03a9889>] sock_ioctl+0x0/0x1be
[<c0168767>] do_ioctl+0x1f/0x62
[<c01689ee>] vfs_ioctl+0x244/0x256
[<c0168a33>] sys_ioctl+0x33/0x4c
[<c0103d88>] sysenter_past_esp+0x5d/0x81
=======================
Code: Bad EIP value.
EIP: [<00000000>] _stext+0x3fefed10/0x14 SS:ESP 0068:f755de6c
/etc/sysconfig/network-scripts/ifup-eth: line 272: 2975 Segmentation fault ip link set dev ${REALDEVICE} up
Failed to bring up eth0.
[FAILED]
Len Brown <[email protected]> writes:
> e1000 faults in 2.6.20-git, while 2.6.20 worked fine.
>
> System is a D875PBZ with LOM.
>
> clues?
I'm guessing this is an old bug found by the following bit of
debug coded added into since v2.6.20
+#ifdef CONFIG_DEBUG_SHIRQ
+ if (irqflags & IRQF_SHARED) {
+ /*
+ * It's a shared IRQ -- the driver ought to be prepared for it
+ * to happen immediately, so let's make sure....
+ * We do this before actually registering it, to make sure that
+ * a 'real' IRQ doesn't run in parallel with our fake
+ */
+ if (irqflags & IRQF_DISABLED) {
+ unsigned long flags;
+
+ local_irq_save(flags);
+ handler(irq, dev_id);
+ local_irq_restore(flags);
+ } else
+ handler(irq, dev_id);
+ }
+#endif
I don't have a clue why the e1000 wasn't ready though.
Eric
>
> Bringing up loopback interface: [ OK ]
> Bringing up interface eth0: BUG: unable to handle kernel NULL pointer
> dereference at virtual address 00000000
> printing eip:
> *pde = 3747c001
> Oops: 0000 [#1]
> SMP
> Modules linked in: dm_mod video thermal sbs i2c_ec fan button dock battery
> asus_acpi ac lp intel_agp agpgart ehci_hcd parport_serial parpt
> CPU: 0
> EIP: 0060:[<00000000>] Not tainted VLI
> EFLAGS: 00010246 (2.6.20-g724339d7 #32)
> EIP is at _stext+0x3fefed10/0x14
> eax: c21cd3a0 ebx: f8840000 ecx: 00000000 edx: c22d2e44
> esi: c21cd3a0 edi: 00000000 ebp: c21cd564 esp: f755de6c
> ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> Process ip (pid: 2975, ti=f755d000 task=f75590b0 task.ti=f755d000)
> Stack: c02a2077 fffffff4 f7987404 00000000 c02a1fc3 c0140cb5 00000010 012321b4
> c21cd3a0 c21cd000 c21cd000 00000000 c02a1a71 c21cd000 c21cd000 c21cd3a0
> 00000000 c21cd3a0 00000000 c02a4916 c21cd000 00001003 00001002 c03b3751
> Call Trace:
> [<c02a2077>] e1000_intr+0xb4/0x107
> [<c02a1fc3>] e1000_intr+0x0/0x107
> [<c0140cb5>] request_irq+0xa5/0xcc
> [<c02a1a71>] e1000_request_irq+0xad/0xe6
> [<c02a4916>] e1000_open+0x43/0xbd
> [<c03b3751>] dev_open+0x2b/0x62
> [<c03b224d>] dev_change_flags+0x47/0xe4
> [<c03eb20e>] devinet_ioctl+0x250/0x56d
> [<c03b3348>] dev_ifsioc+0x113/0x38d
> [<c021f4ee>] copy_to_user+0x2d/0x43
> [<c03a9a28>] sock_ioctl+0x19f/0x1be
> [<c03a9889>] sock_ioctl+0x0/0x1be
> [<c0168767>] do_ioctl+0x1f/0x62
> [<c01689ee>] vfs_ioctl+0x244/0x256
> [<c0168a33>] sys_ioctl+0x33/0x4c
> [<c0103d88>] sysenter_past_esp+0x5d/0x81
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] _stext+0x3fefed10/0x14 SS:ESP 0068:f755de6c
> /etc/sysconfig/network-scripts/ifup-eth: line 272: 2975 Segmentation fault ip
> link set dev ${REALDEVICE} up
> Failed to bring up eth0.
> [FAILED]
Eric W. Biederman wrote:
> Len Brown <[email protected]> writes:
>
>> e1000 faults in 2.6.20-git, while 2.6.20 worked fine.
>>
>> System is a D875PBZ with LOM.
>>
>> clues?
>
> I'm guessing this is an old bug found by the following bit of
> debug coded added into since v2.6.20
>
> +#ifdef CONFIG_DEBUG_SHIRQ
> + if (irqflags & IRQF_SHARED) {
> + /*
> + * It's a shared IRQ -- the driver ought to be
> prepared for it + * to happen immediately, so let's
> make sure.... + * We do this before actually
> registering it, to make sure that + * a 'real' IRQ
> doesn't run in parallel with our fake + */
> + if (irqflags & IRQF_DISABLED) {
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + handler(irq, dev_id);
> + local_irq_restore(flags);
> + } else
> + handler(irq, dev_id);
> + }
> +#endif
>
> I don't have a clue why the e1000 wasn't ready though.
>
our code is clearly calling request_irq before we have assigned the
function pointer adapter->clean_rx as well as adapter->alloc_rx_buf
That would be a bug, a possible patch would be (inline and attached):
compile tested, *but* I couldn't test this patch to make sure it worked
because I couldn't boot 2.6.20-git due to it not finding my RAID0 + lvm
disk.
[PATCH] e1000: fix shared interrupt warning message
From: Jesse Brandeburg <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
---
drivers/net/e1000/e1000_main.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/net/e1000/e1000_main.c
b/drivers/net/e1000/e1000_main.c
index 619c892..b8c4d5c 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1417,10 +1417,6 @@ e1000_open(struct net_device *netdev)
if ((err = e1000_setup_all_rx_resources(adapter)))
goto err_setup_rx;
- err = e1000_request_irq(adapter);
- if (err)
- goto err_req_irq;
-
e1000_power_up_phy(adapter);
if ((err = e1000_up(adapter)))
@@ -1431,6 +1427,10 @@ e1000_open(struct net_device *netdev)
e1000_update_mng_vlan(adapter);
}
+ err = e1000_request_irq(adapter);
+ if (err)
+ goto err_req_irq;
+
/* If AMT is enabled, let the firmware know that the network
* interface is now open */
if (adapter->hw.mac_type == e1000_82573 &&
@@ -1439,10 +1439,11 @@ e1000_open(struct net_device *netdev)
return E1000_SUCCESS;
+err_req_irq:
+ e1000_down(adapter);
+ e1000_free_irq(adapter);
err_up:
e1000_power_down_phy(adapter);
- e1000_free_irq(adapter);
-err_req_irq:
e1000_free_all_rx_resources(adapter);
err_setup_rx:
e1000_free_all_tx_resources(adapter);
On Thursday 15 February 2007 21:10, Brandeburg, Jesse wrote:
> Eric W. Biederman wrote:
> > Len Brown <[email protected]> writes:
> >
> >> e1000 faults in 2.6.20-git, while 2.6.20 worked fine.
> >>
> >> System is a D875PBZ with LOM.
> >>
> >> clues?
> >
> > I'm guessing this is an old bug found by the following bit of
> > debug coded added into since v2.6.20
> >
> > +#ifdef CONFIG_DEBUG_SHIRQ
> > + if (irqflags & IRQF_SHARED) {
> > + /*
> > + * It's a shared IRQ -- the driver ought to be
> > prepared for it + * to happen immediately, so let's
> > make sure.... + * We do this before actually
> > registering it, to make sure that + * a 'real' IRQ
> > doesn't run in parallel with our fake + */
> > + if (irqflags & IRQF_DISABLED) {
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
> > + handler(irq, dev_id);
> > + local_irq_restore(flags);
> > + } else
> > + handler(irq, dev_id);
> > + }
> > +#endif
> >
> > I don't have a clue why the e1000 wasn't ready though.
> >
>
> our code is clearly calling request_irq before we have assigned the
> function pointer adapter->clean_rx as well as adapter->alloc_rx_buf
>
> That would be a bug, a possible patch would be (inline and attached):
> compile tested, *but* I couldn't test this patch to make sure it worked
> because I couldn't boot 2.6.20-git due to it not finding my RAID0 + lvm
> disk.
>
> [PATCH] e1000: fix shared interrupt warning message
>
> From: Jesse Brandeburg <[email protected]>
>
> Signed-off-by: Jesse Brandeburg <[email protected]>
> ---
>
> drivers/net/e1000/e1000_main.c | 13 +++++++------
> 1 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/e1000/e1000_main.c
> b/drivers/net/e1000/e1000_main.c
> index 619c892..b8c4d5c 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -1417,10 +1417,6 @@ e1000_open(struct net_device *netdev)
> if ((err = e1000_setup_all_rx_resources(adapter)))
> goto err_setup_rx;
>
> - err = e1000_request_irq(adapter);
> - if (err)
> - goto err_req_irq;
> -
> e1000_power_up_phy(adapter);
>
> if ((err = e1000_up(adapter)))
> @@ -1431,6 +1427,10 @@ e1000_open(struct net_device *netdev)
> e1000_update_mng_vlan(adapter);
> }
>
> + err = e1000_request_irq(adapter);
> + if (err)
> + goto err_req_irq;
> +
> /* If AMT is enabled, let the firmware know that the network
> * interface is now open */
> if (adapter->hw.mac_type == e1000_82573 &&
> @@ -1439,10 +1439,11 @@ e1000_open(struct net_device *netdev)
>
> return E1000_SUCCESS;
>
> +err_req_irq:
> + e1000_down(adapter);
> + e1000_free_irq(adapter);
> err_up:
> e1000_power_down_phy(adapter);
> - e1000_free_irq(adapter);
> -err_req_irq:
> e1000_free_all_rx_resources(adapter);
> err_setup_rx:
> e1000_free_all_tx_resources(adapter);
>
Works for me(tm) on latest 2.6.20-git and D875PBZ.
thanks,
-Len
[Adding Dimitri Mishin to the CC - he proposed the same patch earlier]
Len Brown wrote:
> On Thursday 15 February 2007 21:10, Brandeburg, Jesse wrote:
>> Eric W. Biederman wrote:
>>> Len Brown <[email protected]> writes:
>>>
>>>> e1000 faults in 2.6.20-git, while 2.6.20 worked fine.
>>>>
>>>> System is a D875PBZ with LOM.
>>>>
>>>> clues?
>>> I'm guessing this is an old bug found by the following bit of
>>> debug coded added into since v2.6.20
>>>
>>> +#ifdef CONFIG_DEBUG_SHIRQ
>>> + if (irqflags & IRQF_SHARED) {
>>> + /*
>>> + * It's a shared IRQ -- the driver ought to be
>>> prepared for it + * to happen immediately, so let's
>>> make sure.... + * We do this before actually
>>> registering it, to make sure that + * a 'real' IRQ
>>> doesn't run in parallel with our fake + */
>>> + if (irqflags & IRQF_DISABLED) {
>>> + unsigned long flags;
>>> +
>>> + local_irq_save(flags);
>>> + handler(irq, dev_id);
>>> + local_irq_restore(flags);
>>> + } else
>>> + handler(irq, dev_id);
>>> + }
>>> +#endif
>>>
>>> I don't have a clue why the e1000 wasn't ready though.
>>>
>> our code is clearly calling request_irq before we have assigned the
>> function pointer adapter->clean_rx as well as adapter->alloc_rx_buf
>>
>> That would be a bug, a possible patch would be (inline and attached):
>> compile tested, *but* I couldn't test this patch to make sure it worked
>> because I couldn't boot 2.6.20-git due to it not finding my RAID0 + lvm
>> disk.
>>
>> [PATCH] e1000: fix shared interrupt warning message
>>
>> From: Jesse Brandeburg <[email protected]>
>>
>> Signed-off-by: Jesse Brandeburg <[email protected]>
>> ---
>>
>> drivers/net/e1000/e1000_main.c | 13 +++++++------
>> 1 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/e1000/e1000_main.c
>> b/drivers/net/e1000/e1000_main.c
>> index 619c892..b8c4d5c 100644
>> --- a/drivers/net/e1000/e1000_main.c
>> +++ b/drivers/net/e1000/e1000_main.c
>> @@ -1417,10 +1417,6 @@ e1000_open(struct net_device *netdev)
>> if ((err = e1000_setup_all_rx_resources(adapter)))
>> goto err_setup_rx;
>>
>> - err = e1000_request_irq(adapter);
>> - if (err)
>> - goto err_req_irq;
>> -
>> e1000_power_up_phy(adapter);
>>
>> if ((err = e1000_up(adapter)))
>> @@ -1431,6 +1427,10 @@ e1000_open(struct net_device *netdev)
>> e1000_update_mng_vlan(adapter);
>> }
>>
>> + err = e1000_request_irq(adapter);
>> + if (err)
>> + goto err_req_irq;
>> +
>> /* If AMT is enabled, let the firmware know that the network
>> * interface is now open */
>> if (adapter->hw.mac_type == e1000_82573 &&
>> @@ -1439,10 +1439,11 @@ e1000_open(struct net_device *netdev)
>>
>> return E1000_SUCCESS;
>>
>> +err_req_irq:
>> + e1000_down(adapter);
>> + e1000_free_irq(adapter);
>> err_up:
>> e1000_power_down_phy(adapter);
>> - e1000_free_irq(adapter);
>> -err_req_irq:
>> e1000_free_all_rx_resources(adapter);
>> err_setup_rx:
>> e1000_free_all_tx_resources(adapter);
>>
>
> Works for me(tm) on latest 2.6.20-git and D875PBZ.
If there are no objections I'll push this patch to Jeff Garzik together with two
other changes I have for him.
Cheers,
Auke
On Thu, 15 Feb 2007 18:10:53 -0800 "Brandeburg, Jesse" <[email protected]> wrote:
> @@ -1431,6 +1427,10 @@ e1000_open(struct net_device *netdev)
> e1000_update_mng_vlan(adapter);
> }
>
> + err = e1000_request_irq(adapter);
> + if (err)
> + goto err_req_irq;
> +
> /* If AMT is enabled, let the firmware know that the network
> * interface is now open */
> if (adapter->hw.mac_type == e1000_82573 &&
> @@ -1439,10 +1439,11 @@ e1000_open(struct net_device *netdev)
>
> return E1000_SUCCESS;
>
> +err_req_irq:
> + e1000_down(adapter);
> + e1000_free_irq(adapter);
> err_up:
We don't want that e1000_free_irq(adapter) in the error path.
Andrew Morton wrote:
> On Thu, 15 Feb 2007 18:10:53 -0800 "Brandeburg, Jesse" <[email protected]> wrote:
>
>> @@ -1431,6 +1427,10 @@ e1000_open(struct net_device *netdev)
>> e1000_update_mng_vlan(adapter);
>> }
>>
>> + err = e1000_request_irq(adapter);
>> + if (err)
>> + goto err_req_irq;
>> +
>> /* If AMT is enabled, let the firmware know that the network
>> * interface is now open */
>> if (adapter->hw.mac_type == e1000_82573 &&
>> @@ -1439,10 +1439,11 @@ e1000_open(struct net_device *netdev)
>>
>> return E1000_SUCCESS;
>>
>> +err_req_irq:
>> + e1000_down(adapter);
>> + e1000_free_irq(adapter);
>> err_up:
>
> We don't want that e1000_free_irq(adapter) in the error path.
indeed, thanks for spotting and telling me before I sent this to Jeff.
Cheers,
Auke