2006-03-05 18:08:16

by Martin Michlmayr

Subject: de2104x: interrupts before interrupt handler is registered

We have three independent reports about problems with de2104x involving
interrupts. Alan Stern suggested that it "sure looks as though the
ethernet interface is generating an interrupt request before the
de2104x driver has registered its interrupt handler".

The three reports are:
- de2104x does not work (non-fatal oops) when uhci_hcd is loaded
first. http://lkml.org/lkml/2006/2/3/402 The problem does not
occur under 2.4 with the tulip module, so this is a regression.
- fatal de2104x interrupt oops (without uhci_hcd).
http://lkml.org/lkml/2006/2/5/64
- "kernel panic after the first transmission attempt times out"
Regression from 2.4. http://bugs.debian.org/288821

I have a system on which I can reproduce this bug 100%. While I have
no idea how to fix the issue, I can provide debugging information and
test a fix. However, I'm (temporarily) leaving the country in three
weeks and won't have access to this PC for several months, so it would
be great if someone could look into this soon. Jeff?


1)
eth0: enabling interface
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
irq 10: nobody cared (try booting with the "irqpoll" option)
[<c012f89e>] __report_bad_irq+0x31/0x73
[<c012f96d>] note_interrupt+0x75/0x98
[<c012f46a>] __do_IRQ+0x67/0x91
[<c0104fc1>] do_IRQ+0x19/0x24
[<c0103afa>] common_interrupt+0x1a/0x20
[<c0119a1c>] __do_softirq+0x2c/0x7d
[<c0119a8f>] do_softirq+0x22/0x26
[<c0104fc6>] do_IRQ+0x1e/0x24
[<c0103afa>] common_interrupt+0x1a/0x20
[<c481da07>] de_set_rx_mode+0xf/0x12 [de2104x]
[<c481e2c1>] de_init_hw+0x6d/0x76 [de2104x]
[<c481e59e>] de_open+0x64/0xe4 [de2104x]
[<c0225a5f>] dev_open+0x30/0x66
[<c0226a9a>] dev_change_flags+0x4d/0xf0
[<c025d301>] devinet_ioctl+0x224/0x4bd
[<c0155541>] do_ioctl+0x21/0x50
[<c0155774>] vfs_ioctl+0x152/0x161
[<c01557cb>] sys_ioctl+0x48/0x65
[<c0102a99>] syscall_call+0x7/0xb
handlers:
[<c4890d97>] (usb_hcd_irq+0x0/0x56 [usbcore])
Disabling IRQ #10

3)
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
[__report_bad_irq+42/144] __report_bad_irq+0x2a/0x90
[note_interrupt+108/160] note_interrupt+0x6c/0xa0
[do_IRQ+289/304] do_IRQ+0x121/0x130
[common_interrupt+24/32] common_interrupt+0x18/0x20
[__do_softirq+48/128] __do_softirq+0x30/0x80
[acpi_irq+0/22] acpi_irq+0x0/0x16
[do_softirq+38/48] do_softirq+0x26/0x30
[do_IRQ+253/304] do_IRQ+0xfd/0x130
[common_interrupt+24/32] common_interrupt+0x18/0x20
[__crc_do_softirq+25311/208152] de_set_rx_mode+0x26/0x50 [de2104x]
[__crc_do_softirq+28277/208152] de_init_hw+0x8c/0x90 [de2104x]
[__crc_do_softirq+29105/208152] de_open+0x68/0x140 [de2104x]
[profile_hook+45/75] profile_hook+0x2d/0x4b
[dev_open+203/256] dev_open+0xcb/0x100
[dev_mc_upload+36/80] dev_mc_upload+0x24/0x50
[dev_change_flags+81/288] dev_change_flags+0x51/0x120
[devinet_ioctl+582/1424] devinet_ioctl+0x246/0x590
[inet_ioctl+94/160] inet_ioctl+0x5e/0xa0
[sock_ioctl+249/688] sock_ioctl+0xf9/0x2b0
[sys_ioctl+269/656] sys_ioctl+0x10d/0x290
[syscall_call+7/11] syscall_call+0x7/0xb
eth0: link up, media 10baseT auto

--
Martin Michlmayr
http://www.cyrius.com/


2006-03-05 19:04:13

by Francois Romieu

Subject: Re: de2104x: interrupts before interrupt handler is registered

Martin Michlmayr <[email protected]> :
[...]
> I have a system on which I can reproduce this bug 100%. While I have
> no idea how to fix the issue, I can provide debugging information and
> test a fix. However, I'm (temporarily) leaving the country in three
> weeks and won't have access to this PC for several months, so it would
> be great if someone could look into this soon. Jeff?

(not compile-tested)

diff --git a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c
index d7fb3ff..d16a5a0 100644
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1376,18 +1376,20 @@ static int de_open (struct net_device *d
 		return rc;
 	}

-	rc = de_init_hw(de);
-	if (rc) {
-		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
-		       dev->name, rc);
-		goto err_out_free;
-	}
+	dw32(IntrMask, 0);

 	rc = request_irq(dev->irq, de_interrupt, SA_SHIRQ, dev->name, dev);
 	if (rc) {
 		printk(KERN_ERR "%s: IRQ %d request failure, err=%d\n",
 		       dev->name, dev->irq, rc);
-		goto err_out_hw;
+		goto err_out_free;
+	}
+
+	rc = de_init_hw(de);
+	if (rc) {
+		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
+		       dev->name, rc);
+		goto err_out_free_irq;
 	}

 	netif_start_queue(dev);
@@ -1395,11 +1397,8 @@ static int de_open (struct net_device *d

 	return 0;

-err_out_hw:
-	spin_lock_irqsave(&de->lock, flags);
-	de_stop_hw(de);
-	spin_unlock_irqrestore(&de->lock, flags);
-
+err_out_free_irq:
+	free_irq(dev->irq, dev);
 err_out_free:
 	de_free_rings(de);
 	return rc;
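
The ordering change in the patch above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the shared IRQ line, the toy NIC, and every function name below are hypothetical stand-ins, not kernel API. It shows that when hardware init runs before the handler is registered, the interrupt lands on a line where no handler claims it (the "nobody cared" case), while masking, registering, and only then initialising leaves nothing unclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 4

/* A handler returns true if the interrupt came from its own device. */
typedef bool (*irq_handler_fn)(void *dev);

/* Toy model of one shared IRQ line (hypothetical, not kernel code). */
struct irq_line {
    irq_handler_fn handler[MAX_HANDLERS];
    void *dev[MAX_HANDLERS];
    size_t count;
    unsigned unhandled;     /* interrupts that no handler claimed */
};

struct toy_nic {
    bool masked;            /* stands in for dw32(IntrMask, 0) */
    bool pending;           /* device is asking for service */
};

static void line_register(struct irq_line *l, irq_handler_fn fn, void *dev)
{
    assert(l->count < MAX_HANDLERS);
    l->handler[l->count] = fn;
    l->dev[l->count] = dev;
    l->count++;
}

/* Deliver one interrupt: every handler on the shared line is asked. */
static void line_deliver(struct irq_line *l)
{
    bool handled = false;

    for (size_t i = 0; i < l->count; i++)
        if (l->handler[i](l->dev[i]))
            handled = true;
    if (!handled)
        l->unhandled++;
}

/* The other device sharing the line: the interrupt is never its own. */
static bool usb_handler(void *dev)
{
    (void)dev;
    return false;
}

static bool nic_handler(void *dev)
{
    struct toy_nic *nic = dev;

    if (!nic->pending)
        return false;
    nic->pending = false;   /* acknowledge the device */
    return true;
}

/* Hardware init unmasks the device, which raises an interrupt at once. */
static void nic_init_hw(struct toy_nic *nic, struct irq_line *l)
{
    nic->masked = false;
    nic->pending = true;
    line_deliver(l);
}

/* Pre-patch order: init the hardware, then register the handler. */
unsigned open_init_then_request(void)
{
    struct irq_line line = {0};
    struct toy_nic nic = {0};

    line_register(&line, usb_handler, NULL);
    nic_init_hw(&nic, &line);
    line_register(&line, nic_handler, &nic);
    return line.unhandled;
}

/* Patched order: mask, register the handler, then init the hardware. */
unsigned open_request_then_init(void)
{
    struct irq_line line = {0};
    struct toy_nic nic = {0};

    line_register(&line, usb_handler, NULL);
    nic.masked = true;
    line_register(&line, nic_handler, &nic);
    nic_init_hw(&nic, &line);
    return line.unhandled;
}
```

In this model open_init_then_request() reports one unclaimed interrupt and open_request_then_init() reports none. The model delivers the interrupt only once; on a real level-triggered line the unclaimed request would keep firing until the kernel gives up on the IRQ, as in the "Disabling IRQ #10" trace.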

2006-03-06 13:02:54

by linux-os (Dick Johnson)

Subject: Re: de2104x: interrupts before interrupt handler is registered


On Sun, 5 Mar 2006, Martin Michlmayr wrote:

> We have three independent reports about problems with de2104x involving
> interrupts. Alan Stern suggested that it "sure looks as though the
> ethernet interface is generating an interrupt request before the
> de2104x driver has registered its interrupt handler".

This started to happen in a lot of PCI drivers once it became
necessary to call pci_enable_device() in order to make the
returned IRQ values valid. This has been reported numerous
times and has not been fixed. Basically, in order to get
the correct value, one needs to disable the board in some
unspecified way so it is not possible for it to generate
an interrupt before enabling the board. With some devices
this may not be possible!

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.47 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2006-03-06 14:35:28

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Francois Romieu <[email protected]> [2006-03-05 19:59]:
> > I have a system on which I can reproduce this bug 100%. While I have
> > no idea how to fix the issue, I can provide debugging information and
> > test a fix.

> (not compile-tested)

Thanks a lot for your quick response, Francois. I can confirm that
this patch fixes the problem for me.

> -err_out_hw:
> - spin_lock_irqsave(&de->lock, flags);
> - de_stop_hw(de);
> - spin_unlock_irqrestore(&de->lock, flags);

flags is no longer used now, so we get a compilation warning. Updated
patch below. Francois, can you please submit it with a proper
changelog entry and your Signed-off-by.


From: Francois Romieu <[email protected]>
Signed-off-by: Martin Michlmayr <[email protected]>

--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1362,7 +1362,6 @@ static int de_open (struct net_device *d
 {
 	struct de_private *de = dev->priv;
 	int rc;
-	unsigned long flags;

 	if (netif_msg_ifup(de))
 		printk(KERN_DEBUG "%s: enabling interface\n", dev->name);
@@ -1376,18 +1375,20 @@ static int de_open (struct net_device *d
 		return rc;
 	}

-	rc = de_init_hw(de);
-	if (rc) {
-		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
-		       dev->name, rc);
-		goto err_out_free;
-	}
+	dw32(IntrMask, 0);

 	rc = request_irq(dev->irq, de_interrupt, SA_SHIRQ, dev->name, dev);
 	if (rc) {
 		printk(KERN_ERR "%s: IRQ %d request failure, err=%d\n",
 		       dev->name, dev->irq, rc);
-		goto err_out_hw;
+		goto err_out_free;
+	}
+
+	rc = de_init_hw(de);
+	if (rc) {
+		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
+		       dev->name, rc);
+		goto err_out_free_irq;
 	}

 	netif_start_queue(dev);
@@ -1395,11 +1396,8 @@ static int de_open (struct net_device *d

 	return 0;

-err_out_hw:
-	spin_lock_irqsave(&de->lock, flags);
-	de_stop_hw(de);
-	spin_unlock_irqrestore(&de->lock, flags);
-
+err_out_free_irq:
+	free_irq(dev->irq, dev);
 err_out_free:
 	de_free_rings(de);
 	return rc;

--
Martin Michlmayr
http://www.cyrius.com/
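
The error path in the updated patch follows the usual goto-unwind pattern: resources are acquired in one order and released in the reverse order, so each label only undoes what has already succeeded. A minimal userspace sketch of that shape follows. The toy_* functions, the fail_step knob, and the chosen error codes are hypothetical and exist only to exercise the unwind path; they are not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state touched by de_open(). */
struct toy_dev {
    bool rings_allocated;
    bool irq_requested;
    bool hw_running;
    int fail_step;      /* 0 = no failure, 1 = rings, 2 = irq, 3 = hw init */
};

static int toy_alloc_rings(struct toy_dev *d)
{
    if (d->fail_step == 1)
        return -ENOMEM;
    d->rings_allocated = true;
    return 0;
}

static void toy_free_rings(struct toy_dev *d)
{
    d->rings_allocated = false;
}

static int toy_request_irq(struct toy_dev *d)
{
    if (d->fail_step == 2)
        return -EBUSY;
    d->irq_requested = true;
    return 0;
}

static void toy_free_irq(struct toy_dev *d)
{
    /* Freeing an IRQ that was never requested would be a bug. */
    assert(d->irq_requested);
    d->irq_requested = false;
}

static int toy_init_hw(struct toy_dev *d)
{
    if (d->fail_step == 3)
        return -EIO;
    d->hw_running = true;
    return 0;
}

/* Same shape as the patched de_open(): rings, then IRQ, then hardware. */
int toy_open(struct toy_dev *d)
{
    int rc;

    rc = toy_alloc_rings(d);
    if (rc)
        return rc;

    rc = toy_request_irq(d);
    if (rc)
        goto err_out_free;

    rc = toy_init_hw(d);
    if (rc)
        goto err_out_free_irq;

    return 0;

err_out_free_irq:
    toy_free_irq(d);
err_out_free:
    toy_free_rings(d);
    return rc;
}
```

Setting fail_step to 3 and calling toy_open() returns a negative error with both irq_requested and rings_allocated back to false, which is the property the err_out_free_irq label is there to provide.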

2006-03-06 19:52:01

by Francois Romieu

Subject: Re: de2104x: interrupts before interrupt handler is registered

Martin Michlmayr <[email protected]> :
[...]
> There's another interrupt-related bug in the driver, though. I
> sometimes get a kernel panic when rsyncing several 100 megs of data
> across the LAN. A picture showing the call trace can be found at
> http://www.cyrius.com/tmp/de2104x_panic.jpg

Can you publish the .config ?

--
Ueimor

2006-03-06 19:21:06

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Martin Michlmayr <[email protected]> [2006-03-06 14:35]:
> Thanks a lot for your quick response, Francois. I can confirm that
> this patch fixes the problem for me.

There's another interrupt-related bug in the driver, though. I
sometimes get a kernel panic when rsyncing several 100 megs of data
across the LAN. A picture showing the call trace can be found at
http://www.cyrius.com/tmp/de2104x_panic.jpg

--
Martin Michlmayr
http://www.cyrius.com/

2006-03-06 20:00:25

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Francois Romieu <[email protected]> [2006-03-06 20:48]:
> > There's another interrupt-related bug in the driver, though. I
> > sometimes get a kernel panic when rsyncing several 100 megs of data
> > across the LAN. A picture showing the call trace can be found at
> > http://www.cyrius.com/tmp/de2104x_panic.jpg
> Can you publish the .config ?

http://www.cyrius.com/tmp/config-2.6.16-rc5-486

By the way, I'm getting the following messages in dmesg:

eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb032
eth0: tx err, status 0x7fffb002

--
Martin Michlmayr
http://www.cyrius.com/

2006-03-06 20:23:58

by Francois Romieu

Subject: Re: de2104x: interrupts before interrupt handler is registered

Martin Michlmayr <[email protected]> :
[...]
> http://www.cyrius.com/tmp/config-2.6.16-rc5-486
>
> By the way, I'm getting the following messages in dmesg:

netconsole appears enabled. Do you use it ?

--
Ueimor

2006-03-06 20:29:21

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Francois Romieu <[email protected]> [2006-03-06 21:23]:
> > http://www.cyrius.com/tmp/config-2.6.16-rc5-486
> > By the way, I'm getting the following messages in dmesg:
> netconsole appears enabled. Do you use it ?

It's a standard Debian kernel config so pretty much everything is
enabled as a module. I didn't use netconsole.
--
Martin Michlmayr
http://www.cyrius.com/

2006-03-06 20:55:58

by Francois Romieu

Subject: Re: de2104x: interrupts before interrupt handler is registered

Martin Michlmayr <[email protected]> :
[...]
> By the way, I'm getting the following messages in dmesg:
>
> eth0: tx err, status 0x7fffb002

Tx underrun.

Is there anything which could induce a noticeable load on the PCI bus ?

--
Ueimor
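
For reading the "tx err, status" values, a small decoder may help. This is a sketch, not the driver's code: the two bit positions below (bit 15 as the error summary, bit 1 as FIFO underrun) are how I recall the 21x4x transmit descriptor status word and should be checked against the datasheet or de2104x.c. Bit 1 is at least consistent with the "Tx underrun" reading of 0x7fffb002 above. Everything else in the low 16 bits is deliberately left undecoded rather than guessed at. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions in the transmit descriptor status word. */
#define TX_ERR_SUMMARY  (1u << 15)   /* some transmit error occurred */
#define TX_FIFO_UNDER   (1u << 1)    /* transmit FIFO underrun */

struct tx_status {
    bool error;          /* error summary bit set */
    bool fifo_underrun;  /* underrun bit set */
    uint32_t undecoded;  /* remaining low 16 bits, left uninterpreted */
};

/* Split a status word into the two known flags and the leftover bits. */
void decode_tx_status(uint32_t status, struct tx_status *out)
{
    assert(out != NULL);
    out->error = (status & TX_ERR_SUMMARY) != 0;
    out->fifo_underrun = (status & TX_FIFO_UNDER) != 0;
    out->undecoded = status & 0xffffu & ~(TX_ERR_SUMMARY | TX_FIFO_UNDER);
}
```

Fed 0x7fffb002 this reports error and underrun with 0x3000 left over; 0x7fffb00a differs only in bit 3 (leftover 0x3008), so every value in the log above carries the underrun flag.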

2006-03-06 21:19:58

by Francois Romieu

Subject: Re: de2104x: interrupts before interrupt handler is registered

Martin Michlmayr <[email protected]> :
[...]
> There's another interrupt-related bug in the driver, though. I
> sometimes get a kernel panic when rsyncing several 100 megs of data
> across the LAN. A picture showing the call trace can be found at
> http://www.cyrius.com/tmp/de2104x_panic.jpg

Not sure about this one, but...

Signed-off-by: Francois Romieu <[email protected]>

diff --git a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c
index d7fb3ff..49235e2 100644
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1455,6 +1455,8 @@ static void de_tx_timeout (struct net_de
 	synchronize_irq(dev->irq);
 	de_clean_rings(de);

+	de_init_rings(de);
+
 	de_init_hw(de);

 	netif_wake_queue(dev);
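
The message above does not say why re-initialising the rings should help, so the following is only one plausible reading, sketched as a toy model. It assumes that cleaning a ring drops the queued packets but leaves the driver's ring index where it was, and that restarting the hardware makes the NIC fetch from the start of the ring again. All names are hypothetical and the ring is reduced to an array of packet ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RING_SIZE 4

/* Toy transmit ring (hypothetical, not the driver's data structure). */
struct toy_ring {
    int slot[RING_SIZE];   /* 0 = empty, otherwise a queued packet id */
    unsigned head;         /* next slot the driver will fill */
    unsigned nic_pos;      /* next slot the NIC will look at */
};

/* Stand-in for de_init_rings(): empty ring, driver index back to 0. */
static void ring_init(struct toy_ring *r)
{
    memset(r->slot, 0, sizeof(r->slot));
    r->head = 0;
}

/* Stand-in for de_clean_rings(): drop packets, indices left untouched. */
static void ring_clean(struct toy_ring *r)
{
    memset(r->slot, 0, sizeof(r->slot));
}

/* Stand-in for de_init_hw(): the NIC restarts at the ring base. */
static void hw_init(struct toy_ring *r)
{
    r->nic_pos = 0;
}

static void ring_queue(struct toy_ring *r, int id)
{
    assert(id != 0);
    r->slot[r->head] = id;
    r->head = (r->head + 1) % RING_SIZE;
}

/* The NIC sends the packet at its position, if any; returns its id or 0. */
static int nic_poll(struct toy_ring *r)
{
    int id = r->slot[r->nic_pos];

    if (id != 0) {
        r->slot[r->nic_pos] = 0;
        r->nic_pos = (r->nic_pos + 1) % RING_SIZE;
    }
    return id;
}

/* Simulate a transmit timeout followed by recovery and one new packet. */
int recover_and_send(bool reinit_rings)
{
    struct toy_ring r;

    ring_init(&r);
    hw_init(&r);
    ring_queue(&r, 1);
    ring_queue(&r, 2);      /* driver index is now 2 */

    ring_clean(&r);         /* timeout: queued packets are dropped */
    if (reinit_rings)
        ring_init(&r);      /* the line the patch adds */
    hw_init(&r);

    ring_queue(&r, 3);
    return nic_poll(&r);    /* 3 if the NIC found the packet, else 0 */
}
```

Under these assumptions recover_and_send(false) returns 0, because the new packet sits in slot 2 while the NIC looks at slot 0, and recover_and_send(true) returns 3.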

2006-03-07 01:16:48

by Robert Hancock

Subject: Re: de2104x: interrupts before interrupt handler is registered

linux-os (Dick Johnson) wrote:
> This started to happen in a lot of PCI drivers once it became
> necessary to call pci_enable_device() in order to make the
> returned IRQ values valid. This has been reported numerous
> times and has not been fixed. Basically, in order to get
> the correct value, one needs to disable the board in some
> unspecified way so it is not possible for it to generate
> an interrupt before enabling the board. With some devices
> this may not be possible!

What kind of board behaves that way? pci_enable_device just enables the
device BARs and wakes it up if it was suspended, I should think that any
device that starts generating interrupts from that must be quite broken.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-03-07 05:12:09

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Francois Romieu <[email protected]> [2006-03-06 22:17]:
> Not sure about this one, but...

It seems to help. It's hard to say for sure because I don't have a
foolproof way to reproduce this panic. It _usually_ occurs after
copying a few hundred MB but there's no clear trigger. I've now copied
a few GB around using a kernel with your patch and it hasn't crashed.
--
Martin Michlmayr
http://www.cyrius.com/

2006-03-07 05:16:35

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Francois Romieu <[email protected]> [2006-03-06 21:54]:
> > By the way, I'm getting the following messages in dmesg:
> > eth0: tx err, status 0x7fffb002
> Tx underrun.
> Is there anything which could induce a noticeable load on the PCI bus ?

I was going to say "no" because I was simply copying some data via the
network. However, it seems the situation is a bit more complicated
than this. It seems that I only get these underruns using a specific
hard drive. You see, the reason I'm rsyncing hundreds of megabytes of
data across my LAN is because my laptop hard drive is dying, so I put
it in a PC as secondary master using an adapter. Interestingly
enough, I don't get any Tx underruns when using a different disk.
Which is strange because at the moment the disk is working fine (it
sort of started dying but seems to behave right now), so I don't know
why it would change anything. Maybe this makes sense to someone.

By the way, I only get underruns when I rsync from the PC to another
machine - not when I rsync from the other machine to the PC.
--
Martin Michlmayr
http://www.cyrius.com/

2006-03-07 12:07:54

by linux-os (Dick Johnson)

Subject: Re: de2104x: interrupts before interrupt handler is registered


On Mon, 6 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> This started to happen in a lot of PCI drivers once it became
>> necessary to call pci_enable_device() in order to make the
>> returned IRQ values valid. This has been reported numerous
>> times and has not been fixed. Basically, in order to get
>> the correct value, one needs to disable the board in some
>> unspecified way so it is not possible for it to generate
>> an interrupt before enabling the board. With some devices
>> this may not be possible!
>
> What kind of board behaves that way? pci_enable_device just enables the
> device BARs and wakes it up if it was suspended, I should think that any
> device that starts generating interrupts from that must be quite broken.
>
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [email protected]
> Home Page: http://www.roberthancock.com/

No. It would be good if that was true. Unfortunately, the IRQ
returned before the pci_enable_device() is not correct. It
gets re-written by pci_enable_device().


Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_



2006-03-07 13:58:39

by Robert Hancock

Subject: Re: de2104x: interrupts before interrupt handler is registered

linux-os (Dick Johnson) wrote:
> No. It would be good if that was true. Unfortunately, the IRQ
> returned before the pci_enable_device() is not correct. It
> gets re-written by pci_enable_device().

That wasn't what I meant. Yes, that is true in the current kernel.
However, any device which would start generating interrupts just because
its BARs got enabled by pci_enable_device seems broken.

The driver needs to request the interrupt after the device is enabled,
and only after that can it enable the device to generate interrupts.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-03-07 14:21:21

by linux-os (Dick Johnson)

Subject: Re: de2104x: interrupts before interrupt handler is registered


On Tue, 7 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> No. It would be good if that was true. Unfortunately, the IRQ
>> returned before the pci_enable_device() is not correct. It
>> gets re-written by pci_enable_device().
>
> That wasn't what I meant, yes, that is true in the current kernel.
> However, any device which would start generating interrupts just because
> its BARs got enabled by pci_enable_device seems broken.

Thinking that a device powers ON in a stable state is naive. Many
complex devices will have FPGA devices with floating pins that don't
become stable until their contents are loaded serially. Others will
have IRQ requests based upon power-on states that need to be cleared
with a software reset. One can't issue a software reset until the
device is enabled and enabling the device may generate interrupts
with no handler in place so you have a "can't get there from here"
problem. Linux-2.4.x had IRQs that were stable. One could put
a handler in place that would handle the possible burst of interrupts
upon startup. Then this was changed so the IRQ value is wrong
until an unrelated and illogical event occurs. Now, you need to
make work-arounds that were never before necessary. My request
to fix this fell upon deaf ears.

>
> The driver needs to request the interrupt after the device is enabled,
> and only after that can it enable the device to generate interrupts.
>
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [email protected]
> Home Page: http://www.roberthancock.com/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_



2006-03-07 14:57:56

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Martin Michlmayr <[email protected]> [2006-03-07 05:11]:
> * Francois Romieu <[email protected]> [2006-03-06 22:17]:
> > Not sure about this one, but...
>
> It seems to help. It's hard to say for sure because I don't have a
> foolproof way to reproduce this panic. It _usually_ occurs after
> copying a few hundred MB but there's no clear trigger. I've now copied
> a few GB around using a kernel with your patch and it hasn't crashed.

I'm pretty sure now that your patch helps. I left the system running
overnight and it was still alive in the morning after transferring ~10
GB. I do get all kinds of underrun messages (see below) but the data
got transferred alright. I then rebooted with the kernel that doesn't
have your patch and it crashed after ~1 GB.


(this was at about 3 GB, but the same goes on and on; but the network
works.)

eth0 Link encap:Ethernet HWaddr 00:80:C8:33:4F:96
inet addr:192.168.1.145 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::280:c8ff:fe33:4f96/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1199533 errors:7 dropped:0 overruns:7 frame:0
TX packets:2344296 errors:396 dropped:252 overruns:396 carrier:0
collisions:0 txqueuelen:1000
RX bytes:64846004 (61.8 MiB) TX bytes:3479989567 (3.2 GiB)
Interrupt:10 Base address:0x2000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:560 (560.0 b) TX bytes:560 (560.0 b)


Adding 136512k swap on /dev/hda5. Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: [email protected]
eth0: enabling interface
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb022
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb032
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb02a
eth0: tx err, status 0x7fffb01a
eth0: tx err, status 0x7fffb02a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 15/37/38
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 54 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 16/6/7
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 60 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 41/47/48
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 32 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 55 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 43 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 2 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 6 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 17/63/0
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002

--
Martin Michlmayr
http://www.cyrius.com/

2006-03-07 15:16:44

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Martin Michlmayr <[email protected]> [2006-03-06 19:17]:
> There's another interrupt related bug in the driver, though. I

There's yet another bug (or two).

I just got another kernel panic:
http://www.cyrius.com/tmp/de2104x_panic2.jpg (which I haven't been
able to reproduce so far; this was without your latest patch applied,
btw). This happened when I was doing DHCP while my server was not
responding to DHCP. I wonder if it's related to another issue I've
observed.

This card is a D-Link DE 530 with both a BNC and RJ-45 connector.
When I boot my machine without having the Ethernet cable plugged in,
Linux thinks there's a BNC connection. When I plug in the cable, the
link light on the card goes on but Linux doesn't seem to notice - in
fact, when I then start DHCP again, the link light goes off again and
Linux talks about BNC being up... [FWIW, Linux 2.4 doesn't handle this
situation either. Under 2.4 the link light doesn't even come up.]


dmesg: booting without the RJ-45 cable plugged in, doing DHCP, then
plugging the RJ-45 cable in and doing DHCP again:

hda: 4999680 sectors (2559 MB) w/256KiB Cache, CHS=4960/16/63, UDMA(33)
hda: hda1 hda2 < hda5 hda6 >
ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
de0: SROM leaf offset 30, default media 10baseT auto
de0: media block #0: 10baseT-FD
de0: media block #1: BNC
de0: media block #2: 10baseT-HD
eth0: 21041 at 0xb8802000, 00:80:c8:33:4f:96, IRQ 10
Probing IDE interface ide1...
Attempting manual resume
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
input: PC Speaker as /class/input/input1
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 136512k swap on /dev/hda5. Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: [email protected]
eth0: enabling interface
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: set link BNC
eth0: mode 0x7ffc0000, sia 0x10c4,0xffffef09,0xfffff7fd,0xffff0006
eth0: set mode 0x7ffc0000, set sia 0xef09,0xf7fd,0x6
eth0: link up, media BNC
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
eth0: disabling interface
eth0: timeout expired stopping DMA
ACPI: PCI interrupt for device 0000:00:0b.0 disabled
eth0: enabling interface
eth0: set link BNC
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef09,0xfffff7fd,0xffff0006
eth0: set mode 0x7ffc0040, set sia 0xef09,0xf7fd,0x6
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up, media BNC
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present


As a comparison, this happens when I boot with the RJ-45 cable plugged
in:

ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
de0: SROM leaf offset 30, default media 10baseT auto
de0: media block #0: 10baseT-FD
de0: media block #1: BNC
de0: media block #2: 10baseT-HD
eth0: 21041 at 0xb8802000, 00:80:c8:33:4f:96, IRQ 10
Probing IDE interface ide1...
Attempting manual resume
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
input: PC Speaker as /class/input/input1
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 136512k swap on /dev/hda5. Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: [email protected]
eth0: enabling interface
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present

--
Martin Michlmayr
http://www.cyrius.com/

2006-03-07 17:51:42

by Bjorn Helgaas

Subject: Re: de2104x: interrupts before interrupt handler is registered

On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
> Thinking that a device powers ON in a stable state is naive. Many
> complex devices will have FPGA devices with floating pins that don't
> become stable until their contents are loaded serially. Others will
> have IRQ requests based upon power-on states that need to be cleared
> with a software reset. One can't issue a software reset until the
> device is enabled and enabling the device may generate interrupts
> with no handler in place so you have a "can't get there from here"
> problem.

Maybe you could handle this with a PCI quirk that runs before
pci_enable_device(). IIRC, we considered exposing a separate
interface for PCI IRQ allocation and routing, but decided it
wasn't worth the complexity since so few devices would need it.

> Linux-2.4.x had IRQs that were stable. One could put
> a handler in place that would handle the possible burst of interrupts
> upon startup. Then this was changed so the IRQ value is wrong
> until an unrelated and illogical event occurs.

There are good reasons to wait to allocate the IRQ until you have
a driver that cares about the device. I'm sorry that this broke
your specific case.

Bjorn

2006-03-07 18:17:42

by linux-os (Dick Johnson)

Subject: Re: de2104x: interrupts before interrupt handler is registered


On Tue, 7 Mar 2006, Bjorn Helgaas wrote:

> On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
>> Thinking that a device powers ON in a stable state is naive. Many
>> complex devices will have FPGA devices with floating pins that don't
>> become stable until their contents are loaded serially. Others will
>> have IRQ requests based upon power-on states that need to be cleared
>> with a software reset. One can't issue a software reset until the
>> device is enabled and enabling the device may generate interrupts
>> with no handler in place so you have a "can't get there from here"
>> problem.
>
> Maybe you could handle this with a PCI quirk that runs before
> pci_enable_device(). IIRC, we considered exposing a separate
> interface for PCI IRQ allocation and routing, but decided it
> wasn't worth the complexity since so few devices would need it.
>

The problem is that I can't write device internal registers to
put the device into a stable state without enabling the device.
So, the "fix" (read hack) was to mask off all possible interrupts
in the ioapic, call pci_enable_device(), initialize the device,
clear any pending hardware interrupts on the device, then reenable
the ioapic interrupts. I couldn't just use a spin-lock because
somebody complains and the machine panics.
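
The ordering described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the struct and function names are hypothetical and are not kernel APIs, the "mask" stands in for masking at the ioapic, and the handler is assumed to be registered after the device has been reset, which the message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
    bool enabled;      /* device responds to register accesses */
    bool irq_pending;  /* device is asserting its interrupt line */
    bool powerup_irq;  /* stale request left over from power-on */
};

struct model_line {
    bool masked;       /* line masked at the interrupt controller */
    bool handler;      /* a handler is registered for the line */
    int unhandled;     /* interrupts delivered with no handler in place */
};

/* Count an interrupt that reaches the CPU while nobody handles it. */
static void deliver(struct model_line *line, const struct model_dev *dev)
{
    if (dev->irq_pending && !line->masked && !line->handler)
        line->unhandled++;
}

/* Enabling the device lets a stale power-on request reach the line. */
static void enable_device(struct model_line *line, struct model_dev *dev)
{
    dev->enabled = true;
    if (dev->powerup_irq)
        dev->irq_pending = true;
    deliver(line, dev);
}

/* A software reset is only possible once the device is enabled. */
static void reset_device(struct model_dev *dev)
{
    assert(dev->enabled);
    dev->irq_pending = false;
    dev->powerup_irq = false;
}

/* Returns the number of interrupts that arrived with no handler. */
int bring_up(struct model_line *line, struct model_dev *dev, bool mask_first)
{
    line->masked = mask_first;  /* step 1: optionally mask the line */
    enable_device(line, dev);   /* step 2: enable the device */
    reset_device(dev);          /* step 3: initialize, clear pending requests */
    line->handler = true;       /* assumed: handler registered at this point */
    line->masked = false;       /* step 4: unmask the line again */
    deliver(line, dev);
    return line->unhandled;
}
```

In this model, calling bring_up() with mask_first set to false on a device that has powerup_irq set counts one unhandled interrupt; with mask_first set to true it counts none. The model says nothing about other devices sharing the masked line.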

>> Linux-2.4.x had IRQs that were stable. One could put
>> a handler in place that would handle the possible burst of interrupts
>> upon startup. Then this was changed so the IRQ value is wrong
>> until an unrelated and illogical event occurs.
>
> There are good reasons to wait to allocate the IRQ until you have
> a driver that cares about the device. I'm sorry that this broke
> your specific case.
>
> Bjorn

There are now other "standard" boards that seem to be experiencing
the same problem. Maybe it is time to make a procedure that turns
off interrupts for a specific device (not an unknown IRQ). Then
a subsequent call turns them on after the handler is in place.
This wouldn't affect current drivers. They would still turn on
hot by default.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2006-03-08 00:01:23

by Robert Hancock

Subject: Re: de2104x: interrupts before interrupt handler is registered

linux-os (Dick Johnson) wrote:
> Thinking that a device powers ON in a stable state is naive.

I don't think so.. if you build a device that connects to the PCI bus it
had better come up in a stable state if it wants to be compliant with
the spec. That's what the reset line and power-up reset interval is for.

> Many
> complex devices will have FPGA devices with floating pins that don't
> become stable until their contents are loaded serially. Others will
> have IRQ requests based upon power-on states that need to be cleared
> with a software reset. One can't issue a software reset until the
> device is enabled and enabling the device may generate interrupts
> with no handler in place so you have a "can't get there from here"
> problem.

You still aren't seeing my point. Why does enabling the device BARs
cause the device to generate interrupts? And if there's something you
need to do to prevent the device from generating interrupts, how can you
do it without enabling the device?

Also, the device's ISR must clear the condition which is causing the
interrupt, otherwise interrupt storms will result. If your device can
enter a state where the interrupt cannot be reliably cleared, how can
you possibly comply with this?

> Linux-2.4.x had IRQs that were stable. One could put
> a handler in place that would handle the possible burst of interrupts
> upon startup. Then this was changed so the IRQ value is wrong
> until an unrelated and illogical event occurs. Now, you need to
> make work-arounds that were never before necessary. My request
> to fix this fell upon deaf ears.

I don't think any workarounds are needed except for devices that don't
comply with the spec. Asserting interrupts that have not been
specifically enabled by the driver would meet that definition in my
view. If a device happens to do this then maybe a workaround would be
needed, but that's what it would be, a workaround for a broken device.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-03-08 00:05:09

by Robert Hancock

Subject: Re: de2104x: interrupts before interrupt handler is registered

linux-os (Dick Johnson) wrote:
> There are now other "standard" boards that seem to be experiencing
> the same problem. Maybe it is time to make a procedure that turns
> off interrupts for a specific device (not an unknown IRQ). Then
> a subsequent call turns them on after the handler is in place.
> This wouldn't affect current drivers. They would still turn on
> hot by default.

How do you propose to do this? There's no way to mask interrupts from
just one device which is sharing an IRQ line, you have to mask
interrupts from all of those devices. That would be quite ugly IMHO if
one device could disable the interrupt used by another device for
however long it felt like.
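
A toy model of a shared line illustrates the objection. It is a sketch only: the names are hypothetical, and it assumes the only mask available is the one for the whole line at the interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARERS 4

/* Hypothetical model of one IRQ line shared by several devices. */
struct shared_line {
    bool masked;                  /* the only mask there is: the whole line */
    bool asserting[MAX_SHARERS];  /* which sharers currently request service */
    size_t nsharers;
};

/* Number of device requests that can reach the CPU right now. */
size_t requests_seen(const struct shared_line *line)
{
    size_t n = 0;

    assert(line->nsharers <= MAX_SHARERS);
    if (line->masked)
        return 0;                 /* masking hides every sharer, not just one */
    for (size_t i = 0; i < line->nsharers; i++)
        if (line->asserting[i])
            n++;
    return n;
}
```

With two sharers both asserting, requests_seen() reports 2 while the line is unmasked and 0 once it is masked, so the well-behaved device loses its interrupt along with the misbehaving one.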

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-03-08 00:20:06

by Francois Romieu

Subject: Re: de2104x: interrupts before interrupt handler is registered

Martin Michlmayr <[email protected]> :
[...]
> It seems to help. It's hard to say for sure because I don't have a
> foolproof way to reproduce this panic. It _usually_ occurs after
> copying a few hundred MB but there's no clear trigger. I've now copied
> a few GB around using a kernel with your patch and it hasn't crashed.

netdev watchdog events appear in the dmesg of the patched driver.
The driver survived it. So I'd say that the patch does its job.

OTOH, if you ever saw the unpatched driver survive this event, yell now.

--
Ueimor

2006-03-08 03:22:34

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Francois Romieu <[email protected]> [2006-03-08 01:15]:
> netdev watchdog events appear in the dmesg of the patched driver.
> The driver survived it. So I'd say that the patch does its job.
>
> OTOH, if you ever saw the unpatched driver survive this event, yell
> now.

No, I've never seen the unpatched driver survive.
--
Martin Michlmayr
http://www.cyrius.com/

2006-03-08 08:18:50

by Jesse Brandeburg

Subject: Re: de2104x: interrupts before interrupt handler is registered

On 3/7/06, Bjorn Helgaas <[email protected]> wrote:
> On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
> Maybe you could handle this with a PCI quirk that runs before
> pci_enable_device(). IIRC, we considered exposing a separate
> interface for PCI IRQ allocation and routing, but decided it
> wasn't worth the complexity since so few devices would need it.
>
> > Linux-2.4.x had IRQs that were stable. One could put
> > a handler in place that would handle the possible burst of interrupts
> > upon startup. Then this was changed so the IRQ value is wrong
> > until an unrelated and illogical event occurs.
>
> There are good reasons to wait to allocate the IRQ until you have
> a driver that cares about the device. I'm sorry that this broke
> your specific case.

FWIW, I'd be interested in following up on something like this in
another thread because e100 appears to have (at least in one
reporter's dual e100 machine) a similar "hardware problem" where a
shared interrupt line gets asserted too early and the kernel prints a
Nobody Cared message.

So we have a new way of doing things that exposes more broken
hardware, shouldn't we provide a way for that hardware to continue
working?

http://bugzilla.kernel.org/show_bug.cgi?id=5918

Jesse

2006-03-08 12:03:34

by linux-os (Dick Johnson)

Subject: Re: de2104x: interrupts before interrupt handler is registered


On Tue, 7 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> Thinking that a device powers ON in a stable state is naive.
>
> I don't think so.. if you build a device that connects to the PCI bus it
> had better come up in a stable state if it wants to be compliant with
> the spec. That's what the reset line and power-up reset interval is for.
>
>> Many
>> complex devices will have FPGA devices with floating pins that don't
>> become stable until their contents are loaded serially. Others will
>> have IRQ requests based upon power-on states that need to be cleared
>> with a software reset. One can't issue a software reset until the
>> device is enabled and enabling the device may generate interrupts
>> with no handler in place so you have a "can't get there from here"
>> problem.
>
> You still aren't seeing my point. Why does enabling the device BARs
> cause the device to generate interrupts? And if there's something you
> need to do to prevent the device from generating interrupts, how can you
> do it without enabling the device?
>
> Also, the device's ISR must clear the condition which is causing the
> interrupt, otherwise interrupt storms will result. If your device can
> enter a state where the interrupt cannot be reliably cleared, how can
> you possibly comply with this?

You don't bother to read. The reported interrupt is WRONG, INVALID,
INCORRECT, BROKEN, until __after__ the device is enabled. That means
that one CANNOT put an interrupt handler in place before the
device is enabled.

It's the Linux code that was broken when 2.6.x started. Previous
Linux code never failed to report the correct IRQ.


>
>> Linux-2.4.x had IRQs that were stable. One could put
>> a handler in place that would handle the possible burst of interrupts
>> upon startup. Then this was changed so the IRQ value is wrong
>> until an unrelated and illogical event occurs. Now, you need to
>> make work-arounds that were never before necessary. My request
>> to fix this fell upon deaf ears.
>
> I don't think any workarounds are needed except for devices that don't
> comply with the spec. Asserting interrupts that have not been
> specifically enabled by the driver would meet that definition in my
> view. If a device happens to do this then maybe a workaround would be
> needed, but that's what it would be, a workaround for a broken device.
>
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [email protected]
> Home Page: http://www.roberthancock.com/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_



2006-03-08 16:06:09

by Bjorn Helgaas

Subject: Re: de2104x: interrupts before interrupt handler is registered

On Wednesday 08 March 2006 01:18, Jesse Brandeburg wrote:
> On 3/7/06, Bjorn Helgaas <[email protected]> wrote:
> > On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
> > Maybe you could handle this with a PCI quirk that runs before
> > pci_enable_device(). IIRC, we considered exposing a separate
> > interface for PCI IRQ allocation and routing, but decided it
> > wasn't worth the complexity since so few devices would need it.
> >
> > > Linux-2.4.x had IRQs that were stable. One could put
> > > a handler in place that would handle the possible burst of interrupts
> > > upon startup. Then this was changed so the IRQ value is wrong
> > > until an unrelated and illogical event occurs.
> >
> > There are good reasons to wait to allocate the IRQ until you have
> > a driver that cares about the device. I'm sorry that this broke
> > your specific case.
>
> FWIW, I'd be interested in following up on something like this in
> another thread because e100 appears to have (at least in one
> reporter's dual e100 machine) a similar "hardware problem" where a
> shared interrupt line gets asserted too early and the kernel prints a
> Nobody Cared message.
>
> So we have a new way of doing things that exposes more broken
> hardware, shouldn't we provide a way for that hardware to continue
> working?

Booting with "pci=routeirq" gives the previous behavior.

It would be interesting to know whether that makes a difference
in the e100 issue you mention.

2006-03-08 19:35:34

by Martin Michlmayr

Subject: Re: de2104x: interrupts before interrupt handler is registered

* Bjorn Helgaas <[email protected]> [2006-03-08 09:05]:
> Booting with "pci=routeirq" gives the previous behavior.
>
> It would be interesting to know whether that makes a difference
> in the e100 issue you mention.

FWIW, I'm pretty sure I tried this with de2104x and it didn't help.
I'm not positive though, but I could test again if people are
interested in the result.
--
Martin Michlmayr
[email protected]

2006-03-08 23:35:52

by Robert Hancock

Subject: Re: de2104x: interrupts before interrupt handler is registered

linux-os (Dick Johnson) wrote:
> You don't bother to read. The reported interrupt is WRONG, INVALID,
> INCORRECT, BROKEN, until __after__ the device is enabled. That means
> that one CANNOT put an interrupt handler in place before the
> device is enabled.

And my point is, even if you COULD put an interrupt handler into place
before enabling the device, if the device can be in an unstable state
such that the interrupt can't be acknowledged reliably, how can you
handle it without causing an interrupt storm?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]


2006-03-09 00:02:33

by Robert Hancock

Subject: Re: de2104x: interrupts before interrupt handler is registered

Jesse Brandeburg wrote:
> FWIW, I'd be interested in following up on something like this in
> another thread because e100 appears to have (at least in one
> reporter's dual e100 machine) a similar "hardware problem" where a
> shared interrupt line gets asserted too early and the kernel prints a
> Nobody Cared message.
>
> So we have a new way of doing things that exposes more broken
> hardware, shouldn't we provide a way for that hardware to continue
> working?

I'm not sure this is at all related to the case we're talking about - it
doesn't matter whether the request_irq or pci_enable_device comes first
as the device is pulling on the interrupt line before the driver is even
loaded. To fix that I'd think you'd need some kind of PCI quirk that
would shut off the interrupt on the e100 card before any devices request
the interrupt that it is sharing.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]

2006-03-09 12:42:15

by linux-os (Dick Johnson)

Subject: Re: de2104x: interrupts before interrupt handler is registered


On Wed, 8 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> You don't bother to read. The reported interrupt is WRONG, INVALID,
>> INCORRECT, BROKEN, until __after__ the device is enabled. That means
>> that one CANNOT put an interrupt handler in place before the
>> device is enabled.
>
> And my point is, even if you COULD put an interrupt handler into place
> before enabling the device, if the device can be in an unstable state
> such that the interrupt can't be acknowledged reliably, how can you
> handle it without causing an interrupt storm?
>

Easy. Mask off the interrupts in the device. Software should
certainly "know" if the device has been initialized to a stable
state. Until it has been initialized, the ISR will simply
clear and mask the device.
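
A minimal userspace sketch of such an ISR follows. It is an illustration, not the de2104x driver: the struct, the field layout and the function names are hypothetical, and it assumes that clearing the status and mask actually quiets the device, which is the very point under dispute in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a NIC's interrupt state; not a real register layout. */
struct model_nic {
    uint32_t status;       /* pending interrupt causes */
    uint32_t int_mask;     /* causes allowed to assert the line */
    bool initialized;      /* set by the driver once the device is stable */
    unsigned int handled;  /* events processed by the normal path */
};

/* The device pulls the line only for causes that are pending and unmasked. */
bool line_asserted(const struct model_nic *nic)
{
    return (nic->status & nic->int_mask) != 0;
}

/* Returns true if the interrupt was claimed by this device. */
bool model_isr(struct model_nic *nic)
{
    assert(nic != NULL);

    if (!line_asserted(nic))
        return false;       /* not ours; matters on a shared line */

    if (!nic->initialized) {
        nic->status = 0;    /* clear whatever was pending at power-on */
        nic->int_mask = 0;  /* keep the device quiet until init is done */
        return true;
    }

    nic->handled++;         /* normal path: process the event(s) */
    nic->status = 0;        /* then acknowledge them */
    return true;
}
```

Before initialization the ISR claims the interrupt but only clears and masks; after the driver sets initialized and programs int_mask, the same ISR takes the normal path.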

> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [email protected]
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_

