2002-08-20 23:47:13

by Troy Wilson

[permalink] [raw]
Subject: mdelay causes BUG, please use udelay

Hi Jeff,

We probably shouldn't be doing this, but we can at least avoid the BUG
caused by doing an mdelay in interrupt context if we change to udelay.

Thanks,

- Troy


diff -urN linux-2.5.31/drivers/net/e1000/e1000_hw.c linux-2.5.31.udelay/drivers/net/e1000/e1000_hw.c
--- linux-2.5.31/drivers/net/e1000/e1000_hw.c Sat Aug 10 20:41:28 2002
+++ linux-2.5.31.udelay/drivers/net/e1000/e1000_hw.c Tue Aug 20 18:12:20 2002
@@ -134,7 +134,7 @@
/* Delay to allow any outstanding PCI transactions to complete before
* resetting the device
*/
- msec_delay(10);
+ usec_delay(10000);

/* Issue a global reset to the MAC. This will reset the chip's
* transmit, receive, DMA, and link units. It will not effect


2002-08-21 00:16:12

by Feldman, Scott

[permalink] [raw]
Subject: RE: mdelay causes BUG, please use udelay

> We probably shouldn't be doing this, but we can at least
> avoid the BUG caused by doing an mdelay in interrupt context
> if we change to udelay.

That's BUG is my fault. I put that in there so no one would tempted to call
msec_delay in the interrupt context. As you've discovered, I missed one
case where msec_delay is called in interrupt context: tx_timeout. Dangit!

> - msec_delay(10);
> + usec_delay(10000);

Jeff, 10000 seems on the border of what's OK. If it's acceptable, then
let's go for that. Otherwise, we're going to have to chain several
mod_timer callbacks together to do a controller reset.

-scott

2002-08-21 00:23:49

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

On Tue, Aug 20, 2002 at 05:20:02PM -0700, Feldman, Scott wrote:
> > - msec_delay(10);
> > + usec_delay(10000);
>
> Jeff, 10000 seems on the border of what's OK. If it's acceptable, then
> let's go for that. Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.

10000 is beyond okay. You've just lost multiple timer ticks.

-ben
--
"You will be reincarnated as a toad; and you will be much happier."

2002-08-21 00:24:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

Followup to: <[email protected]>
By author: "Feldman, Scott" <[email protected]>
In newsgroup: linux.dev.kernel
>
> > - msec_delay(10);
> > + usec_delay(10000);
>
> Jeff, 10000 seems on the border of what's OK. If it's acceptable, then
> let's go for that. Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.
>

10 ms in an interrupt context? Surely you're joking...

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-08-21 00:24:30

by Alan

[permalink] [raw]
Subject: RE: mdelay causes BUG, please use udelay

On Wed, 2002-08-21 at 01:20, Feldman, Scott wrote:
> > + usec_delay(10000);
>
> Jeff, 10000 seems on the border of what's OK. If it's acceptable, then
> let's go for that. Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.

For some telco and embedded apps 10000 in an IRQ is borderline. One day
the timer stuff will be needed - how hard is it to fix right first time
?

2002-08-21 00:56:38

by Jeff Garzik

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

Troy Wilson wrote:
> - msec_delay(10);
> + usec_delay(10000);


not the right fix :)

2002-08-21 00:55:23

by Jeff Garzik

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

Feldman, Scott wrote:
>>- msec_delay(10);
>>+ usec_delay(10000);
>
>
> Jeff, 10000 seems on the border of what's OK. If it's acceptable, then
> let's go for that. Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.


That definitely wants fixing. Since I like doing resets and similar
slow-paths in process context -- sleep for as long as you want -- I
would say kick over to a function called via schedule_task()

Just make sure other parts of the driver that may be called
asynchronously, such as ethtool ioctls, are disabled. Remember that
tx_timeout holds the dev->xmit_lock as well, so spending a long time in
there is a bad idea in general.

I would probably call netif_carrier_off() first thing in tx_timeout, too.

Jeff



2002-08-21 00:55:54

by Feldman, Scott

[permalink] [raw]
Subject: RE: mdelay causes BUG, please use udelay


> > Jeff, 10000 seems on the border of what's OK. If it's acceptable,
> > then let's go for that. Otherwise, we're going to have to chain
> > several mod_timer callbacks together to do a controller reset.
>
> For some telco and embedded apps 10000 in an IRQ is
> borderline. One day the timer stuff will be needed - how hard
> is it to fix right first time ?

Ok, ok, 10000 is bad, even when reseting the part, no problem. It's not to
hard to fix the right way; I'll work on a patch to give to Jeff.

-scott

2002-08-21 16:02:24

by Martin J. Bligh

[permalink] [raw]
Subject: RE: mdelay causes BUG, please use udelay

>> - msec_delay(10);
>> + usec_delay(10000);
>
> Jeff, 10000 seems on the border of what's OK. If it's acceptable,
> then let's go for that. Otherwise, we're going to have to chain
> several mod_timer callbacks together to do a controller reset.

Whilst this sort of delay in interrupt context is undoubtedly bad
any way we do it, I'd question the context a little more before we
make a decision. This is called from e1000_reset_hw - are we likely
to ever actually call this except under initialisation? If we just
do it once on system boot, I'd say evil hacks (like this) are
acceptable. If we're going to do this under load, it definitely
needs fixing.

FWIW, this is heavily tested under Apache webserver load on a maxed
out 8 CPU system with at least 4 (8?) gigabit ethernet cards. Whilst
undoubtedly ugly, it's better than what we have now, so might I
suggest that we do this for now until a real fix is forthcoming if
we decide it's needed?

Martin.

2002-08-21 16:42:12

by Chris Friesen

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

"Martin J. Bligh" wrote:

> Whilst this sort of delay in interrupt context is undoubtedly bad
> any way we do it, I'd question the context a little more before we
> make a decision. This is called from e1000_reset_hw - are we likely
> to ever actually call this except under initialisation?

I currently work on an embedded device and if we detect given network connection isn't working at
all our first response is to switch to a working connection, then we reload the device driver for
the non-working one. Since we may be doing other things at the same time, having this stall the
machine for extended periods of time is definately not a good thing.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-08-21 16:55:10

by Martin J. Bligh

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

> I currently work on an embedded device and if we detect given
> network connection isn't working at all our first response is
> to switch to a working connection, then we reload the device
> driver for the non-working one. Since we may be doing other
> things at the same time, having this stall the machine for
> extended periods of time is definately not a good thing.

That's great, IF you have a spare drop out to that net.

M.

2002-08-21 17:18:28

by Dave Hansen

[permalink] [raw]
Subject: Re: mdelay causes BUG, please use udelay

Martin J. Bligh wrote:
>>>- msec_delay(10);
>>>+ usec_delay(10000);
>>
>>Jeff, 10000 seems on the border of what's OK. If it's acceptable,
>>then let's go for that. Otherwise, we're going to have to chain
>>several mod_timer callbacks together to do a controller reset.
>
> Whilst this sort of delay in interrupt context is undoubtedly bad
> any way we do it, I'd question the context a little more before we
> make a decision. This is called from e1000_reset_hw - are we likely
> to ever actually call this except under initialisation?

It doesn't happen often, or under good circumstances. In certain
cases, the driver detects that something timed out and it assumes
something on the card to be dead. Instead of delaying the
reinitialization of the dead card with a timer, they just do it during
the interrupt where the problem was detected.


--
Dave Hansen
[email protected]