2009-04-07 16:36:39

by Michal Schmidt

Subject: [PATCH] skge: fix occasional BUG during MTU change

The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
was sometimes observed when setting MTU.

skge_down() disables the TX queue, but then reenables it by mistake via
skge_tx_clean().
Fix it by moving the waking of the queue from skge_tx_clean() to the
other caller. And to make sure start_xmit is not in progress on another
CPU, skge_down() should call netif_tx_disable().

The bug was reported to me by Jiri Jilek whose Debian system sometimes
failed to boot. He tested the patch and the bug did not happen anymore.

Signed-off-by: Michal Schmidt <[email protected]>
---
drivers/net/skge.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 952d37f..b2a05af 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -2674,7 +2674,7 @@ static int skge_down(struct net_device *dev)
 	if (netif_msg_ifdown(skge))
 		printk(KERN_INFO PFX "%s: disabling interface\n", dev->name);

-	netif_stop_queue(dev);
+	netif_tx_disable(dev);

 	if (hw->chip_id == CHIP_ID_GENESIS && hw->phy_type == SK_PHY_XMAC)
 		del_timer_sync(&skge->link_timer);
@@ -2881,7 +2881,6 @@ static void skge_tx_clean(struct net_device *dev)
 	}

 	skge->tx_ring.to_clean = e;
-	netif_wake_queue(dev);
 }

 static void skge_tx_timeout(struct net_device *dev)
@@ -2893,6 +2892,7 @@ static void skge_tx_timeout(struct net_device *dev)

 	skge_write8(skge->hw, Q_ADDR(txqaddr[skge->port], Q_CSR), CSR_STOP);
 	skge_tx_clean(dev);
+	netif_wake_queue(dev);
 }

static int skge_change_mtu(struct net_device *dev, int new_mtu)
--
1.6.2.2
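
The ordering problem described in the changelog can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_dev` and every `toy_*` function are hypothetical stand-ins, not the driver's real structures or the kernel's netif API, and the ring positions are modelled as plain integers. It shows why waking the queue inside the clean routine lets a transmit slip in after the interface was stopped, which leaves `to_use != to_clean`, the condition the BUG_ON in skge_up() checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; not the real struct skge_port or net_device. */
struct toy_dev {
	bool queue_stopped;	/* models the TX queue "stopped" flag */
	int to_use;		/* producer position in the TX ring */
	int to_clean;		/* consumer position in the TX ring */
};

static void toy_stop_queue(struct toy_dev *d) { d->queue_stopped = true; }
static void toy_wake_queue(struct toy_dev *d) { d->queue_stopped = false; }

/*
 * Reclaim every pending descriptor.
 * wake_in_clean == true models the pre-patch code, where the clean
 * routine also woke the queue as a side effect.
 */
static void toy_tx_clean(struct toy_dev *d, bool wake_in_clean)
{
	d->to_clean = d->to_use;
	if (wake_in_clean)
		toy_wake_queue(d);
}

/* Models a transmit attempt: queues one packet unless the queue is stopped. */
static bool toy_xmit(struct toy_dev *d)
{
	if (d->queue_stopped)
		return false;
	d->to_use++;
	return true;
}

/*
 * Bring the toy device down, let one transmit attempt arrive before it
 * comes back up, and report whether the ring is still consistent.
 */
bool toy_down_then_xmit(bool wake_in_clean)
{
	struct toy_dev d = { .queue_stopped = false, .to_use = 3, .to_clean = 1 };

	toy_stop_queue(&d);
	toy_tx_clean(&d, wake_in_clean);
	toy_xmit(&d);

	return d.to_use == d.to_clean;
}

/* The pre-patch ordering breaks the invariant; the patched ordering keeps it. */
void toy_self_check(void)
{
	assert(!toy_down_then_xmit(true));
	assert(toy_down_then_xmit(false));
}
```

The model is single-threaded, so it covers only the stray wake-up. The second half of the fix, the switch to netif_tx_disable() so that a start_xmit already running on another CPU has finished before the queue is marked stopped, is not modelled here.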


2009-04-08 23:02:43

by David Miller

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

From: Michal Schmidt <[email protected]>
Date: Tue, 7 Apr 2009 18:36:23 +0200

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
>
> Signed-off-by: Michal Schmidt <[email protected]>

Stephen, an ACK possibly?

2009-04-08 23:07:32

by Stephen Hemminger

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

On Wed, 08 Apr 2009 16:01:52 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Michal Schmidt <[email protected]>
> Date: Tue, 7 Apr 2009 18:36:23 +0200
>
> > The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> > was sometimes observed when setting MTU.
> >
> > skge_down() disables the TX queue, but then reenables it by mistake via
> > skge_tx_clean().
> > Fix it by moving the waking of the queue from skge_tx_clean() to the
> > other caller. And to make sure start_xmit is not in progress on another
> > CPU, skge_down() should call netif_tx_disable().
> >
> > The bug was reported to me by Jiri Jilek whose Debian system sometimes
> > failed to boot. He tested the patch and the bug did not happen anymore.
> >
> > Signed-off-by: Michal Schmidt <[email protected]>
>
> Stephen, an ACK possibly?

I wanted to test on real hardware, and am offsite this week.

2009-04-08 23:09:49

by David Miller

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

From: Stephen Hemminger <[email protected]>
Date: Wed, 8 Apr 2009 16:06:21 -0700

> On Wed, 08 Apr 2009 16:01:52 -0700 (PDT)
> David Miller <[email protected]> wrote:
>
>> From: Michal Schmidt <[email protected]>
>> Date: Tue, 7 Apr 2009 18:36:23 +0200
>>
>> > The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
>> > was sometimes observed when setting MTU.
>> >
>> > skge_down() disables the TX queue, but then reenables it by mistake via
>> > skge_tx_clean().
>> > Fix it by moving the waking of the queue from skge_tx_clean() to the
>> > other caller. And to make sure start_xmit is not in progress on another
>> > CPU, skge_down() should call netif_tx_disable().
>> >
>> > The bug was reported to me by Jiri Jilek whose Debian system sometimes
>> > failed to boot. He tested the patch and the bug did not happen anymore.
>> >
>> > Signed-off-by: Michal Schmidt <[email protected]>
>>
>> Stephen, an ACK possibly?
>
> I wanted to test on real hardware, and am offsite this week.

Ok, I'll wait for that, thanks!

2009-04-10 05:01:50

by Andrew Morton

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

On Tue, 7 Apr 2009 18:36:23 +0200 Michal Schmidt <[email protected]> wrote:

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.

It's conventional to add the reporter's "Reported-by:" tag to the
changelog in this situation.

> Signed-off-by: Michal Schmidt <[email protected]>

As the bug is present in 2.6.29 (and possibly earlier?) it's
appropriate to add a Cc: <[email protected]> too. This makes davem go
mad at you, but I prefer getting madded at over possibly losing bugfixes ;)

>
> diff --git a/drivers/net/skge.c b/drivers/net/skge.c
> index 952d37f..b2a05af 100644
> --- a/drivers/net/skge.c
> +++ b/drivers/net/skge.c
> @@ -2674,7 +2674,7 @@ static int skge_down(struct net_device *dev)
>  	if (netif_msg_ifdown(skge))
>  		printk(KERN_INFO PFX "%s: disabling interface\n", dev->name);
>
> -	netif_stop_queue(dev);
> +	netif_tx_disable(dev);
>
>  	if (hw->chip_id == CHIP_ID_GENESIS && hw->phy_type == SK_PHY_XMAC)
>  		del_timer_sync(&skge->link_timer);
> @@ -2881,7 +2881,6 @@ static void skge_tx_clean(struct net_device *dev)
>  	}
>
>  	skge->tx_ring.to_clean = e;
> -	netif_wake_queue(dev);
>  }
>
>  static void skge_tx_timeout(struct net_device *dev)
> @@ -2893,6 +2892,7 @@ static void skge_tx_timeout(struct net_device *dev)
>
>  	skge_write8(skge->hw, Q_ADDR(txqaddr[skge->port], Q_CSR), CSR_STOP);
>  	skge_tx_clean(dev);
> +	netif_wake_queue(dev);
>  }
>
> static int skge_change_mtu(struct net_device *dev, int new_mtu)

2009-04-13 23:23:50

by David Miller

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

From: Michal Schmidt <[email protected]>
Date: Tue, 7 Apr 2009 18:36:23 +0200

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
>
> Signed-off-by: Michal Schmidt <[email protected]>

Stephen, have you had a chance to test this yet?

Thanks.

2009-04-14 17:56:17

by Stephen Hemminger

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

On Tue, 7 Apr 2009 18:36:23 +0200
Michal Schmidt <[email protected]> wrote:

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
>
> Signed-off-by: Michal Schmidt <[email protected]>
> ---
> drivers/net/skge.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)

Tested fine. This should go to stable as well.

Acked-by: Stephen Hemminger <[email protected]>

2009-04-14 22:17:31

by David Miller

Subject: Re: [PATCH] skge: fix occasional BUG during MTU change

From: Stephen Hemminger <[email protected]>
Date: Tue, 14 Apr 2009 10:55:39 -0700

> On Tue, 7 Apr 2009 18:36:23 +0200
> Michal Schmidt <[email protected]> wrote:
>
>> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
>> was sometimes observed when setting MTU.
>>
>> skge_down() disables the TX queue, but then reenables it by mistake via
>> skge_tx_clean().
>> Fix it by moving the waking of the queue from skge_tx_clean() to the
>> other caller. And to make sure start_xmit is not in progress on another
>> CPU, skge_down() should call netif_tx_disable().
>>
>> The bug was reported to me by Jiri Jilek whose Debian system sometimes
>> failed to boot. He tested the patch and the bug did not happen anymore.
>>
>> Signed-off-by: Michal Schmidt <[email protected]>
>> ---
>> drivers/net/skge.c | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> Tested fine. This should go to stable as well.
>
> Acked-by: Stephen Hemminger <[email protected]>

Applied, thanks everyone.