The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
was sometimes triggered when changing the MTU.

skge_down() disables the TX queue, but then mistakenly wakes it again
via skge_tx_clean().
Fix it by moving the waking of the queue from skge_tx_clean() to its
other caller, skge_tx_timeout(). To also make sure that start_xmit is
not in progress on another CPU, make skge_down() call netif_tx_disable()
instead of netif_stop_queue().

The bug was reported to me by Jiri Jilek, whose Debian system sometimes
failed to boot. He tested the patch, and the bug no longer occurred.

Signed-off-by: Michal Schmidt <[email protected]>
---
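The failure mode described above can be illustrated with a small userspace model. This is a hypothetical sketch, not driver code: toy_ring, toy_start_xmit(), toy_tx_clean() and toy_change_mtu() are invented names, the ring is reduced to two indices (the real driver walks a linked list of ring elements), and it assumes the MTU-change path runs skge_down() followed by skge_up(). With the old behaviour the cleaner wakes the stopped queue, one more frame gets queued, and the to_use == to_clean check made by skge_up() fails.

```c
/* Hypothetical userspace model of the skge TX ring and queue state.
 * None of these names exist in the driver; indices stand in for the
 * driver's linked ring elements. */
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 8

struct toy_ring {
	unsigned int to_use;   /* next slot start_xmit will fill */
	unsigned int to_clean; /* next slot the cleaner will free */
	bool queue_stopped;    /* models the netdev queue "stopped" state */
};

/* Models start_xmit: the stack only hands over frames while the queue is awake. */
static bool toy_start_xmit(struct toy_ring *r)
{
	if (r->queue_stopped)
		return false;
	r->to_use = (r->to_use + 1) % TOY_RING_SIZE;
	return true;
}

/* Models skge_tx_clean(): frees every pending slot. Before the fix it
 * also woke the queue; wake_queue selects old (true) or new (false). */
static void toy_tx_clean(struct toy_ring *r, bool wake_queue)
{
	r->to_clean = r->to_use;
	if (wake_queue)
		r->queue_stopped = false;
}

/* Models an MTU change: down, a racing transmit, then the invariant that
 * skge_up() checks with BUG_ON. Returns true if the invariant holds. */
static bool toy_change_mtu(bool old_behaviour)
{
	struct toy_ring r = { .to_use = 3, .to_clean = 1, .queue_stopped = false };

	r.queue_stopped = true;          /* skge_down(): stop the queue */
	toy_tx_clean(&r, old_behaviour); /* skge_down(): clean the ring */
	toy_start_xmit(&r);              /* the stack sends one more frame */
	return r.to_use == r.to_clean;   /* skge_up(): the BUG_ON condition */
}
```

Calling toy_change_mtu(true) models the code before this patch and returns false (the BUG_ON would fire); toy_change_mtu(false) models the patched code and returns true, because the queue stays stopped across the clean.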
 drivers/net/skge.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 952d37f..b2a05af 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -2674,7 +2674,7 @@ static int skge_down(struct net_device *dev)
 	if (netif_msg_ifdown(skge))
 		printk(KERN_INFO PFX "%s: disabling interface\n", dev->name);
 
-	netif_stop_queue(dev);
+	netif_tx_disable(dev);
 
 	if (hw->chip_id == CHIP_ID_GENESIS && hw->phy_type == SK_PHY_XMAC)
 		del_timer_sync(&skge->link_timer);
@@ -2881,7 +2881,6 @@ static void skge_tx_clean(struct net_device *dev)
 	}
 
 	skge->tx_ring.to_clean = e;
-	netif_wake_queue(dev);
 }
 
 static void skge_tx_timeout(struct net_device *dev)
@@ -2893,6 +2892,7 @@ static void skge_tx_timeout(struct net_device *dev)
 
 	skge_write8(skge->hw, Q_ADDR(txqaddr[skge->port], Q_CSR), CSR_STOP);
 	skge_tx_clean(dev);
+	netif_wake_queue(dev);
 }
 
 static int skge_change_mtu(struct net_device *dev, int new_mtu)
--
1.6.2.2
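netif_stop_queue() only marks the queue as stopped, so a start_xmit call that has already passed that check on another CPU can still be running. netif_tx_disable() sets the same state while holding the TX queue lock, so it also waits for such a call to finish. The sketch below models that difference in userspace with a pthread mutex standing in for the TX lock. It assumes skge's start_xmit is invoked with that lock held (i.e. the driver does not do lockless TX), and all toy_* names are hypothetical.

```c
/* Hypothetical userspace model: a mutex stands in for the TX queue lock
 * that the core holds around the driver's start_xmit. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_dev {
	pthread_mutex_t tx_lock; /* models the TX queue lock */
	bool stopped;            /* models the queue "stopped" state */
	long sent;               /* frames handed to the driver */
};

/* Models the core transmit path: lock, check the queue, call start_xmit. */
static bool toy_xmit(struct toy_dev *d)
{
	bool accepted = false;

	pthread_mutex_lock(&d->tx_lock);
	if (!d->stopped) {
		d->sent++;       /* the driver's start_xmit would run here */
		accepted = true;
	}
	pthread_mutex_unlock(&d->tx_lock);
	return accepted;
}

/* Models netif_tx_disable(): stop the queue while holding the TX lock,
 * which waits for any transmit already inside the critical section. */
static void toy_tx_disable(struct toy_dev *d)
{
	pthread_mutex_lock(&d->tx_lock);
	d->stopped = true;
	pthread_mutex_unlock(&d->tx_lock);
}

/* Sender thread: keeps transmitting until it finds the queue stopped. */
static void *toy_sender(void *arg)
{
	while (toy_xmit(arg))
		;
	return NULL;
}

/* Returns true if no frame was accepted after toy_tx_disable() returned. */
static bool toy_disable_is_quiescent(void)
{
	struct toy_dev d = { .stopped = false, .sent = 0 };
	pthread_t sender;
	long sent_at_disable;

	pthread_mutex_init(&d.tx_lock, NULL);
	pthread_create(&sender, NULL, toy_sender, &d);
	toy_tx_disable(&d);
	/* No race: later lock holders see stopped == true and never write sent. */
	sent_at_disable = d.sent;
	pthread_join(sender, NULL);
	pthread_mutex_destroy(&d.tx_lock);
	return d.sent == sent_at_disable;
}
```

toy_disable_is_quiescent() returns true: once toy_tx_disable() has returned, the sender thread can no longer hand frames to the driver, which is the guarantee skge_down() needs before it cleans the ring.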
From: Michal Schmidt <[email protected]>
Date: Tue, 7 Apr 2009 18:36:23 +0200
> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
>
> Signed-off-by: Michal Schmidt <[email protected]>
Stephen, an ACK possibly?
On Wed, 08 Apr 2009 16:01:52 -0700 (PDT)
David Miller <[email protected]> wrote:
> Stephen, an ACK possibly?
I wanted to test on real hardware, and am offsite this week.
From: Stephen Hemminger <[email protected]>
Date: Wed, 8 Apr 2009 16:06:21 -0700
> I wanted to test on real hardware, and am offsite this week.
Ok, I'll wait for that, thanks!
On Tue, 7 Apr 2009 18:36:23 +0200 Michal Schmidt <[email protected]> wrote:
> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
It's conventional to add the reporter's "Reported-by:" tag to the
changelog in this situation.
> Signed-off-by: Michal Schmidt <[email protected]>
As the bug is present in 2.6.29 (and possibly earlier?), it's
appropriate to add a Cc: <[email protected]> too. This makes davem go
mad at you, but I prefer getting madded at over possibly losing bugfixes ;)
From: Michal Schmidt <[email protected]>
Date: Tue, 7 Apr 2009 18:36:23 +0200
> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
>
> Signed-off-by: Michal Schmidt <[email protected]>
Stephen, have you had a chance to test this yet?
Thanks.
On Tue, 7 Apr 2009 18:36:23 +0200
Michal Schmidt <[email protected]> wrote:
> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
>
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
>
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
>
> Signed-off-by: Michal Schmidt <[email protected]>
> ---
> drivers/net/skge.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
Tested fine. This should go to stable as well.
Acked-by: Stephen Hemminger <[email protected]>
From: Stephen Hemminger <[email protected]>
Date: Tue, 14 Apr 2009 10:55:39 -0700
> Tested fine. This should go to stable as well.
>
> Acked-by: Stephen Hemminger <[email protected]>
Applied, thanks everyone.