2008-07-21 16:20:27

by David Miller

Subject: Re: [BUG] kernel BUG at net/core/dev.c:1328!

From: "Alessandro Guido" <[email protected]>
Date: Mon, 21 Jul 2008 18:14:28 +0200

linux-wireless and netdev added to CC:

> this is what I get when I try to set up a wireless connection with
> current git (v2.6.26-5253-g14b395e):
>
>
> kernel BUG at net/core/dev.c:1328!
> invalid opcode: 0000 [#1] PREEMPT
> Modules linked in: ipw2200
>
> Pid: 5, comm: events/0 Not tainted (2.6.26-05253-g14b395e #1)
> EIP: 0060:[<c02e58ca>] EFLAGS: 00010246 CPU: 0
> EIP is at __netif_schedule+0x3a/0x40
> EAX: c043d380 EBX: eed84b40 ECX: c043d380 EDX: ffffffff
> ESI: eed84b44 EDI: ef808140 EBP: ef82bf64 ESP: ef82bf60
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process events/0 (pid: 5, ti=ef82b000 task=ef82cd40 task.ti=ef82b000)
> Stack: eed84b40 ef82bf80 f0939850 ef808140 ef82bf78 c02ef6c2 eed8584c ef808140
> ef82bfa8 c012b02f ef82ce9c ee24fc00 ef82ce9c ffff007b f09396f0 ef808148
> ef808140 ef82bfb0 ef82bfd0 c012b53f 00000000 ef82cd40 c012eaf0 ef82bfbc
> Call Trace:
> [<f0939850>] ? ipw_bg_link_up+0x160/0x1f0 [ipw2200]
> [<c02ef6c2>] ? rtnl_unlock+0x12/0x20
> [<c012b02f>] ? run_workqueue+0x6f/0x160
> [<f09396f0>] ? ipw_bg_link_up+0x0/0x1f0 [ipw2200]
> [<c012b53f>] ? worker_thread+0x7f/0xe0
> [<c012eaf0>] ? autoremove_wake_function+0x0/0x50
> [<c012b4c0>] ? worker_thread+0x0/0xe0
> [<c012e67a>] ? kthread+0x3a/0x70
> [<c012e640>] ? kthread+0x0/0x70
> [<c0103c8f>] ? kernel_thread_helper+0x7/0x18
> =======================
> Code: 24 0f ba 28 01 19 d2 85 d2 75 1d 9c 5b fa a1 c0 11 4a c0 89 41
> 3c b8 02 00 00 00 89 0d c0 11 4a c0 e8 4b ab e3 ff 53 9d 5b 5d c3 <0f>
> 0b eb fe 66 90 55 89 e5 53 89 c3 8d 40 2c 0f ba 28 01 19 d2
> EIP: [<c02e58ca>] __netif_schedule+0x3a/0x40 SS:ESP 0068:ef82bf60


2008-07-24 06:19:08

by Markus Trippelsdorf

Subject: Re: [PATCH] skge: resolve tx multiqueue bug

On Wed, Jul 23, 2008 at 03:30:00PM -0700, David Miller wrote:
> From: Wang Chen <[email protected]>
> Date: Wed, 23 Jul 2008 23:21:14 +0800
>
> > Markus Trippelsdorf said the following on 2008-7-23 22:03:
> > > On Wed, Jul 23, 2008 at 12:18:27PM +0200, Markus Trippelsdorf wrote:
> > >> On Wed, Jul 23, 2008 at 04:50:13PM +0800, Wang Chen wrote:
> > >>> Markus Trippelsdorf said the following on 2008-7-23 13:40:
> > >>>> On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
> > >>>>> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
> > >>>> Same thing here (latest git):
> > >>>>
> > >>>> skge eth1: enabling interface
> > >>>> skge eth1: disabling interface
> > >>>> ------------[ cut here ]------------
> > >>>> WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
> > >>>> Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
> > >>>> [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
> > >> ...
> > >>>> ---[ end trace 92936ef183e09876 ]---
> > >>>> skge eth1: enabling interface
> > >>>> skge eth1: Link is up at 100 Mbps, full duplex, flow control both
> > >>>>
> > >>> Markus, please try this.
> > >>>
> > >>> - Add netif_start_queue() in ->open()
> > >>> - netif_carrier_*() is enough, remove netif_*_queue()
> > >> Unfortunately, your patch does not fix this. I still get the same warning.
> > >>
> > >
> > > This patch works for me:
> >
> > Your patch works for me too. So I think it's better than mine. :)
> >
> > Tested-by: Wang Chen <[email protected]>
> >
> > Dave, since Markus and me tested this patch, would you please apply it?
>
> I can't, it's whitespace damaged. All the tabs are turned into spaces,
> also there is no signoff from Markus.

Sorry, here is a correct version:

- Add netif_start_queue() in ->open()
- netif_carrier_*() is enough, remove netif_*_queue()

Signed-off-by: Markus Trippelsdorf <[email protected]>
Tested-by: Wang Chen <[email protected]>

--
Markus


Attachments:
(No filename) (1.95 kB)
patch_skge.patch (1.58 kB)

2008-07-23 22:30:00

by David Miller

Subject: Re: [PATCH] skge: resolve tx multiqueue bug

From: Wang Chen <[email protected]>
Date: Wed, 23 Jul 2008 23:21:14 +0800

> Markus Trippelsdorf said the following on 2008-7-23 22:03:
> > On Wed, Jul 23, 2008 at 12:18:27PM +0200, Markus Trippelsdorf wrote:
> >> On Wed, Jul 23, 2008 at 04:50:13PM +0800, Wang Chen wrote:
> >>> Markus Trippelsdorf said the following on 2008-7-23 13:40:
> >>>> On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
> >>>>> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
> >>>> Same thing here (latest git):
> >>>>
> >>>> skge eth1: enabling interface
> >>>> skge eth1: disabling interface
> >>>> ------------[ cut here ]------------
> >>>> WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
> >>>> Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
> >>>> [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
> >> ...
> >>>> ---[ end trace 92936ef183e09876 ]---
> >>>> skge eth1: enabling interface
> >>>> skge eth1: Link is up at 100 Mbps, full duplex, flow control both
> >>>>
> >>> Markus, please try this.
> >>>
> >>> - Add netif_start_queue() in ->open()
> >>> - netif_carrier_*() is enough, remove netif_*_queue()
> >> Unfortunately, your patch does not fix this. I still get the same warning.
> >>
> >
> > This patch works for me:
>
> Your patch works for me too. So I think it's better than mine. :)
>
> Tested-by: Wang Chen <[email protected]>
>
> Dave, since Markus and me tested this patch, would you please apply it?

I can't, it's whitespace damaged. All the tabs are turned into spaces,
also there is no signoff from Markus.


2008-07-23 13:16:06

by Alessandro Suardi

Subject: Re: [BUG] kernel BUG at net/core/dev.c:1328!

On Wed, Jul 23, 2008 at 9:57 AM, Alessandro Guido
<[email protected]> wrote:
> I'm testing the latest Linus' git which includes your patch
> (2.6.26-06077-gc010b2f) and the problem is fixed.
>
> Thank you!

-git10 also has this fixed for me. Thanks,

--alessandro

"Give me love / Or give me hate
Give me anything that's not just ok"

(Sophia, 'Weightless')

2008-07-24 01:03:08

by Wang Chen

Subject: Re: [PATCH] skge: resolve tx multiqueue bug

David Miller said the following on 2008-7-24 6:30:
> From: Wang Chen <[email protected]>
> Date: Wed, 23 Jul 2008 23:21:14 +0800
>
>> Markus Trippelsdorf said the following on 2008-7-23 22:03:
>>> On Wed, Jul 23, 2008 at 12:18:27PM +0200, Markus Trippelsdorf wrote:
>>>> On Wed, Jul 23, 2008 at 04:50:13PM +0800, Wang Chen wrote:
>>>>> Markus Trippelsdorf said the following on 2008-7-23 13:40:
>>>>>> On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
>>>>>>> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
>>>>>> Same thing here (latest git):
>>>>>>
>>>>>> skge eth1: enabling interface
>>>>>> skge eth1: disabling interface
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
>>>>>> Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
>>>>>> [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
>>>> ...
>>>>>> ---[ end trace 92936ef183e09876 ]---
>>>>>> skge eth1: enabling interface
>>>>>> skge eth1: Link is up at 100 Mbps, full duplex, flow control both
>>>>>>
>>>>> Markus, please try this.
>>>>>
>>>>> - Add netif_start_queue() in ->open()
>>>>> - netif_carrier_*() is enough, remove netif_*_queue()
>>>> Unfortunately, your patch does not fix this. I still get the same warning.
>>>>
>>> This patch works for me:
>> Your patch works for me too. So I think it's better than mine. :)
>>
>> Tested-by: Wang Chen <[email protected]>
>>
>> Dave, since Markus and me tested this patch, would you please apply it?
>
> I can't, it's whitespace damaged. All the tabs are turned into spaces,
> also there is no signoff from Markus.
>

OK. So let it be, since you removed the warning from __netif_schedule().
I will wait for your netif_tx_{freeze,unfreeze}() :)


2008-07-23 14:03:55

by Markus Trippelsdorf

Subject: Re: [PATCH] skge: resolve tx multiqueue bug

On Wed, Jul 23, 2008 at 12:18:27PM +0200, Markus Trippelsdorf wrote:
> On Wed, Jul 23, 2008 at 04:50:13PM +0800, Wang Chen wrote:
> > Markus Trippelsdorf said the following on 2008-7-23 13:40:
> > > On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
> > >> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
> > >
> > > Same thing here (latest git):
> > >
> > > skge eth1: enabling interface
> > > skge eth1: disabling interface
> > > ------------[ cut here ]------------
> > > WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
> > > Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
> > > [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
> ...
> > > ---[ end trace 92936ef183e09876 ]---
> > > skge eth1: enabling interface
> > > skge eth1: Link is up at 100 Mbps, full duplex, flow control both
> > >
> >
> > Markus, please try this.
> >
> > - Add netif_start_queue() in ->open()
> > - netif_carrier_*() is enough, remove netif_*_queue()
>
> Unfortunately, your patch does not fix this. I still get the same warning.
>

This patch works for me:

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 2e26dce..d761296 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -1069,7 +1069,6 @@ static void skge_link_up(struct skge_port *skge)
LED_BLK_OFF|LED_SYNC_OFF|LED_ON);

netif_carrier_on(skge->netdev);
- netif_wake_queue(skge->netdev);

if (netif_msg_link(skge)) {
printk(KERN_INFO PFX
@@ -1084,7 +1083,6 @@ static void skge_link_down(struct skge_port *skge)
{
skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG), LED_OFF);
netif_carrier_off(skge->netdev);
- netif_stop_queue(skge->netdev);

if (netif_msg_link(skge))
printk(KERN_INFO PFX "%s: Link is down.\n", skge->netdev->name);
@@ -2450,7 +2448,6 @@ static void skge_phy_reset(struct skge_port *skge)
int port = skge->port;
struct net_device *dev = hw->dev[port];

- netif_stop_queue(skge->netdev);
netif_carrier_off(skge->netdev);

spin_lock_bh(&hw->phy_lock);
@@ -2640,6 +2637,7 @@ static int skge_up(struct net_device *dev)
spin_unlock_irq(&hw->hw_lock);

napi_enable(&skge->napi);
+ netif_start_queue(dev);
return 0;

free_rx_ring:
@@ -2673,8 +2671,6 @@ static int skge_down(struct net_device *dev)
if (netif_msg_ifdown(skge))
printk(KERN_INFO PFX "%s: disabling interface\n", dev->name);

- netif_stop_queue(dev);
-
if (hw->chip_id == CHIP_ID_GENESIS && hw->phy_type == SK_PHY_XMAC)
del_timer_sync(&skge->link_timer);

@@ -3863,7 +3859,6 @@ static struct net_device *skge_devinit(struct skge_hw *hw, int port,

/* device is off until link detection */
netif_carrier_off(dev);
- netif_stop_queue(dev);

return dev;
}


--
Markus

2008-07-22 22:56:11

by David Miller

Subject: Re: [BUG] kernel BUG at net/core/dev.c:1328!

From: "Alessandro Guido" <[email protected]>
Date: Tue, 22 Jul 2008 11:54:26 +0200

> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.

Please try this patch:

ipw2200: Call netif_*_queue() interfaces properly.

netif_carrier_{on,off}() handles starting and stopping packet
flow into the driver. So there is no reason to invoke netif_stop_queue()
and netif_wake_queue() in response to link status events.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/drivers/net/wireless/ipw2200.c b/drivers/net/wireless/ipw2200.c
index 6e70460..1acfbcd 100644
--- a/drivers/net/wireless/ipw2200.c
+++ b/drivers/net/wireless/ipw2200.c
@@ -4972,8 +4972,7 @@ static int ipw_queue_tx_reclaim(struct ipw_priv *priv,
}
done:
if ((ipw_tx_queue_space(q) > q->low_mark) &&
- (qindex >= 0) &&
- (priv->status & STATUS_ASSOCIATED) && netif_running(priv->net_dev))
+ (qindex >= 0))
netif_wake_queue(priv->net_dev);
used = q->first_empty - q->last_used;
if (used < 0)
@@ -10154,14 +10153,8 @@ static void init_sys_config(struct ipw_sys_config *sys_config)

static int ipw_net_open(struct net_device *dev)
{
- struct ipw_priv *priv = ieee80211_priv(dev);
IPW_DEBUG_INFO("dev->open\n");
- /* we should be verifying the device is ready to be opened */
- mutex_lock(&priv->mutex);
- if (!(priv->status & STATUS_RF_KILL_MASK) &&
- (priv->status & STATUS_ASSOCIATED))
- netif_start_queue(dev);
- mutex_unlock(&priv->mutex);
+ netif_start_queue(dev);
return 0;
}

@@ -10481,13 +10474,6 @@ static int ipw_net_hard_start_xmit(struct ieee80211_txb *txb,
IPW_DEBUG_TX("dev->xmit(%d bytes)\n", txb->payload_size);
spin_lock_irqsave(&priv->lock, flags);

- if (!(priv->status & STATUS_ASSOCIATED)) {
- IPW_DEBUG_INFO("Tx attempt while not associated.\n");
- priv->ieee->stats.tx_carrier_errors++;
- netif_stop_queue(dev);
- goto fail_unlock;
- }
-
#ifdef CONFIG_IPW2200_PROMISCUOUS
if (rtap_iface && netif_running(priv->prom_net_dev))
ipw_handle_promiscuous_tx(priv, txb);
@@ -10499,10 +10485,6 @@ static int ipw_net_hard_start_xmit(struct ieee80211_txb *txb,
spin_unlock_irqrestore(&priv->lock, flags);

return ret;
-
- fail_unlock:
- spin_unlock_irqrestore(&priv->lock, flags);
- return 1;
}

static struct net_device_stats *ipw_net_get_stats(struct net_device *dev)
@@ -10703,13 +10685,6 @@ static void ipw_link_up(struct ipw_priv *priv)
priv->last_packet_time = 0;

netif_carrier_on(priv->net_dev);
- if (netif_queue_stopped(priv->net_dev)) {
- IPW_DEBUG_NOTIF("waking queue\n");
- netif_wake_queue(priv->net_dev);
- } else {
- IPW_DEBUG_NOTIF("starting queue\n");
- netif_start_queue(priv->net_dev);
- }

cancel_delayed_work(&priv->request_scan);
cancel_delayed_work(&priv->request_direct_scan);
@@ -10739,7 +10714,6 @@ static void ipw_link_down(struct ipw_priv *priv)
{
ipw_led_link_down(priv);
netif_carrier_off(priv->net_dev);
- netif_stop_queue(priv->net_dev);
notify_wx_assoc_event(priv);

/* Cancel any queued work ... */
@@ -11419,7 +11393,6 @@ static void ipw_down(struct ipw_priv *priv)
/* Clear all bits but the RF Kill */
priv->status &= STATUS_RF_KILL_MASK | STATUS_EXIT_PENDING;
netif_carrier_off(priv->net_dev);
- netif_stop_queue(priv->net_dev);

ipw_stop_nic(priv);

@@ -11522,7 +11495,6 @@ static int ipw_prom_open(struct net_device *dev)

IPW_DEBUG_INFO("prom dev->open\n");
netif_carrier_off(dev);
- netif_stop_queue(dev);

if (priv->ieee->iw_mode != IW_MODE_MONITOR) {
priv->sys_config.accept_all_data_frames = 1;
@@ -11558,7 +11530,6 @@ static int ipw_prom_stop(struct net_device *dev)
static int ipw_prom_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
IPW_DEBUG_INFO("prom dev->xmit\n");
- netif_stop_queue(dev);
return -EOPNOTSUPP;
}
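
Editorial sketch of the idea in the patch description above: link events change only the carrier state, while the queue is started once in ->open() and is stopped or woken only by TX ring flow control. This is a toy userspace model under that assumption, not kernel code. Every name in it (toy_netdev, toy_open, toy_link_up, and so on) is hypothetical, and it does not reproduce how the real netif_carrier_*() and netif_*_queue() helpers are implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two independent pieces of state the
 * patch description talks about. This is NOT struct net_device. */
struct toy_netdev {
	bool carrier_ok;    /* link state, changed only by link events */
	bool queue_stopped; /* TX flow control, changed only by the TX path */
};

/* Models ->open(): the queue is started unconditionally. */
static void toy_open(struct toy_netdev *dev)
{
	assert(dev != NULL);
	dev->queue_stopped = false;
}

/* Link events touch the carrier bit and leave the queue bit alone. */
static void toy_link_up(struct toy_netdev *dev)
{
	assert(dev != NULL);
	dev->carrier_ok = true;
}

static void toy_link_down(struct toy_netdev *dev)
{
	assert(dev != NULL);
	dev->carrier_ok = false;
}

/* TX ring events touch the queue bit and leave the carrier bit alone. */
static void toy_ring_full(struct toy_netdev *dev)
{
	assert(dev != NULL);
	dev->queue_stopped = true;
}

static void toy_ring_has_space(struct toy_netdev *dev)
{
	assert(dev != NULL);
	dev->queue_stopped = false;
}

/* In this model, packets reach the driver only when both conditions hold. */
static bool toy_can_transmit(const struct toy_netdev *dev)
{
	assert(dev != NULL);
	return dev->carrier_ok && !dev->queue_stopped;
}
```

In this model a link-down event blocks transmission without ever touching queue_stopped, so nothing has to wake the queue again when the link comes back. That mirrors the reasoning given for dropping the netif_stop_queue() and netif_wake_queue() calls from the link handlers.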


2008-07-23 10:18:32

by Markus Trippelsdorf

Subject: Re: [PATCH] skge: resolve tx multiqueue bug

On Wed, Jul 23, 2008 at 04:50:13PM +0800, Wang Chen wrote:
> Markus Trippelsdorf said the following on 2008-7-23 13:40:
> > On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
> >> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
> >
> > Same thing here (latest git):
> >
> > skge eth1: enabling interface
> > skge eth1: disabling interface
> > ------------[ cut here ]------------
> > WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
> > Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
> > [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
...
> > ---[ end trace 92936ef183e09876 ]---
> > skge eth1: enabling interface
> > skge eth1: Link is up at 100 Mbps, full duplex, flow control both
> >
>
> Markus, please try this.
>
> - Add netif_start_queue() in ->open()
> - netif_carrier_*() is enough, remove netif_*_queue()

Unfortunately, your patch does not fix this. I still get the same warning.

--
Markus

2008-07-22 20:18:45

by Alessandro Suardi

Subject: Re: [BUG] kernel BUG at net/core/dev.c:1328!

On Tue, Jul 22, 2008 at 11:54 AM, Alessandro Guido
<[email protected]> wrote:
> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
>
>
> WARNING: at net/core/dev.c:1330 __netif_schedule+0x53/0x60()
> Modules linked in:
> Pid: 5, comm: events/0 Not tainted 2.6.26-05752-g93ded9b #1
> [<c011bc64>] warn_on_slowpath+0x54/0xa0
> [<c0135483>] ? getnstimeofday+0x53/0x100
> [<c0132d1c>] ? ktime_get_ts+0x4c/0x50
> [<c012ba67>] ? insert_work+0x57/0x70
> [<c012c129>] ? queue_work+0x39/0x60
> [<c012c1e5>] ? queue_delayed_work+0x25/0x30
> [<c012c201>] ? schedule_delayed_work+0x11/0x20
> [<c030398a>] ? linkwatch_schedule_work+0x3a/0xb0
> [<c02f78d3>] __netif_schedule+0x53/0x60
> [<c0262a80>] ipw_bg_link_up+0x160/0x1f0
> [<c035ae4e>] ? schedule+0x1de/0x3f0
> [<c012bdef>] run_workqueue+0x6f/0x160
> [<c0262920>] ? ipw_bg_link_up+0x0/0x1f0
> [<c012c08f>] worker_thread+0x7f/0xe0
> [<c012f5d0>] ? autoremove_wake_function+0x0/0x50
> [<c012c010>] ? worker_thread+0x0/0xe0
> [<c012f15a>] kthread+0x3a/0x70
> [<c012f120>] ? kthread+0x0/0x70
> [<c0103d77>] kernel_thread_helper+0x7/0x10
> =======================

I also hit a very similar one on my Dell D610 with onboard
ipw2200 wireless (already kerneloops'd, BTW):

WARNING: at net/core/dev.c:1330 __netif_schedule+0x2a/0x63()
Modules linked in: fuse sunrpc iptable_filter ip_tables
ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand ppdev pcmcia
dcdbas parport_pc parport snd_intel8x0 snd_intel8x0m snd_ac97_codec
snd_seq_dummy ac97_bus snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss pcspkr snd_mixer_oss snd_pcm snd_timer snd
ipw2200 soundcore snd_page_alloc i2c_i801 ieee80211 ieee80211_crypt
yenta_socket rsrc_nonstatic pcmcia_core video output pata_acpi
ata_piix uhci_hcd ehci_hcd [last unloaded: microcode]
Pid: 5, comm: events/0 Not tainted 2.6.26-git9 #5
[<c0118cda>] warn_on_slowpath+0x46/0x80
[<c0113a91>] ? __enqueue_entity+0xe3/0xeb
[<c0113754>] ? __wake_up+0x31/0x3b
[<c0125ebb>] ? __queue_work+0x2d/0x32
[<c0125f05>] ? queue_work+0x2a/0x34
[<c01261ff>] ? queue_delayed_work+0x11/0x23
[<c0126227>] ? schedule_delayed_work+0x16/0x1b
[<c02ce1d5>] ? linkwatch_schedule_work+0x58/0x99
[<c02ce278>] ? linkwatch_fire_event+0x62/0x66
[<c02c567f>] __netif_schedule+0x2a/0x63
[<f8990009>] ipw_bg_link_up+0xae/0x173 [ipw2200]
[<c0125a0f>] run_workqueue+0x7d/0xf6
[<f898ff5b>] ? ipw_bg_link_up+0x0/0x173 [ipw2200]
[<c0125cec>] worker_thread+0xb7/0xc3
[<c0128862>] ? autoremove_wake_function+0x0/0x38
[<c0125c35>] ? worker_thread+0x0/0xc3
[<c01285d3>] kthread+0x3e/0x63
[<c0128595>] ? kthread+0x0/0x63
[<c0103003>] kernel_thread_helper+0x7/0x10

But it works - to the point that I'm connected to my wireless
router without apparent issues.

--alessandro

"Give me love / Or give me hate
Give me anything that's not just ok"

(Sophia, 'Weightless')

2008-07-23 15:22:47

by Wang Chen

Subject: Re: [PATCH] skge: resolve tx multiqueue bug

Markus Trippelsdorf said the following on 2008-7-23 22:03:
> On Wed, Jul 23, 2008 at 12:18:27PM +0200, Markus Trippelsdorf wrote:
>> On Wed, Jul 23, 2008 at 04:50:13PM +0800, Wang Chen wrote:
>>> Markus Trippelsdorf said the following on 2008-7-23 13:40:
>>>> On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
>>>>> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
>>>> Same thing here (latest git):
>>>>
>>>> skge eth1: enabling interface
>>>> skge eth1: disabling interface
>>>> ------------[ cut here ]------------
>>>> WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
>>>> Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
>>>> [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
>> ...
>>>> ---[ end trace 92936ef183e09876 ]---
>>>> skge eth1: enabling interface
>>>> skge eth1: Link is up at 100 Mbps, full duplex, flow control both
>>>>
>>> Markus, please try this.
>>>
>>> - Add netif_start_queue() in ->open()
>>> - netif_carrier_*() is enough, remove netif_*_queue()
>> Unfortunately, your patch does not fix this. I still get the same warning.
>>
>
> This patch works for me:

Your patch works for me too. So I think it's better than mine. :)

Tested-by: Wang Chen <[email protected]>

Dave, since Markus and me tested this patch, would you please apply it?




2008-07-23 08:51:55

by Wang Chen

Subject: [PATCH] skge: resolve tx multiqueue bug

Markus Trippelsdorf said the following on 2008-7-23 13:40:
> On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
>> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.
>
> Same thing here (latest git):
>
> skge eth1: enabling interface
> skge eth1: disabling interface
> ------------[ cut here ]------------
> WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
> Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33
>
> Call Trace:
> [<ffffffff8022e5f4>] warn_on_slowpath+0x4c/0x87
> [<ffffffff802fc5cb>] delay_tsc+0x0/0x55
> [<ffffffff802fc5cb>] delay_tsc+0x0/0x55
> [<ffffffff802fc5e7>] delay_tsc+0x1c/0x55
> [<ffffffff8037e2ee>] gm_phy_write+0x50/0x90
> [<ffffffff8037e2f8>] gm_phy_write+0x5a/0x90
> [<ffffffff8037f03f>] skge_led+0x1fb/0x20b
> [<ffffffff804202e8>] __netif_schedule+0x24/0x6d
> [<ffffffff80381298>] skge_down+0x440/0x4cf
> [<ffffffff803814be>] skge_change_mtu+0x3d/0x61
> [<ffffffff80420775>] dev_set_mtu+0x46/0x79
> [<ffffffff80427a51>] do_setlink+0x1c5/0x31f
> [<ffffffff804287d3>] rtnl_newlink+0x2b9/0x3e3
> [<ffffffff804285a8>] rtnl_newlink+0x8e/0x3e3
> [<ffffffff80428d69>] rtnetlink_rcv_msg+0x5a/0x1ea
> [<ffffffff80428d0f>] rtnetlink_rcv_msg+0x0/0x1ea
> [<ffffffff8042efde>] netlink_rcv_skb+0x34/0x7e
> [<ffffffff80428d09>] rtnetlink_rcv+0x1f/0x25
> [<ffffffff8042eb97>] netlink_unicast+0x119/0x17f
> [<ffffffff8041b35d>] __alloc_skb+0x61/0x123
> [<ffffffff8042ee37>] netlink_sendmsg+0x23a/0x24d
> [<ffffffff804152e4>] sock_sendmsg+0xcb/0xe3
> [<ffffffff8023f670>] autoremove_wake_function+0x0/0x2e
> [<ffffffff80225643>] need_resched+0x1e/0x28
> [<ffffffff804147d8>] move_addr_to_kernel+0x25/0x36
> [<ffffffff8041c090>] verify_iovec+0x46/0x82
> [<ffffffff80415513>] sys_sendmsg+0x217/0x28a
> [<ffffffff8042e01e>] netlink_insert+0xfe/0x121
> [<ffffffff80414b40>] sockfd_lookup_light+0x1a/0x52
> [<ffffffff80261fe5>] __vma_link+0x58/0x61
> [<ffffffff80262062>] vma_link+0x74/0x99
> [<ffffffff802630ce>] do_brk+0x2c1/0x319
> [<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80
>
> ---[ end trace 92936ef183e09876 ]---
> skge eth1: enabling interface
> skge eth1: Link is up at 100 Mbps, full duplex, flow control both
>

Markus, please try this.

- Add netif_start_queue() in ->open()
- netif_carrier_*() is enough, remove netif_*_queue()

Signed-off-by: Wang Chen <[email protected]>
---
diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 2e26dce..7507585 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -1069,7 +1069,6 @@ static void skge_link_up(struct skge_port *skge)
LED_BLK_OFF|LED_SYNC_OFF|LED_ON);

netif_carrier_on(skge->netdev);
- netif_wake_queue(skge->netdev);

if (netif_msg_link(skge)) {
printk(KERN_INFO PFX
@@ -1084,7 +1083,6 @@ static void skge_link_down(struct skge_port *skge)
{
skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG), LED_OFF);
netif_carrier_off(skge->netdev);
- netif_stop_queue(skge->netdev);

if (netif_msg_link(skge))
printk(KERN_INFO PFX "%s: Link is down.\n", skge->netdev->name);
@@ -2450,7 +2448,6 @@ static void skge_phy_reset(struct skge_port *skge)
int port = skge->port;
struct net_device *dev = hw->dev[port];

- netif_stop_queue(skge->netdev);
netif_carrier_off(skge->netdev);

spin_lock_bh(&hw->phy_lock);
@@ -2640,6 +2637,7 @@ static int skge_up(struct net_device *dev)
spin_unlock_irq(&hw->hw_lock);

napi_enable(&skge->napi);
+ netif_start_queue(dev);
return 0;

free_rx_ring:
@@ -3863,7 +3861,6 @@ static struct net_device *skge_devinit(struct skge_hw *hw, int port,

/* device is off until link detection */
netif_carrier_off(dev);
- netif_stop_queue(dev);

return dev;
}


2008-07-23 07:57:25

by Alessandro Guido

Subject: Re: [BUG] kernel BUG at net/core/dev.c:1328!

I'm testing the latest Linus' git which includes your patch
(2.6.26-06077-gc010b2f) and the problem is fixed.

Thank you!

2008-07-23 05:53:51

by Markus Trippelsdorf

Subject: Re: WARNING: at net/core/dev.c:1328!

On Tue, Jul 22, 2008 at 11:54:26AM +0200, Alessandro Guido wrote:
> Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.

Same thing here (latest git):

skge eth1: enabling interface
skge eth1: disabling interface
------------[ cut here ]------------
WARNING: at net/core/dev.c:1344 __netif_schedule+0x24/0x6d()
Pid: 1904, comm: ip Not tainted 2.6.26-06077-gc010b2f #33

Call Trace:
[<ffffffff8022e5f4>] warn_on_slowpath+0x4c/0x87
[<ffffffff802fc5cb>] delay_tsc+0x0/0x55
[<ffffffff802fc5cb>] delay_tsc+0x0/0x55
[<ffffffff802fc5e7>] delay_tsc+0x1c/0x55
[<ffffffff8037e2ee>] gm_phy_write+0x50/0x90
[<ffffffff8037e2f8>] gm_phy_write+0x5a/0x90
[<ffffffff8037f03f>] skge_led+0x1fb/0x20b
[<ffffffff804202e8>] __netif_schedule+0x24/0x6d
[<ffffffff80381298>] skge_down+0x440/0x4cf
[<ffffffff803814be>] skge_change_mtu+0x3d/0x61
[<ffffffff80420775>] dev_set_mtu+0x46/0x79
[<ffffffff80427a51>] do_setlink+0x1c5/0x31f
[<ffffffff804287d3>] rtnl_newlink+0x2b9/0x3e3
[<ffffffff804285a8>] rtnl_newlink+0x8e/0x3e3
[<ffffffff80428d69>] rtnetlink_rcv_msg+0x5a/0x1ea
[<ffffffff80428d0f>] rtnetlink_rcv_msg+0x0/0x1ea
[<ffffffff8042efde>] netlink_rcv_skb+0x34/0x7e
[<ffffffff80428d09>] rtnetlink_rcv+0x1f/0x25
[<ffffffff8042eb97>] netlink_unicast+0x119/0x17f
[<ffffffff8041b35d>] __alloc_skb+0x61/0x123
[<ffffffff8042ee37>] netlink_sendmsg+0x23a/0x24d
[<ffffffff804152e4>] sock_sendmsg+0xcb/0xe3
[<ffffffff8023f670>] autoremove_wake_function+0x0/0x2e
[<ffffffff80225643>] need_resched+0x1e/0x28
[<ffffffff804147d8>] move_addr_to_kernel+0x25/0x36
[<ffffffff8041c090>] verify_iovec+0x46/0x82
[<ffffffff80415513>] sys_sendmsg+0x217/0x28a
[<ffffffff8042e01e>] netlink_insert+0xfe/0x121
[<ffffffff80414b40>] sockfd_lookup_light+0x1a/0x52
[<ffffffff80261fe5>] __vma_link+0x58/0x61
[<ffffffff80262062>] vma_link+0x74/0x99
[<ffffffff802630ce>] do_brk+0x2c1/0x319
[<ffffffff8020b3eb>] system_call_after_swapgs+0x7b/0x80

---[ end trace 92936ef183e09876 ]---
skge eth1: enabling interface
skge eth1: Link is up at 100 Mbps, full duplex, flow control both

--
Markus

2008-07-22 09:54:27

by Alessandro Guido

Subject: Re: [BUG] kernel BUG at net/core/dev.c:1328!

Got a WARNING this morning (2.6.26-05752-g93ded9b) and I think it's related.


WARNING: at net/core/dev.c:1330 __netif_schedule+0x53/0x60()
Modules linked in:
Pid: 5, comm: events/0 Not tainted 2.6.26-05752-g93ded9b #1
[<c011bc64>] warn_on_slowpath+0x54/0xa0
[<c0135483>] ? getnstimeofday+0x53/0x100
[<c0132d1c>] ? ktime_get_ts+0x4c/0x50
[<c012ba67>] ? insert_work+0x57/0x70
[<c012c129>] ? queue_work+0x39/0x60
[<c012c1e5>] ? queue_delayed_work+0x25/0x30
[<c012c201>] ? schedule_delayed_work+0x11/0x20
[<c030398a>] ? linkwatch_schedule_work+0x3a/0xb0
[<c02f78d3>] __netif_schedule+0x53/0x60
[<c0262a80>] ipw_bg_link_up+0x160/0x1f0
[<c035ae4e>] ? schedule+0x1de/0x3f0
[<c012bdef>] run_workqueue+0x6f/0x160
[<c0262920>] ? ipw_bg_link_up+0x0/0x1f0
[<c012c08f>] worker_thread+0x7f/0xe0
[<c012f5d0>] ? autoremove_wake_function+0x0/0x50
[<c012c010>] ? worker_thread+0x0/0xe0
[<c012f15a>] kthread+0x3a/0x70
[<c012f120>] ? kthread+0x0/0x70
[<c0103d77>] kernel_thread_helper+0x7/0x10
=======================


Attachments:
(No filename) (1.00 kB)
congig2.gz (8.87 kB)