stop_queues() is currently not usable, as it races with
the TX path and generates hard to debug system freezes.
In the driver we must be able to stop and flush (!) the
TX queues before we shutdown the device.
Otherwise the TX handlers race against the device shutdown.
But stop_queues lacks some locking against the TX handlers.
stop_queues is only safe to call inside of the TX handler,
but there's it's useless to call.
I have no idea what's needed to get this working properly,
so here's an untested patch for RFC.
Index: mac80211/net/mac80211/ieee80211.c
===================================================================
--- mac80211.orig/net/mac80211/ieee80211.c 2007-06-25 17:32:32.000000000 +0200
+++ mac80211/net/mac80211/ieee80211.c 2007-07-02 22:29:03.000000000 +0200
@@ -5063,10 +5063,14 @@ EXPORT_SYMBOL(ieee80211_start_queues);
void ieee80211_stop_queues(struct ieee80211_hw *hw)
{
+ struct ieee80211_local *local = hw_to_local(hw);
+ struct net_device *mdev = local->mdev;
int i;
+ netif_tx_lock_bh(mdev);
for (i = 0; i < hw->queues; i++)
ieee80211_stop_queue(hw, i);
+ netif_tx_unlock_bh(mdev);
}
EXPORT_SYMBOL(ieee80211_stop_queues);
--
Greetings Michael.
On Tuesday 03 July 2007 19:41:22 Michael Wu wrote:
> On Tuesday 03 July 2007 10:36, Michael Buesch wrote:
> > the _only_ place where it's sane to call stop_queues (to stop all queues)
> > is outside of the TX handler. Why would you stop all queues inside of
> > the TX path?
> isl38xx cards (p54) need the host to manage the memory that frames are copied
> to before the card transmits. If the card's memory gets filled up, all queues
> need to be stopped.
Ah, yeah. Well. I was mainly talking about modern cards with
modern DMA rings. ;) But Ok. This card may want to stop all
queues from the TX path.
On bcm43xx each queue is seperate and can run
concurrently, though.
--
Greetings Michael.
On Tuesday 03 July 2007 06:09:19 Michael Wu wrote:
> On Monday 02 July 2007 13:35, Michael Buesch wrote:
> > + netif_tx_lock_bh(mdev);
> > for (i = 0; i < hw->queues; i++)
> > ieee80211_stop_queue(hw, i);
> > + netif_tx_unlock_bh(mdev);
> Well, looks like this will break stopping all tx queues from the tx handler by
> deadlocking. It may be useless for bcm43xx to call ieee80211_stop_queue, but
> there are other drivers which rely on it.
Nobody said that. Of course bcm43xx needs stop_queue, too.
> I would prefer to guarantee that the stack will not allow any more frames to
> be queued before calling stop/remove_interface on the last virtual interface.
> That should be true right now since the master interface is taken down before
> calling stop and remove_interface, so ieee80211_stop_queues shouldn't be
> necessary on device down. If this isn't the case, it should be fixed.
No, that is not enough. For example for switching bands we need to reinitialize
the board completely. Before I take the board down I must ensure that
no traffic is going on any longer.
> Of course, the ieee80211_stop_queue deadlock should still get fixed
> eventually..
That's not needed, as it's only sane to call in the TX handler, where
it can't deadlock.
--
Greetings Michael.
On Tuesday 03 July 2007 10:36, Michael Buesch wrote:
> the _only_ place where it's sane to call stop_queues (to stop all queues)
> is outside of the TX handler. Why would you stop all queues inside of
> the TX path?
isl38xx cards (p54) need the host to manage the memory that frames are copied
to before the card transmits. If the card's memory gets filled up, all queues
need to be stopped.
-Michael Wu
Michael Buesch wrote:
> On Tuesday 03 July 2007 14:31:31 Patrick McHardy wrote:
>
>>The wireless multiqueue handling should be replaced by the generic
>>implementation in net-2.6.23 (using prio, seperating out the wme
>>classifier and killing the broken scheduler). You don't need to
>>stop individual subqueues for a full shutdown with that implementation,
>>you can just stop the global queue. Not sure if that really helps for
>>this case though since I didn't look to deep into this code.
>
>
> That's exactly what I want. Stop the "global queue".
> As there is no "global queue" in mac80211, I need to stop every queue.
I could help you take care of the scheduler part if someone
else takes care of the drivers and mac80211.
Roughly what they need to do is:
- use alloc_netdev_mq instead of alloc_netdev
- use netif_{start,stop,wake}_subqueue instead of the wireless
equivalents
- when all subqueues are stopped the global queue should be stopped
in the usual way (netif_stop_queue), when at least one is active
the global queue should be woken
- use skb->queue_mapping instead of skb->priority to get the HW
queue
The scheduler part mainly consists of offering a clean way to install
a different default qdisc than pfifo_fast and adding a default
classifier. I'm presuming the wme classifier would also be useful
for other (non-mac80211) wireless drivers that offer multiple queues,
so it should probably be completely seperated from mac80211 and moved
to net/sched.
Michael Buesch wrote:
> On Tuesday 03 July 2007 06:09:19 Michael Wu wrote:
>
>>On Monday 02 July 2007 13:35, Michael Buesch wrote:
>>
>>>+ netif_tx_lock_bh(mdev);
>>> for (i = 0; i < hw->queues; i++)
>>> ieee80211_stop_queue(hw, i);
>>>+ netif_tx_unlock_bh(mdev);
>>
>>Well, looks like this will break stopping all tx queues from the tx handler by
>>deadlocking. It may be useless for bcm43xx to call ieee80211_stop_queue, but
>>there are other drivers which rely on it.
The wireless multiqueue handling should be replaced by the generic
implementation in net-2.6.23 (using prio, seperating out the wme
classifier and killing the broken scheduler). You don't need to
stop individual subqueues for a full shutdown with that implementation,
you can just stop the global queue. Not sure if that really helps for
this case though since I didn't look to deep into this code.
On Tuesday 03 July 2007 19:15:19 Michael Wu wrote:
> On Tuesday 03 July 2007 01:39, Michael Buesch wrote:
> > On Tuesday 03 July 2007 06:09:19 Michael Wu wrote:
> > > On Monday 02 July 2007 13:35, Michael Buesch wrote:
> > > > + netif_tx_lock_bh(mdev);
> > > > for (i = 0; i < hw->queues; i++)
> > > > ieee80211_stop_queue(hw, i);
> > > > + netif_tx_unlock_bh(mdev);
> > >
> > > Well, looks like this will break stopping all tx queues from the tx
> > > handler by deadlocking. It may be useless for bcm43xx to call
> > > ieee80211_stop_queue, but there are other drivers which rely on it.
> >
> > Nobody said that. Of course bcm43xx needs stop_queue, too.
> >
> I mean stop_queues, in the tx path, which this patch would break.
It's useless to stop all queues from the TX handler.
If a queue overflows we should just stop _that_ one.
> > No, that is not enough. For example for switching bands we need to
> > reinitialize the board completely. Before I take the board down I must
> > ensure that no traffic is going on any longer.
> >
> Ok sure.
>
> However, ensuring that there is no attempt to TX while the channel is being
> changed is generally good so stopping the queues before every channel change
> in mac80211 would be preferred.
Yes. That may also be a mac80211 issue.
The driver should really be able to completely stop TX, too.
Otherwise it is forced to implement ugly locks and workarounds
to prevent concurrency issues. I'm not sure if it's easily possible
to workaround without races.
> > > Of course, the ieee80211_stop_queue deadlock should still get fixed
> > > eventually..
> >
> > That's not needed, as it's only sane to call in the TX handler, where
> > it can't deadlock.
> By stop_queue, I mean stop_queues, which you do seem to want to call outside
> of the tx_handler.
the _only_ place where it's sane to call stop_queues (to stop all queues)
is outside of the TX handler. Why would you stop all queues inside of
the TX path?
--
Greetings Michael.
On Tuesday 03 July 2007 14:31:31 Patrick McHardy wrote:
> Michael Buesch wrote:
> > On Tuesday 03 July 2007 06:09:19 Michael Wu wrote:
> >
> >>On Monday 02 July 2007 13:35, Michael Buesch wrote:
> >>
> >>>+ netif_tx_lock_bh(mdev);
> >>> for (i = 0; i < hw->queues; i++)
> >>> ieee80211_stop_queue(hw, i);
> >>>+ netif_tx_unlock_bh(mdev);
> >>
> >>Well, looks like this will break stopping all tx queues from the tx handler by
> >>deadlocking. It may be useless for bcm43xx to call ieee80211_stop_queue, but
> >>there are other drivers which rely on it.
>
>
> The wireless multiqueue handling should be replaced by the generic
> implementation in net-2.6.23 (using prio, seperating out the wme
> classifier and killing the broken scheduler). You don't need to
> stop individual subqueues for a full shutdown with that implementation,
> you can just stop the global queue. Not sure if that really helps for
> this case though since I didn't look to deep into this code.
That's exactly what I want. Stop the "global queue".
As there is no "global queue" in mac80211, I need to stop every queue.
--
Greetings Michael.
On Tuesday 03 July 2007 01:39, Michael Buesch wrote:
> On Tuesday 03 July 2007 06:09:19 Michael Wu wrote:
> > On Monday 02 July 2007 13:35, Michael Buesch wrote:
> > > + netif_tx_lock_bh(mdev);
> > > for (i = 0; i < hw->queues; i++)
> > > ieee80211_stop_queue(hw, i);
> > > + netif_tx_unlock_bh(mdev);
> >
> > Well, looks like this will break stopping all tx queues from the tx
> > handler by deadlocking. It may be useless for bcm43xx to call
> > ieee80211_stop_queue, but there are other drivers which rely on it.
>
> Nobody said that. Of course bcm43xx needs stop_queue, too.
>
I mean stop_queues, in the tx path, which this patch would break.
> > I would prefer to guarantee that the stack will not allow any more frames
> > to be queued before calling stop/remove_interface on the last virtual
> > interface. That should be true right now since the master interface is
> > taken down before calling stop and remove_interface, so
> > ieee80211_stop_queues shouldn't be necessary on device down. If this
> > isn't the case, it should be fixed.
>
> No, that is not enough. For example for switching bands we need to
> reinitialize the board completely. Before I take the board down I must
> ensure that no traffic is going on any longer.
>
Ok sure.
However, ensuring that there is no attempt to TX while the channel is being
changed is generally good so stopping the queues before every channel change
in mac80211 would be preferred.
> > Of course, the ieee80211_stop_queue deadlock should still get fixed
> > eventually..
>
> That's not needed, as it's only sane to call in the TX handler, where
> it can't deadlock.
By stop_queue, I mean stop_queues, which you do seem to want to call outside
of the tx_handler.
-Michael Wu
On Monday 02 July 2007 13:35, Michael Buesch wrote:
> + netif_tx_lock_bh(mdev);
> for (i = 0; i < hw->queues; i++)
> ieee80211_stop_queue(hw, i);
> + netif_tx_unlock_bh(mdev);
Well, looks like this will break stopping all tx queues from the tx handler by
deadlocking. It may be useless for bcm43xx to call ieee80211_stop_queue, but
there are other drivers which rely on it.
I would prefer to guarantee that the stack will not allow any more frames to
be queued before calling stop/remove_interface on the last virtual interface.
That should be true right now since the master interface is taken down before
calling stop and remove_interface, so ieee80211_stop_queues shouldn't be
necessary on device down. If this isn't the case, it should be fixed.
Of course, the ieee80211_stop_queue deadlock should still get fixed
eventually..
-Michael Wu