2010-01-05 08:38:08

by Lennert Buytenhek

Subject: infinite transmit buffering issue in 2.6.32 mac80211

Hi!

Since "mac80211: remove master netdev", mac80211 no longer propagates
TX queue full status (ieee80211_stop_queue et al) up. While the
underlying hardware's TX queue is stopped, mac80211 buffers frames
internally (in ieee80211_tx), but there's no upper limit on the number
of frames it will buffer, leading to badness when there is heavy TX
traffic on the wireless interface:

* It breaks TCP's packet drop-induced rate control. Instead, you'll
end up with much of the same effects as tunneling TCP in TCP like
some VPN apps do, where individual packets will see wildly varying
RTTs and you'll end up adding retransmits to the TX queue while
the original packet didn't even go out yet.

* If there is bulk data transfer going on, you end up with unbounded
and highly variable RTTs for concurrent traffic (say, pings).

* On the kind of machines I typically work on (embedded access point
type devices), more so than on big x86_64 machines, unbounded
packet buffering will typically lead to OOM very quickly. :-)


Routing from a wired interface to wireless, and flooding the wired
interface with traffic to be routed, say with a traffic generator (for
performance testing) can trigger OOM and cripple the box in seconds,
but I think (but haven't verified) that even just simple non-forwarded
bulk TCP upload should be able to trigger OOM as well on sufficiently
constrained machines.


Something like this makes the OOM and jitter issues go away:

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index eaa4118..f7d9033 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1294,6 +1298,8 @@ static void ieee80211_tx(struct ieee80211_sub_if_data *sda
 			goto drop;
 		/* fall through */
 	case IEEE80211_TX_PENDING:
+		goto drop;
+
 		skb = tx.skb;
 
 		spin_lock_irqsave(&local->queue_stop_reason_lock, flags);


However, TX queue status feedback is still broken with this, which is
problematic as per:

http://marc.info/?l=linux-netdev&m=121994203129939&w=2
http://marc.info/?l=linux-netdev&m=122004613003333&w=2


Propagating the queue stop to the higher-level interface (as per the
somewhat broken patch below) is a step in the right direction, but
Johannes voiced concerns that this is inefficient (which is demonstrated
e.g. by the first email referenced above), but also, it creates a new
problem, which is that of head-of-line blocking -- a low-priority flow
can now cause the wlanX interface's main queue to be stopped, leading
to queueing of high-priority traffic in the stack while the hardware's
high-priority traffic queue sits empty.


The only way I see to solve all of these issues cleanly is to convert
the AP/STA/etc subinterfaces to be multiqueue interfaces, with the same
number of transmit queues as the hardware has, so that there are
independently stoppable/resumable virtual output queues all the way
from userland to the actual hardware, and then to stop/resume those
queues in response to the hardware DMA queues filling up and draining.
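To illustrate the per-queue idea, here is a minimal userspace sketch in C. It is not mac80211 code: `struct mq_dev`, `mq_stop`, `mq_wake`, `mq_can_xmit` and `NUM_QUEUES` are hypothetical names, and the assumption of four queues is only for illustration. Each virtual queue has its own stopped bit, so a full low-priority hardware queue does not block traffic destined for a high-priority one.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: four hardware TX queues. */
#define NUM_QUEUES 4

/* Hypothetical multiqueue device: one stopped bit per virtual queue. */
struct mq_dev {
	unsigned int stopped;	/* bit i set => queue i is stopped */
};

/* Called when hardware queue q fills up. */
static void mq_stop(struct mq_dev *d, int q)
{
	assert(q >= 0 && q < NUM_QUEUES);
	d->stopped |= 1u << q;
}

/* Called when hardware queue q has drained enough to accept frames. */
static void mq_wake(struct mq_dev *d, int q)
{
	assert(q >= 0 && q < NUM_QUEUES);
	d->stopped &= ~(1u << q);
}

/* The stack may hand a frame to queue q only while that queue is awake. */
static bool mq_can_xmit(const struct mq_dev *d, int q)
{
	assert(q >= 0 && q < NUM_QUEUES);
	return !(d->stopped & (1u << q));
}
```

With this layout, stopping queue 3 leaves queue 0 transmittable, which is the property a single shared netdev queue cannot provide.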

Before I go ahead and do this -- thoughts?


thanks,
Lennert





(broken -- doesn't deal properly with stops/wakes on multiple queues)


diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index dc76267..5ac558f 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -296,8 +296,33 @@ void ieee80211_wake_queue_by_reason(struct ieee80211_hw *hw, int queue,

 void ieee80211_wake_queue(struct ieee80211_hw *hw, int queue)
 {
+	struct ieee80211_local *local = hw_to_local(hw);
+	struct ieee80211_sub_if_data *sdata;
+
 	ieee80211_wake_queue_by_reason(hw, queue,
 				       IEEE80211_QUEUE_STOP_REASON_DRIVER);
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		switch (sdata->vif.type) {
+		case __NL80211_IFTYPE_AFTER_LAST:
+		case NL80211_IFTYPE_UNSPECIFIED:
+		case NL80211_IFTYPE_MONITOR:
+		case NL80211_IFTYPE_AP_VLAN:
+			continue;
+		case NL80211_IFTYPE_AP:
+		case NL80211_IFTYPE_STATION:
+		case NL80211_IFTYPE_ADHOC:
+		case NL80211_IFTYPE_WDS:
+		case NL80211_IFTYPE_MESH_POINT:
+			if (netif_running(sdata->dev))
+				netif_wake_queue(sdata->dev);
+			break;
+		}
+	}
+
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(ieee80211_wake_queue);

@@ -325,8 +350,33 @@ void ieee80211_stop_queue_by_reason(struct ieee80211_hw *hw, int queue,

 void ieee80211_stop_queue(struct ieee80211_hw *hw, int queue)
 {
+	struct ieee80211_local *local = hw_to_local(hw);
+	struct ieee80211_sub_if_data *sdata;
+
 	ieee80211_stop_queue_by_reason(hw, queue,
 				       IEEE80211_QUEUE_STOP_REASON_DRIVER);
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		switch (sdata->vif.type) {
+		case __NL80211_IFTYPE_AFTER_LAST:
+		case NL80211_IFTYPE_UNSPECIFIED:
+		case NL80211_IFTYPE_MONITOR:
+		case NL80211_IFTYPE_AP_VLAN:
+			continue;
+		case NL80211_IFTYPE_AP:
+		case NL80211_IFTYPE_STATION:
+		case NL80211_IFTYPE_ADHOC:
+		case NL80211_IFTYPE_WDS:
+		case NL80211_IFTYPE_MESH_POINT:
+			if (netif_running(sdata->dev))
+				netif_stop_queue(sdata->dev);
+			break;
+		}
+	}
+
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(ieee80211_stop_queue);
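
One likely reading of the "broken" note on the patch above: it wakes the netdev queue whenever any hardware queue wakes, even if another hardware queue is still stopped. A minimal userspace sketch of one way to avoid that with a single netdev queue follows. It is only an illustration, not the mac80211 implementation; `struct sq_dev`, `sq_hw_stop` and `sq_hw_wake` are hypothetical names. The netdev queue is woken only once no hardware queue remains stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-queue netdev sitting on several hardware queues. */
struct sq_dev {
	unsigned int hw_stopped;	/* bit i set => hardware queue i is stopped */
	bool netdev_stopped;		/* state of the one netdev-level queue */
};

/* Any stopped hardware queue stops the netdev queue. */
static void sq_hw_stop(struct sq_dev *d, int q)
{
	assert(q >= 0 && q < 32);
	d->hw_stopped |= 1u << q;
	d->netdev_stopped = true;
}

/* Wake the netdev queue only when every hardware queue is awake again. */
static void sq_hw_wake(struct sq_dev *d, int q)
{
	assert(q >= 0 && q < 32);
	d->hw_stopped &= ~(1u << q);
	if (d->hw_stopped == 0)
		d->netdev_stopped = false;
}
```

This keeps stop/wake accounting consistent, but it does nothing about head-of-line blocking: one stopped hardware queue still holds up traffic for all the others.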



2010-01-05 17:01:44

by Luis R. Rodriguez

Subject: Re: infinite transmit buffering issue in 2.6.32 mac80211

On Tue, Jan 5, 2010 at 12:38 AM, Lennert Buytenhek
<[email protected]> wrote:
> Hi!
>
> Since "mac80211: remove master netdev", mac80211 no longer propagates
> TX queue full status (ieee80211_stop_queue et al) up.  While the
> underlying hardware's TX queue is stopped, mac80211 buffers frames
> internally (in ieee80211_tx), but there's no upper limit on the number
> of frames it will buffer, leading to badness when there is heavy TX
> traffic on the wireless interface:
>
> * It breaks TCP's packet drop-induced rate control.  Instead, you'll
>  end up with much of the same effects as tunneling TCP in TCP like
>  some VPN apps do, where individual packets will see wildly varying
>  RTTs and you'll end up adding retransmits to the TX queue while
>  the original packet didn't even go out yet.
>
> * If there is bulk data transfer going on, you end up with unbounded
>  and highly variable RTTs for concurrent traffic (say, pings).
>
> * On the kind of machines I typically work on (embedded access point
>  type devices), more so than on big x86_64 machines, unbounded
>  packet buffering will typically lead to OOM very quickly. :-)

Felix, I'm curious: have you seen quick OOMs in your setups with
embedded devices and current mac80211 drivers?

> Routing from a wired interface to wireless, and flooding the wired
> interface with traffic to be routed, say with a traffic generator (for
> performance testing) can trigger OOM and cripple the box in seconds,
> but I think (but haven't verified) that even just simple non-forwarded
> bulk TCP upload should be able to trigger OOM as well on sufficiently
> constrained machines.

Don't traffic generators typically cripple boxes though? How about
with plain iperf pushing 1 Gbit/s over the ethernet and routing out via
the wireless interface?

> Something like this makes the OOM and jitter issues go away:
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index eaa4118..f7d9033 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1294,6 +1298,8 @@ static void ieee80211_tx(struct ieee80211_sub_if_data *sda
>                        goto drop;
>                /* fall through */
>        case IEEE80211_TX_PENDING:
> +               goto drop;
> +
>                skb = tx.skb;
>
>                spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
>
>
> However, TX queue status feedback is still broken with this, which is
> problematic as per:
>
>        http://marc.info/?l=linux-netdev&m=121994203129939&w=2
>        http://marc.info/?l=linux-netdev&m=122004613003333&w=2
>
>
> Propagating the queue stop to the higher-level interface (as per the
> somewhat broken patch below) is a step in the right direction, but
> Johannes voiced concerns that this is inefficient (which is demonstrated
> e.g. by the first email referenced above), but also, it creates a new
> problem, which is that of head-of-line blocking -- a low-priority flow
> can now cause the wlanX interface's main queue to be stopped, leading
> to queueing of high-priority traffic in the stack while the hardware's
> high-priority traffic queue sits empty.
>
>
> The only way I see to solve all of these issues cleanly is to convert
> the AP/STA/etc subinterfaces to be multiqueue interfaces, with the same
> number of transmit queues as the hardware has, so that there are
> independently stoppable/resumable virtual output queues all the way
> from userland to the actual hardware, and then to stop/resume those
> queues in response to the hardware DMA queues filling up and draining.

How does this resolve the main OOM issues you are seeing though? I
don't see the link yet.

> Before I go ahead and do this -- thoughts?

So the way we had multiqueue support implemented in mac80211 did not
exactly map onto the actual hardware queues the way it does in ethernet
drivers. Each driver still had to queue buffers itself, and we only used
mac80211 to propagate frames to the driver's queue. Adding multiqueue
support back seems fine if it indeed resolves an issue we cannot deal
with right now, but if we do, it would be good to reconsider how we
implement it, so that drivers don't have to redo queuing and each
netdev queue moves buffers directly to hardware.

One downside worth mentioning to re-adding MQ support is that doing so
would mean losing compatibility again with kernels older than 2.6.27.
In no way should kernel compatibility concerns hold back development,
but before we go and re-add MQ support I'd like to ensure it will
really cure an issue we cannot resolve through other means.

Luis

>
> thanks,
> Lennert
>
>
>
>
>
> (broken -- doesn't deal properly with stops/wakes on multiple queues)
>
>
> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
> index dc76267..5ac558f 100644
> --- a/net/mac80211/util.c
> +++ b/net/mac80211/util.c
> @@ -296,8 +296,33 @@ void ieee80211_wake_queue_by_reason(struct ieee80211_hw *hw, int queue,
>
>  void ieee80211_wake_queue(struct ieee80211_hw *hw, int queue)
>  {
> +       struct ieee80211_local *local = hw_to_local(hw);
> +       struct ieee80211_sub_if_data *sdata;
> +
>        ieee80211_wake_queue_by_reason(hw, queue,
>                                       IEEE80211_QUEUE_STOP_REASON_DRIVER);
> +
> +       rcu_read_lock();
> +
> +       list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> +               switch (sdata->vif.type) {
> +               case __NL80211_IFTYPE_AFTER_LAST:
> +               case NL80211_IFTYPE_UNSPECIFIED:
> +               case NL80211_IFTYPE_MONITOR:
> +               case NL80211_IFTYPE_AP_VLAN:
> +                       continue;
> +               case NL80211_IFTYPE_AP:
> +               case NL80211_IFTYPE_STATION:
> +               case NL80211_IFTYPE_ADHOC:
> +               case NL80211_IFTYPE_WDS:
> +               case NL80211_IFTYPE_MESH_POINT:
> +                       if (netif_running(sdata->dev))
> +                               netif_wake_queue(sdata->dev);
> +                       break;
> +               }
> +        }
> +
> +       rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(ieee80211_wake_queue);
>
> @@ -325,8 +350,33 @@ void ieee80211_stop_queue_by_reason(struct ieee80211_hw *hw, int queue,
>
>  void ieee80211_stop_queue(struct ieee80211_hw *hw, int queue)
>  {
> +       struct ieee80211_local *local = hw_to_local(hw);
> +       struct ieee80211_sub_if_data *sdata;
> +
>        ieee80211_stop_queue_by_reason(hw, queue,
>                                       IEEE80211_QUEUE_STOP_REASON_DRIVER);
> +
> +       rcu_read_lock();
> +
> +       list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> +               switch (sdata->vif.type) {
> +               case __NL80211_IFTYPE_AFTER_LAST:
> +               case NL80211_IFTYPE_UNSPECIFIED:
> +               case NL80211_IFTYPE_MONITOR:
> +               case NL80211_IFTYPE_AP_VLAN:
> +                       continue;
> +               case NL80211_IFTYPE_AP:
> +               case NL80211_IFTYPE_STATION:
> +               case NL80211_IFTYPE_ADHOC:
> +               case NL80211_IFTYPE_WDS:
> +               case NL80211_IFTYPE_MESH_POINT:
> +                       if (netif_running(sdata->dev))
> +                               netif_stop_queue(sdata->dev);
> +                       break;
> +               }
> +        }
> +
> +       rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(ieee80211_stop_queue);
>

2010-01-05 17:20:29

by Lennert Buytenhek

Subject: Re: infinite transmit buffering issue in 2.6.32 mac80211

Hi Luis,


On Tue, Jan 05, 2010 at 09:01:23AM -0800, Luis R. Rodriguez wrote:

> > Routing from a wired interface to wireless, and flooding the wired
> > interface with traffic to be routed, say with a traffic generator (for
> > performance testing) can trigger OOM and cripple the box in seconds,
> > but I think (but haven't verified) that even just simple non-forwarded
> > bulk TCP upload should be able to trigger OOM as well on sufficiently
> > constrained machines.
>
> Don't traffic generators typically cripple boxes though?

The way that the traffic generators I have access to will try to
establish wired routing performance is by determining the maximum
loss-free forwarding rate that a certain setup can handle, i.e. the
maximum data rate at which there is 0% packet loss. This tends to
be done by binary search -- transmitting data at 1000 Mb/s, 500 Mb/s,
250 Mb/s, 125 Mb/s, etc until there is no packet loss anymore, and
then increasing the transmit rate again, etc.
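
The search described above can be sketched roughly as follows. This is an illustrative userspace sketch in C, not the algorithm of any particular traffic generator: `find_lossfree_rate`, `trial_fn` and `fake_trial` are hypothetical names, and `fake_trial` models a device that forwards without loss up to a fixed capacity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: run one trial at rate_mbps, return true if 0% loss. */
typedef bool (*trial_fn)(double rate_mbps, void *ctx);

/*
 * Bisect between lo (assumed loss-free) and hi until the interval is
 * narrower than tol. Returns the highest rate seen to be loss-free.
 */
static double find_lossfree_rate(double lo, double hi, double tol,
				 trial_fn trial, void *ctx)
{
	assert(lo >= 0 && hi > lo && tol > 0);

	if (trial(hi, ctx))
		return hi;		/* line rate already loss-free */

	while (hi - lo > tol) {
		double mid = lo + (hi - lo) / 2;

		if (trial(mid, ctx))
			lo = mid;	/* no loss: try faster */
		else
			hi = mid;	/* loss: back off */
	}
	return lo;
}

/* Toy device model (assumption): loss-free up to *ctx Mb/s, lossy above. */
static bool fake_trial(double rate_mbps, void *ctx)
{
	return rate_mbps <= *(double *)ctx;
}
```

For a toy device with a 300 Mb/s capacity, searching between 0 and 1000 Mb/s with a 1 Mb/s tolerance converges to a rate just under 300 Mb/s. During the early steps the device is offered far more than it can forward, which is exactly the overload case discussed here.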

All hardware of course has its performance limits. If you stress it
beyond those limits, it should simply drop packets, and while it is
probably acceptable that it will become temporarily unresponsive during
the test, it should not crash or go OOM like 2.6.32 will happily do.


> How about with plain iperf pushing 1 Gbit/s over the ethernet and
> routing out via the wireless interface?

It won't manage to keep up. But before 2.6.32, at most 1000 packets
would accumulate in the wmaster0 qdisc, and any packets after that
would be dropped in net/sched/sch_generic.c:pfifo_fast_enqueue(). As
of 2.6.32, it will keep queuing packets in mac80211 ad infinitum.


> > The only way I see to solve all of these issues cleanly is to convert
> > the AP/STA/etc subinterfaces to be multiqueue interfaces, with the same
> > number of transmit queues as the hardware has, so that there are
> > independently stoppable/resumable virtual output queues all the way
> > from userland to the actual hardware, and then to stop/resume those
> > queues in response to the hardware DMA queues filling up and draining.
>
> How does this resolve the main OOM issues you are seeing though? I
> don't see the link yet.

Once you stop the queues on the devices via netif_stop_queue (or one
of the subqueue variants), the qdisc attached to the netdev will take
over and start queueing packets that are handed to dev_queue_xmit().

The default qdisc is pfifo_fast, and its default limit is 1000, so any
packets that we're trying to queue beyond the 1000th will be dropped
by pfifo_fast -- i.e. the in-stack queueing that will kick in once you
stop the netdev queue is limited to 1000 packets. (Which is probably
OK for gigabit but _way_ too many for typical wireless data rates --
but that's an issue for another day.)
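
The bounding effect of a fixed-limit queue can be shown with a small userspace sketch in C. This is not the pfifo_fast code; `struct bounded_fifo`, `fifo_init`, `fifo_enqueue` and `fifo_dequeue` are hypothetical names, and only packet counts are modelled. Once the limit is reached, further packets are dropped at the tail instead of consuming more memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a qdisc with a fixed packet limit. */
struct bounded_fifo {
	size_t limit;	/* maximum packets held, e.g. 1000 */
	size_t qlen;	/* packets currently queued */
	size_t dropped;	/* packets rejected because the queue was full */
};

static void fifo_init(struct bounded_fifo *q, size_t limit)
{
	assert(limit > 0);
	q->limit = limit;
	q->qlen = 0;
	q->dropped = 0;
}

/* Tail drop: returns 1 if the packet was queued, 0 if it was dropped. */
static int fifo_enqueue(struct bounded_fifo *q)
{
	if (q->qlen >= q->limit) {
		q->dropped++;
		return 0;
	}
	q->qlen++;
	return 1;
}

/* Returns 1 if a packet was removed, 0 if the queue was empty. */
static int fifo_dequeue(struct bounded_fifo *q)
{
	if (q->qlen == 0)
		return 0;
	q->qlen--;
	return 1;
}
```

Offering 5000 packets to a queue with a limit of 1000 while nothing drains leaves 1000 queued and 4000 dropped, so memory use stays bounded no matter how long the hardware queue remains stopped.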


cheers,
Lennert

2010-01-05 18:29:05

by Kalle Valo

Subject: Re: infinite transmit buffering issue in 2.6.32 mac80211

Lennert Buytenhek <[email protected]> writes:

> Hi!

Hello,

> Since "mac80211: remove master netdev", mac80211 no longer propagates
> TX queue full status (ieee80211_stop_queue et al) up. While the
> underlying hardware's TX queue is stopped, mac80211 buffers frames
> internally (in ieee80211_tx), but there's no upper limit on the number
> of frames it will buffer, leading to badness when there is heavy TX
> traffic on the wireless interface:

We have noticed a very strange throughput degradation with both wl1251
and wl1271. With 2.6.28 both drivers were able to achieve 10 Mbit/s
throughput over TCP, but with 2.6.32 (and almost the same drivers) we
get less than 4 Mbit/s. Because we see it with both wl1251 and wl1271,
we are starting to suspect that it's a problem in mac80211.

We haven't started analysing this yet, but I'm hoping someone from our
team will start on it soon. I have no idea whether our problem is
related to the problem Lennert reports, but we will definitely try
Johannes' patch and see if it helps.

--
Kalle Valo