I'm using the latest iwlwifi-2.6.git from http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git
When using 802.11n aggregation and the other endpoint sends a BlockAck Request, many times the transfer will completely stall.
It looks like the relevant code is in net/mac80211/rx.c:ieee80211_rx_h_ctrl . I found that the sequence numbers used are invalid. If instead I linearize the SKB, then the sequence numbers become valid, so I believe it's a problem with the use of paged RX skbs. Mailing both lists since I'm not sure where the fix should go.
The paged RX SKBs are set up in drivers/net/wireless/iwlwifi:iwl-agn-lib.c:iwl_pass_packet_to_mac80211. I made a temporary fix at net/mac80211/rx.c:__ieee80211_rx_handle_packet
by changing
if (ieee80211_is_mgmt(fc))
err = skb_linearize(skb);
to
if (ieee80211_is_mgmt(fc) || ieee80211_is_ctl(fc))
err = skb_linearize(skb);
Can anyone more knowledgeable than I please tell me the right fix?
Thanks!
-Dan
On Fri, 2010-05-28 at 14:38 -0700, Daniel Halperin wrote:
> On May 28, 2010, at 1:34 PM, Johannes Berg wrote:
> > One thing I ask myself though is do we ever check that the frame is long
> > enough? In the patch below I will by checking the skb_copy_bits() return
> > value, but without that we don't, as far as I can tell?
>
> Good point.
> > + if (skb_copy_bits(skb, offsetof(struct ieee80211_bar, control),
> > + &bar_data, sizeof(bar_data)))
> > + return RX_DROP_MONITOR;
> > +
> > if (!rx->sta)
> > return RX_DROP_MONITOR;
>
> Maybe invert the order of these two exit conditions? Figure most CPUs
> will speculate anyway, but the second check seems a more efficient
> short-circuit.
Yeah, true. I think it probably makes more sense to just linearize
control frames like you did, and separately add a length check.
johannes
On Fri, 2010-05-28 at 12:46 -0700, Daniel Halperin wrote:
> I'm using the latest iwlwifi-2.6.git from
> http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git
>
> When using 802.11n aggregation and the other endpoint sends a BlockAck
> Request, many times the transfer will completely stall.
>
> It looks like the relevant code is in
> net/mac80211/rx.c:ieee80211_rx_h_ctrl . I found that the sequence
> numbers used are invalid. If instead I linearize the SKB, then the
> sequence numbers become valid, so I believe it's a problem with the
> use of paged RX skbs. Mailing both lists since I'm not sure where the
> fix should go.
>
> The paged RX SKBs are set up in
> drivers/net/wireless/iwlwifi:iwl-agn-lib.c:iwl_pass_packet_to_mac80211. I made a temporary fix at net/mac80211/rx.c:__ieee80211_rx_handle_packet
>
> by changing
>
> if (ieee80211_is_mgmt(fc))
> err = skb_linearize(skb);
>
> to
>
> if (ieee80211_is_mgmt(fc) || ieee80211_is_ctl(fc))
> err = skb_linearize(skb);
>
> Can anyone more knowledgeable than I please tell me the right fix?
Wow, good catch. FWIW, that seems like a perfectly appropriate fix,
since control frames are typically very short I don't see any problem in
linearizing them, in fact I'd think the skb should already have enough
space at this point. If you wanted to avoid that, you need a patch like
the one below.
One thing I ask myself though is do we ever check that the frame is long
enough? In the patch below I will by checking the skb_copy_bits() return
value, but without that we don't, as far as I can tell?
johannes
--- wireless-testing.orig/net/mac80211/rx.c 2010-05-28 22:25:07.000000000 +0200
+++ wireless-testing/net/mac80211/rx.c 2010-05-28 22:33:30.000000000 +0200
@@ -1819,17 +1819,26 @@ ieee80211_rx_h_ctrl(struct ieee80211_rx_
return RX_CONTINUE;
if (ieee80211_is_back_req(bar->frame_control)) {
+ struct {
+ __le16 control, start_seq_num;
+ } __packed bar_data;
+
+ if (skb_copy_bits(skb, offsetof(struct ieee80211_bar, control),
+ &bar_data, sizeof(bar_data)))
+ return RX_DROP_MONITOR;
+
if (!rx->sta)
return RX_DROP_MONITOR;
+
spin_lock(&rx->sta->lock);
- tid = le16_to_cpu(bar->control) >> 12;
+ tid = le16_to_cpu(bar_data.control) >> 12;
if (!rx->sta->ampdu_mlme.tid_active_rx[tid]) {
spin_unlock(&rx->sta->lock);
return RX_DROP_MONITOR;
}
tid_agg_rx = rx->sta->ampdu_mlme.tid_rx[tid];
- start_seq_num = le16_to_cpu(bar->start_seq_num) >> 4;
+ start_seq_num = le16_to_cpu(bar_data.start_seq_num) >> 4;
/* reset session timer */
if (tid_agg_rx->timeout)
On May 28, 2010, at 1:34 PM, Johannes Berg wrote:
> One thing I ask myself though is do we ever check that the frame is long
> enough? In the patch below I will by checking the skb_copy_bits() return
> value, but without that we don't, as far as I can tell?
Good point.
> --- wireless-testing.orig/net/mac80211/rx.c 2010-05-28 22:25:07.000000000 +0200
> +++ wireless-testing/net/mac80211/rx.c 2010-05-28 22:33:30.000000000 +0200
> @@ -1819,17 +1819,26 @@ ieee80211_rx_h_ctrl(struct ieee80211_rx_
> return RX_CONTINUE;
>
> if (ieee80211_is_back_req(bar->frame_control)) {
> + struct {
> + __le16 control, start_seq_num;
> + } __packed bar_data;
> +
> + if (skb_copy_bits(skb, offsetof(struct ieee80211_bar, control),
> + &bar_data, sizeof(bar_data)))
> + return RX_DROP_MONITOR;
> +
> if (!rx->sta)
> return RX_DROP_MONITOR;
Maybe invert the order of these two exit conditions? Figure most CPUs will speculate anyway, but the second check seems a more efficient short-circuit.
Dan