Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending
failed BAR frames later instead of tearing down aggr" caused a regression
on rt2x00 hardware (connection hangs). This regression was fixed by
commit be03d4a45c09ee5100d3aaaedd087f19bc20d01f "rt2x00: Don't let
mac80211 send a BAR when an AMPDU subframe fails". But the latter
commit caused yet another problem, reported in
https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22
After a long discussion in this thread:
http://rt2x00.serialmonkey.com/pipermail/users_rt2x00.serialmonkey.com/2012-October/005349.html
and testing various alternative solutions, which failed on one setup or
another, we have no good fix for these issues other than reverting both
of the commits mentioned above.
To avoid affecting other hardware that benefits from commit
f0425beda4d404a6e751439b562100b902ba9c98, instead of reverting it,
introduce a flag that, when set, restores the mac80211 behaviour from
before that commit.
Cc: [email protected]
Signed-off-by: Stanislaw Gruszka <[email protected]>
---
It's fine to queue this for 3.8; it will then reach 3.7 and older
releases through -stable.
include/net/mac80211.h | 5 +++++
net/mac80211/status.c | 6 +++++-
2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 82558c8..d481cc6 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1253,6 +1253,10 @@ struct ieee80211_tx_control {
* @IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF: Use the P2P Device address for any
* P2P Interface. This will be honoured even if more than one interface
* is supported.
+ *
+ * @IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL: On this hardware the TX BA session
+ * should be torn down when a BAR frame is not acked.
+ *
*/
enum ieee80211_hw_flags {
IEEE80211_HW_HAS_RATE_CONTROL = 1<<0,
@@ -1281,6 +1285,7 @@ enum ieee80211_hw_flags {
IEEE80211_HW_TX_AMPDU_SETUP_IN_HW = 1<<23,
IEEE80211_HW_SCAN_WHILE_IDLE = 1<<24,
IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF = 1<<25,
+ IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL = 1<<26,
};
/**
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 101eb88..c511e9c 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -432,7 +432,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
IEEE80211_BAR_CTRL_TID_INFO_MASK) >>
IEEE80211_BAR_CTRL_TID_INFO_SHIFT;
- ieee80211_set_bar_pending(sta, tid, ssn);
+ if (local->hw.flags &
+ IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL)
+ ieee80211_stop_tx_ba_session(&sta->sta, tid);
+ else
+ ieee80211_set_bar_pending(sta, tid, ssn);
}
}
--
1.7.4.4
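The decision this patch adds to ieee80211_tx_status() can be modelled in user space roughly as follows. This is only a sketch: struct model_hw, enum bar_fail_action and bar_fail_action() are hypothetical stand-ins invented for illustration, not mac80211 types or functions. Only the flag name and its bit position are taken from the patch.

```c
#include <stdint.h>

/* Same bit position as in the patch; everything else here is a model. */
#define HW_TEARDOWN_AGGR_ON_BAR_FAIL (1u << 26)

/* Hypothetical: the two ways a failed (unacked) BAR can be handled. */
enum bar_fail_action {
	BAR_FAIL_RETRY_LATER,	/* behaviour since f0425beda: mark the BAR pending */
	BAR_FAIL_STOP_SESSION,	/* behaviour before f0425beda: stop the TX BA session */
};

/* Hypothetical stand-in for the flags field of struct ieee80211_hw. */
struct model_hw {
	uint32_t flags;
};

/* Pick the action for a BAR frame that was sent but not acked. */
static enum bar_fail_action bar_fail_action(const struct model_hw *hw)
{
	if (hw->flags & HW_TEARDOWN_AGGR_ON_BAR_FAIL)
		return BAR_FAIL_STOP_SESSION;
	return BAR_FAIL_RETRY_LATER;
}
```

A driver that does not set the flag keeps the retry-later behaviour, so hardware that benefits from f0425beda is unaffected.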
Johannes, maybe you should take this through the mac80211 tree,
since it depends on the other patch?
John
On Mon, Dec 03, 2012 at 12:59:04PM +0100, Stanislaw Gruszka wrote:
> This reverts:
>
> commit be03d4a45c09ee5100d3aaaedd087f19bc20d01f
> Author: Andreas Hartmann <[email protected]>
> Date: Tue Apr 17 00:25:28 2012 +0200
>
> rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails
>
> To fix the problem worked around by the above commit, use the
> IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL flag (see the changelog of the
> "mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL" patch).
>
> Resolve: https://bugzilla.kernel.org/show_bug.cgi?id=42828
> Bisected-by: Francisco Pina Martins <[email protected]>
> Cc: [email protected]
> Signed-off-by: Stanislaw Gruszka <[email protected]>
> ---
> drivers/net/wireless/rt2x00/rt2800lib.c | 3 ++-
> drivers/net/wireless/rt2x00/rt2x00dev.c | 7 +++----
> 2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
> index 59474ae..175a9b9 100644
> --- a/drivers/net/wireless/rt2x00/rt2800lib.c
> +++ b/drivers/net/wireless/rt2x00/rt2800lib.c
> @@ -5036,7 +5036,8 @@ static int rt2800_probe_hw_mode(struct rt2x00_dev *rt2x00dev)
> IEEE80211_HW_SUPPORTS_PS |
> IEEE80211_HW_PS_NULLFUNC_STACK |
> IEEE80211_HW_AMPDU_AGGREGATION |
> - IEEE80211_HW_REPORTS_TX_ACK_STATUS;
> + IEEE80211_HW_REPORTS_TX_ACK_STATUS |
> + IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL;
>
> /*
> * Don't set IEEE80211_HW_HOST_BROADCAST_PS_BUFFERING for USB devices
> diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
> index 69097d1..b0183d1 100644
> --- a/drivers/net/wireless/rt2x00/rt2x00dev.c
> +++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
> @@ -391,10 +391,9 @@ void rt2x00lib_txdone(struct queue_entry *entry,
> tx_info->flags |= IEEE80211_TX_STAT_AMPDU;
> tx_info->status.ampdu_len = 1;
> tx_info->status.ampdu_ack_len = success ? 1 : 0;
> - /*
> - * TODO: Need to tear down BA session here
> - * if not successful.
> - */
> +
> + if (!success)
> + tx_info->flags |= IEEE80211_TX_STAT_AMPDU_NO_BACK;
> }
>
> if (rate_flags & IEEE80211_TX_RC_USE_RTS_CTS) {
> --
> 1.7.4.4
>
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.
On Monday, December 03, 2012 03:13:55 PM Andreas Hartmann wrote:
> Stanislaw Gruszka wrote:
> > Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending
> > failed BAR frames later instead of tearing down aggr" caused regression
> > on rt2x00 hardware (connection hangs).
>
> This patch caused a problem, too, with carl9170
> (http://thread.gmane.org/gmane.linux.kernel.wireless.general/92203/focus=92376).
> How did they fix it (the thread unfortunately ends without any solution
> / patch).
This was fixed by: carl9170: fix HT peer BA session corruption
(c9122c0d63a50 in wireless-testing). The issue here is that the
hardware does not set the tx success bit when it receives a
BA for a sent BAR [looks like it is expecting a legacy ACK?!
but who knows - the original vendor driver [otus] didn't really
deal with BARs anyway].
So the driver has to do that job and currently it's done in
the following way:
When mac80211 sends a BAR, the driver will keep a reference of
the frame around in ar->bar_list, before it is sent on its way.
If the device receives a BA within the retry window then the
driver's rx-path will enter carl9170_ba_check. This function
sets the TX_STAT_ACK flag [for the BAR] if the incoming BA
matches.
While there are a few problems with this approach [added tx and rx
overhead due to flexible BA filtering and BAR queuing, unwanted BAR
retries, ...], it was one of the better solutions within
the capabilities of the hardware, firmware and driver.
Obviously, if the ar9170 hardware just set that tx success
bit [the ath9k MAC does this properly], this wouldn't be necessary.
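The BAR tracking described above can be sketched in plain C roughly like this. It is a simplified user-space model, not the carl9170 code: struct bar_tracker, bar_track(), ba_check() and bar_complete() are hypothetical names, the fixed-size array stands in for ar->bar_list, and matching only on peer address and TID is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING_BARS 8	/* arbitrary size for this model */
#define ADDR_LEN 6

/* One BAR that was handed to the hardware and awaits a matching BA. */
struct pending_bar {
	bool in_use;
	bool acked;		/* models setting the TX_STAT_ACK flag */
	uint8_t ra[ADDR_LEN];	/* receiver address of the BAR */
	uint8_t tid;
};

/* Hypothetical stand-in for the driver's list of outstanding BARs. */
struct bar_tracker {
	struct pending_bar slots[MAX_PENDING_BARS];
};

/* TX path: remember the BAR before it is sent. Returns NULL if full. */
static struct pending_bar *bar_track(struct bar_tracker *t,
				     const uint8_t *ra, uint8_t tid)
{
	for (size_t i = 0; i < MAX_PENDING_BARS; i++) {
		struct pending_bar *bar = &t->slots[i];

		if (bar->in_use)
			continue;
		bar->in_use = true;
		bar->acked = false;
		memcpy(bar->ra, ra, ADDR_LEN);
		bar->tid = tid;
		return bar;
	}
	return NULL;
}

/* RX path: a BA from `ta` for `tid` arrived; mark the matching BAR acked. */
static bool ba_check(struct bar_tracker *t, const uint8_t *ta, uint8_t tid)
{
	for (size_t i = 0; i < MAX_PENDING_BARS; i++) {
		struct pending_bar *bar = &t->slots[i];

		if (!bar->in_use || bar->acked || bar->tid != tid)
			continue;
		if (memcmp(bar->ra, ta, ADDR_LEN) != 0)
			continue;
		bar->acked = true;
		return true;
	}
	return false;
}

/* TX-status path: retry window is over; report the result, free the slot. */
static bool bar_complete(struct pending_bar *bar)
{
	bool acked = bar->acked;

	bar->in_use = false;
	return acked;
}
```

The point of the model is that the ack status reported for the BAR comes from the RX path rather than from a hardware tx success bit.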
Regards,
Chr
On 2012-12-03 3:13 PM, Andreas Hartmann wrote:
> Hi Stanislaw!
>
> Stanislaw Gruszka wrote:
>> Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending
>> failed BAR frames later instead of tearing down aggr" caused regression
>> on rt2x00 hardware (connection hangs).
>
> This patch caused a problem, too, with carl9170
> (http://thread.gmane.org/gmane.linux.kernel.wireless.general/92203/focus=92376).
> How did they fix it (the thread unfortunately ends without any solution
> / patch).
>
>> This regression was fixed by
>> commit be03d4a45c09ee5100d3aaaedd087f19bc20d01f "rt2x00: Don't let
>> mac80211 send a BAR when an AMPDU subframe fails". But the latter
>> commit caused yet another problem, reported in
>> https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22
>
> This already was a workaround as stated in the removed comment: "TODO:
> Need to tear down BA session here if not successful."
>
> My general question is:
> Is the behaviour of f0425beda spec conform? Is it implemented correctly
> and w/o demanding any special hardware feature? If both questions can be
> answered with yes, rt2x00 should be fixed to get the same behaviour working.
>
> If f0425beda isn't spec conform or if it expects special hardware
> features, it would be a more or less a ath9k specific "solution", which
> should be removed from mac80211 and should be moved to the driver. I'm
> thinking of this, because rt2x00 is not the only one having problems and
> Felix comment in
> http://news.gmane.org/find-root.php?group=gmane.linux.drivers.rt2x00.user&article=1383
>
> "If you want to tear down the BA session in rt2x00, either do it in the
> driver or add a proper flag to ensure that ath9k remains unaffected by
> the change."
>
> sounds to me really ath9k specific (what about other hardware)?
Commit f0425beda is not aimed specifically at Atheros hardware. It
is meant for any hardware that lets the driver track the Tx BA window
and has a reasonable MAC design.
The 'reasonable MAC design' part is where rt2x00 falls short. BAR Tx
status processing is just one of several places where the hardware
interface design is stupid and limiting, meaning the driver has to make
an extra effort to work around that.
- Felix
On Mon, Dec 03, 2012 at 09:21:38PM +0100, Helmut Schaa wrote:
> Yep, this makes definitely sense! And since we can copy big parts of
> the carl9170 patch :D this shouldn't be too much work hopefully ...
But that is still something that will need to go to -next rather than
-stable, so let's apply these patches and remove them in -next when a
better patch comes in.
Stanislaw
Hi,
On Mon, Dec 3, 2012 at 8:56 PM, Andreas Hartmann
<[email protected]> wrote:
> Christian Lamparter wrote:
>> On Monday, December 03, 2012 03:13:55 PM Andreas Hartmann wrote:
>>> Stanislaw Gruszka wrote:
>>>> Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending
>>>> failed BAR frames later instead of tearing down aggr" caused regression
>>>> on rt2x00 hardware (connection hangs).
>>>
>>> This patch caused a problem, too, with carl9170
>>> (http://thread.gmane.org/gmane.linux.kernel.wireless.general/92203/focus=92376).
>>> How did they fix it (the thread unfortunately ends without any solution
>>> / patch).
>>
>> This was fixed by: carl9170: fix HT peer BA session corruption
>> (c9122c0d63a50 in wireless-testing). The issue here is that the
>> hardware does not set the tx success bit when it receives a
>> BA for a sent BAR [looks like it is expecting a legacy ACK?!
>> but who knows - the original vendor driver [otus] didn't really
>> deal with BARs anyway].
>
> If I got Helmut correctly here
> (http://news.gmane.org/find-root.php?group=gmane.linux.kernel.wireless.general&article=83762),
> rt2x00pci could have a related problem (probably missing tx status).
>
> Wouldn't it be an idea to try a similar approach?
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/linville/wireless-testing/+/c9122c0d63a50bab0a97dc936a38c0f921b6930e^!/
Yep, this definitely makes sense! And since we can copy big parts of
the carl9170 patch :D this hopefully shouldn't be too much work ...
Helmut
Hi Stanislaw!
Stanislaw Gruszka wrote:
> Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending
> failed BAR frames later instead of tearing down aggr" caused regression
> on rt2x00 hardware (connection hangs).
This patch caused a problem, too, with carl9170
(http://thread.gmane.org/gmane.linux.kernel.wireless.general/92203/focus=92376).
How did they fix it (the thread unfortunately ends without any solution
/ patch)?
> This regression was fixed by
> commit be03d4a45c09ee5100d3aaaedd087f19bc20d01f "rt2x00: Don't let
> mac80211 send a BAR when an AMPDU subframe fails". But the latter
> commit caused yet another problem, reported in
> https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22
This already was a workaround as stated in the removed comment: "TODO:
Need to tear down BA session here if not successful."
My general question is:
Does the behaviour of f0425beda conform to the spec? Is it implemented
correctly and without demanding any special hardware feature? If both
questions can be answered with yes, rt2x00 should be fixed to get the
same behaviour working.
If f0425beda doesn't conform to the spec, or if it expects special
hardware features, it would be more or less an ath9k-specific
"solution", which should be removed from mac80211 and moved to the
driver. I'm thinking of this because rt2x00 is not the only one having
problems, and Felix's comment in
http://news.gmane.org/find-root.php?group=gmane.linux.drivers.rt2x00.user&article=1383
"If you want to tear down the BA session in rt2x00, either do it in the
driver or add a proper flag to ensure that ath9k remains unaffected by
the change."
sounds really ath9k-specific to me (what about other hardware?).
> After a long discussion in this thread:
> http://rt2x00.serialmonkey.com/pipermail/users_rt2x00.serialmonkey.com/2012-October/005349.html
> and testing various alternative solutions, which failed on one setup or
> another, we have no good fix for these issues other than reverting both
> of the commits mentioned above.
I'm scared by all the proposed solutions that don't work (although
they should have worked); some of them even crash the machines of some
users (but not of all, e.g. see
http://news.gmane.org/find-root.php?group=gmane.linux.drivers.rt2x00.user&article=703).
My question is: why don't they work, or better: why don't they work for
all users? Obviously the driver does not always behave as expected; in
other words, it's doing things that are neither known nor expected nor
planned. This really scares me, especially because I couldn't
see any answer explaining the unexpected behaviour.
That's why I think it is really necessary to fix the real cause
instead of implementing another workaround (given f0425beda is correct).
I know that a more or less fast fix is needed, but I'm also sure
that most probably nobody will care about this problem any more (the
usual "out of sight, out of mind" effect) after this fire has been
put out (again, given f0425beda isn't wrong).
> To avoid affecting other hardware that benefits from commit
> f0425beda4d404a6e751439b562100b902ba9c98, instead of reverting it,
> introduce a flag that, when set, restores the mac80211 behaviour from
> before that commit.
>
> Cc: [email protected]
> Signed-off-by: Stanislaw Gruszka <[email protected]>
> ---
> It's fine to queue this for 3.8; it will then reach 3.7 and older
> releases through -stable.
>
> include/net/mac80211.h | 5 +++++
> net/mac80211/status.c | 6 +++++-
> 2 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index 82558c8..d481cc6 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -1253,6 +1253,10 @@ struct ieee80211_tx_control {
> * @IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF: Use the P2P Device address for any
> * P2P Interface. This will be honoured even if more than one interface
> * is supported.
> + *
> + * @IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL: On this hardware the TX BA session
> + * should be torn down when a BAR frame is not acked.
> + *
> */
> enum ieee80211_hw_flags {
> IEEE80211_HW_HAS_RATE_CONTROL = 1<<0,
> @@ -1281,6 +1285,7 @@ enum ieee80211_hw_flags {
> IEEE80211_HW_TX_AMPDU_SETUP_IN_HW = 1<<23,
> IEEE80211_HW_SCAN_WHILE_IDLE = 1<<24,
> IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF = 1<<25,
> + IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL = 1<<26,
> };
>
> /**
> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
> index 101eb88..c511e9c 100644
> --- a/net/mac80211/status.c
> +++ b/net/mac80211/status.c
> @@ -432,7 +432,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
> IEEE80211_BAR_CTRL_TID_INFO_MASK) >>
> IEEE80211_BAR_CTRL_TID_INFO_SHIFT;
>
> - ieee80211_set_bar_pending(sta, tid, ssn);
> + if (local->hw.flags &
> + IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL)
> + ieee80211_stop_tx_ba_session(&sta->sta, tid);
> + else
> + ieee80211_set_bar_pending(sta, tid, ssn);
> }
> }
>
Besides the fact that I'm not (yet) convinced about the way of fixing
the problem, both patches work for me (tested with
compat-wireless-3.5rc5 and Linksys WMP600N as AP using 802.11n on 2.4
GHz band / 40MHz / WPA2 / EAPTLS / AES with rt3572sta (Linksys WUSB600Nv2)).
Thanks,
kind regards,
Andreas
Hello Christian,
thanks for your explanation!
Christian Lamparter wrote:
> On Monday, December 03, 2012 03:13:55 PM Andreas Hartmann wrote:
>> Stanislaw Gruszka wrote:
>>> Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending
>>> failed BAR frames later instead of tearing down aggr" caused regression
>>> on rt2x00 hardware (connection hangs).
>>
>> This patch caused a problem, too, with carl9170
>> (http://thread.gmane.org/gmane.linux.kernel.wireless.general/92203/focus=92376).
>> How did they fix it (the thread unfortunately ends without any solution
>> / patch).
>
> This was fixed by: carl9170: fix HT peer BA session corruption
> (c9122c0d63a50 in wireless-testing). The issue here is that the
> hardware does not set the tx success bit when it receives a
> BA for a sent BAR [looks like it is expecting a legacy ACK?!
> but who knows - the original vendor driver [otus] didn't really
> deal with BARs anyway].
If I understood Helmut correctly here
(http://news.gmane.org/find-root.php?group=gmane.linux.kernel.wireless.general&article=83762),
rt2x00pci could have a related problem (probably missing tx status).
Wouldn't it be an idea to try a similar approach?
https://kernel.googlesource.com/pub/scm/linux/kernel/git/linville/wireless-testing/+/c9122c0d63a50bab0a97dc936a38c0f921b6930e^!/
> So the driver has to do that job and currently it's done in
> the following way:
> When mac80211 sends a BAR, the driver will keep a reference of
> the frame around in ar->bar_list, before it is sent on its way.
> If the device receives a BA within the retry window then the
> driver's rx-path will enter carl9170_ba_check. This function
> sets the TX_STAT_ACK flag [for the BAR] if the incoming BA
> matches.
>
> While there are a few problems with this approach [added tx and rx
> overhead due to flexible ba filtering and bar queuing, unwanted BAR
> retries, ...], it was one of the better solution which was within
> the capabilities of the hardware, firmware and driver.
> Obviously, if the ar9170 hardware would just set that tx success
> bit [ath9k mac does this properly], this wouldn't be necessary.
>
> Regards,
> Chr
Thanks,
kind regards,
Andreas
This reverts:
commit be03d4a45c09ee5100d3aaaedd087f19bc20d01f
Author: Andreas Hartmann <[email protected]>
Date: Tue Apr 17 00:25:28 2012 +0200
rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails
To fix the problem worked around by the above commit, use the
IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL flag (see the changelog of the
"mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL" patch).
Resolve: https://bugzilla.kernel.org/show_bug.cgi?id=42828
Bisected-by: Francisco Pina Martins <[email protected]>
Cc: [email protected]
Signed-off-by: Stanislaw Gruszka <[email protected]>
---
drivers/net/wireless/rt2x00/rt2800lib.c | 3 ++-
drivers/net/wireless/rt2x00/rt2x00dev.c | 7 +++----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
index 59474ae..175a9b9 100644
--- a/drivers/net/wireless/rt2x00/rt2800lib.c
+++ b/drivers/net/wireless/rt2x00/rt2800lib.c
@@ -5036,7 +5036,8 @@ static int rt2800_probe_hw_mode(struct rt2x00_dev *rt2x00dev)
IEEE80211_HW_SUPPORTS_PS |
IEEE80211_HW_PS_NULLFUNC_STACK |
IEEE80211_HW_AMPDU_AGGREGATION |
- IEEE80211_HW_REPORTS_TX_ACK_STATUS;
+ IEEE80211_HW_REPORTS_TX_ACK_STATUS |
+ IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL;
/*
* Don't set IEEE80211_HW_HOST_BROADCAST_PS_BUFFERING for USB devices
diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
index 69097d1..b0183d1 100644
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -391,10 +391,9 @@ void rt2x00lib_txdone(struct queue_entry *entry,
tx_info->flags |= IEEE80211_TX_STAT_AMPDU;
tx_info->status.ampdu_len = 1;
tx_info->status.ampdu_ack_len = success ? 1 : 0;
- /*
- * TODO: Need to tear down BA session here
- * if not successful.
- */
+
+ if (!success)
+ tx_info->flags |= IEEE80211_TX_STAT_AMPDU_NO_BACK;
}
if (rate_flags & IEEE80211_TX_RC_USE_RTS_CTS) {
--
1.7.4.4
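The status reporting restored by the rt2x00dev.c hunk can be modelled in user space roughly as below. This is a sketch under assumptions: struct model_tx_status and report_ampdu_subframe() are hypothetical names, and the bit values are illustrative only (the real IEEE80211_TX_STAT_* values are defined in include/net/mac80211.h). As far as I understand it, mac80211 reacts to the NO_BACK flag by sending a BAR for that TID, and the new hardware flag then decides what happens if that BAR is not acked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only, not the kernel's definitions. */
#define TX_STAT_AMPDU		(1u << 0)
#define TX_STAT_AMPDU_NO_BACK	(1u << 1)

/* Hypothetical stand-in for the status part of struct ieee80211_tx_info. */
struct model_tx_status {
	uint32_t flags;
	uint8_t ampdu_len;
	uint8_t ampdu_ack_len;
};

/* Report one subframe as a one-frame A-MPDU, as rt2x00lib_txdone() does. */
static void report_ampdu_subframe(struct model_tx_status *st, bool success)
{
	st->flags |= TX_STAT_AMPDU;
	st->ampdu_len = 1;
	st->ampdu_ack_len = success ? 1 : 0;

	/* On failure, tell the stack that no block ack was received. */
	if (!success)
		st->flags |= TX_STAT_AMPDU_NO_BACK;
}
```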