2010-05-13 21:49:50

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 0/2] iwlwifi fixes for 2.6.34

We include two late fixes hoping to make it into 2.6.34.

"iwlwifi: fix internal scan race" fixes a kernel crash caused by
accessing a NULL pointer. More can be read in
https://bugzilla.kernel.org/show_bug.cgi?id=15824

"iwlagn: work around rate scaling reset delay" is already in
wireless-next-2.6 but has since been shown to be an issue in 2.6.34 also.
This fixes spurious firmware restarts that has been documented in
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2173 , but it is the
community report tracked in
https://bugzilla.kernel.org/show_bug.cgi?id=15782 that prompted us sending
this fix to wireless-2.6 also.

These patches are also available from wireless-2.6 branch on
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-2.6.git

Reinette Chatre (2):
iwlwifi: fix internal scan race
iwlagn: work around rate scaling reset delay

drivers/net/wireless/iwlwifi/iwl-scan.c | 21 +++++++++++++++--
drivers/net/wireless/iwlwifi/iwl-sta.c | 37 ++++++++++++++++++++++++++++++-
2 files changed, 54 insertions(+), 4 deletions(-)



2010-05-13 22:58:02

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 0/2] iwlwifi fixes for 2.6.34

Sorry, I was confused by the comments in [1] and I am especially
interested in the internal scans stuff:

Port following patch to 3945.

"commit 90c4162ff59a3281b6d2f7206740be6217bd6758
Author: Johannes Berg <[email protected]>
Date: Wed Apr 7 00:21:36 2010 -0700
iwlwifi: fix scan races"

The above mentionned patch is already accepted to upstream (2.6.34)
[2] and iwlagn _is_ already using internal scans. So why is iwl3945
different in 2.6.34 especially in that case?

On first sight, I can't see the correlation of "iwl3945: add plcp
error checking" [3] and "iwl3945: fix scan races" [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/98326/
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=88be026490ed89c2ffead81a52531fbac5507e01
[3] http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git;a=commit;h=a29576a7844326c5223f4d4adbfd3f4d64173d4c

On Fri, May 14, 2010 at 12:15 AM, reinette chatre
<[email protected]> wrote:
> On Thu, 2010-05-13 at 14:54 -0700, Sedat Dilek wrote:
>> Whats with "iwl3945: fix scan races"?
>
> hmmm ... cryptic indeed ... I assume you are asking "Why is "iwl3945:
> fix scan races" not part of a submission to 2.6.34?
>
> If that is the case then yes, indeed , we did not submit "iwl3945: fix
> scan races" to 2.6.34 since the scan races being fixed is between normal
> (mac80211 initiated) and internal (as part of rf reset) scans. Like I
> mention in the cover letter of the submission that includes that patch
> (http://thread.gmane.org/gmane.linux.kernel.wireless.general/50651 ) we
> introduce RF reset usage to 3945 through the new "plcp error checking"
> patch and thus need the scan races fix for that. Before that patch
> nothing in iwl3945 was using RF reset and thus no internal scanning that
> could trigger a race.
>
> Reinette
>
>
>
>
>

2010-05-13 21:49:50

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 2/2] iwlagn: work around rate scaling reset delay

From: Reinette Chatre <[email protected]>

When station is using an HT channel to communicate to AP and communication
is lost then driver will first be notified that channel is not an HT
channel anymore before AP station is removed. A consequence of that is that
the driver will know that it is not communicating on HT anymore, but the
rate scaling table is still under the impression it is operating in HT. Any
time after driver has been notified channel is not HT anymore there will
thus be a firmware SYSASSERT when the current active LQ command is sent.

A workaround for this issue is to not send a LQ command in the short time between
being notified channel is not HT anymore and rate scaling table being
updated.

This fixes http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2173

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-sta.c | 37 +++++++++++++++++++++++++++++++-
1 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-sta.c b/drivers/net/wireless/iwlwifi/iwl-sta.c
index 4a6686f..702672d 100644
--- a/drivers/net/wireless/iwlwifi/iwl-sta.c
+++ b/drivers/net/wireless/iwlwifi/iwl-sta.c
@@ -948,6 +948,39 @@ static inline void iwl_dump_lq_cmd(struct iwl_priv *priv,
}
#endif

+/**
+ * is_lq_table_valid() - Test one aspect of LQ cmd for validity
+ *
+ * It sometimes happens when a HT rate has been in use and we
+ * loose connectivity with AP then mac80211 will first tell us that the
+ * current channel is not HT anymore before removing the station. In such a
+ * scenario the RXON flags will be updated to indicate we are not
+ * communicating HT anymore, but the LQ command may still contain HT rates.
+ * Test for this to prevent driver from sending LQ command between the time
+ * RXON flags are updated and when LQ command is updated.
+ */
+static bool is_lq_table_valid(struct iwl_priv *priv,
+ struct iwl_link_quality_cmd *lq)
+{
+ int i;
+ struct iwl_ht_config *ht_conf = &priv->current_ht_config;
+
+ if (ht_conf->is_ht)
+ return true;
+
+ IWL_DEBUG_INFO(priv, "Channel %u is not an HT channel\n",
+ priv->active_rxon.channel);
+ for (i = 0; i < LINK_QUAL_MAX_RETRY_NUM; i++) {
+ if (le32_to_cpu(lq->rs_table[i].rate_n_flags) & RATE_MCS_HT_MSK) {
+ IWL_DEBUG_INFO(priv,
+ "index %d of LQ expects HT channel\n",
+ i);
+ return false;
+ }
+ }
+ return true;
+}
+
int iwl_send_lq_cmd(struct iwl_priv *priv,
struct iwl_link_quality_cmd *lq, u8 flags)
{
@@ -967,7 +1000,9 @@ int iwl_send_lq_cmd(struct iwl_priv *priv,

iwl_dump_lq_cmd(priv, lq);

- if (iwl_is_associated(priv) && priv->assoc_station_added)
+ if (iwl_is_associated(priv) &&
+ priv->assoc_station_added &&
+ is_lq_table_valid(priv, lq))
return iwl_send_cmd(priv, &cmd);

return 0;
--
1.6.3.3


2010-05-13 21:54:49

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 0/2] iwlwifi fixes for 2.6.34

Whats with "iwl3945: fix scan races"?

- Sedat -

[1] https://patchwork.kernel.org/patch/98326/

On Thu, May 13, 2010 at 11:49 PM, Reinette Chatre
<[email protected]> wrote:
> We include two late fixes hoping to make it into 2.6.34.
>
> "iwlwifi: fix internal scan race" fixes a kernel crash caused by
> accessing a NULL pointer. More can be read in
> https://bugzilla.kernel.org/show_bug.cgi?id=15824
>
> "iwlagn: work around rate scaling reset delay" is already in
> wireless-next-2.6 but has since been shown to be an issue in 2.6.34 also.
> This fixes spurious firmware restarts that has been documented in
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2173 , but it is the
> community report tracked in
> https://bugzilla.kernel.org/show_bug.cgi?id=15782 that prompted us sending
> this fix to wireless-2.6 also.
>
> These patches are also available from wireless-2.6 branch on
> git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-2.6.git
>
> Reinette Chatre (2):
>  iwlwifi: fix internal scan race
>  iwlagn: work around rate scaling reset delay
>
>  drivers/net/wireless/iwlwifi/iwl-scan.c |   21 +++++++++++++++--
>  drivers/net/wireless/iwlwifi/iwl-sta.c  |   37 ++++++++++++++++++++++++++++++-
>  2 files changed, 54 insertions(+), 4 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

2010-05-13 22:15:33

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 0/2] iwlwifi fixes for 2.6.34

On Thu, 2010-05-13 at 14:54 -0700, Sedat Dilek wrote:
> Whats with "iwl3945: fix scan races"?

hmmm ... cryptic indeed ... I assume you are asking "Why is "iwl3945:
fix scan races" not part of a submission to 2.6.34?

If that is the case then yes, indeed , we did not submit "iwl3945: fix
scan races" to 2.6.34 since the scan races being fixed is between normal
(mac80211 initiated) and internal (as part of rf reset) scans. Like I
mention in the cover letter of the submission that includes that patch
(http://thread.gmane.org/gmane.linux.kernel.wireless.general/50651 ) we
introduce RF reset usage to 3945 through the new "plcp error checking"
patch and thus need the scan races fix for that. Before that patch
nothing in iwl3945 was using RF reset and thus no internal scanning that
could trigger a race.

Reinette





2010-05-13 21:49:50

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 1/2] iwlwifi: fix internal scan race

From: Reinette Chatre <[email protected]>

It is possible for internal scan to race against itself if the device is
not returning the scan results from first requests. What happens in this
case is the cleanup done during the abort of the first internal scan also
cleans up part of the new scan, causing it to access memory it shouldn't.

Here are details:
* First internal scan is triggered and scan command sent to device.
* After seven seconds there is no scan results so the watchdog timer
triggers a scan abort.
* The scan abort succeeds and a SCAN_COMPLETE_NOTIFICATION is received for
failed scan.
* During processing of SCAN_COMPLETE_NOTIFICATION we clear STATUS_SCANNING
and queue the "scan_completed" work.
** At this time, since the problem that caused the internal scan in first
place is still present, a new internal scan is triggered.
The behavior at this point is a bit different between 2.6.34 and 2.6.35
since 2.6.35 has a lot of this synchronized. The rest of the race
description will thus be generalized.
** As part of preparing for the scan "is_internal_short_scan" is set to
true.
* At this point the completion work for fist scan is run. As part of this
there is some locking missing around the "is_internal_short_scan"
variable and it is set to "false".
** Now the second scan runs and it considers itself a real (not internal0
scan and thus causes problems with wrong memory being accessed.

The fix is twofold.
* Since "is_internal_short_scan" should be protected by mutex, fix this in
scan completion work so that changes to it can be serialized.
* Do not queue a new internal scan if one is in progress.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=15824

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-scan.c | 21 ++++++++++++++++++---
1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-scan.c b/drivers/net/wireless/iwlwifi/iwl-scan.c
index 2367286..a2c4855 100644
--- a/drivers/net/wireless/iwlwifi/iwl-scan.c
+++ b/drivers/net/wireless/iwlwifi/iwl-scan.c
@@ -560,6 +560,11 @@ static void iwl_bg_start_internal_scan(struct work_struct *work)

mutex_lock(&priv->mutex);

+ if (priv->is_internal_short_scan == true) {
+ IWL_DEBUG_SCAN(priv, "Internal scan already in progress\n");
+ goto unlock;
+ }
+
if (!iwl_is_ready_rf(priv)) {
IWL_DEBUG_SCAN(priv, "not ready or exit pending\n");
goto unlock;
@@ -957,17 +962,27 @@ void iwl_bg_scan_completed(struct work_struct *work)
{
struct iwl_priv *priv =
container_of(work, struct iwl_priv, scan_completed);
+ bool internal = false;

IWL_DEBUG_SCAN(priv, "SCAN complete scan\n");

cancel_delayed_work(&priv->scan_check);

- if (!priv->is_internal_short_scan)
- ieee80211_scan_completed(priv->hw, false);
- else {
+ mutex_lock(&priv->mutex);
+ if (priv->is_internal_short_scan) {
priv->is_internal_short_scan = false;
IWL_DEBUG_SCAN(priv, "internal short scan completed\n");
+ internal = true;
}
+ mutex_unlock(&priv->mutex);
+
+ /*
+ * Do not hold mutex here since this will cause mac80211 to call
+ * into driver again into functions that will attempt to take
+ * mutex.
+ */
+ if (!internal)
+ ieee80211_scan_completed(priv->hw, false);

if (test_bit(STATUS_EXIT_PENDING, &priv->status))
return;
--
1.6.3.3