2012-03-13 15:10:41

by Stanislaw Gruszka

[permalink] [raw]
Subject: [PATCH] iwlwifi: do not nulify ctx->vif on reset

ctx->vif is dereferenced in different part of iwlwifi code, so do not
nullify it.

This should address at least one of the possible reasons of WARNING at
iwlagn_mac_remove_interface, and perhaps some random crashes when
firmware reset is performed.

Cc: [email protected]
Signed-off-by: Stanislaw Gruszka <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-agn.c | 3 ---
drivers/net/wireless/iwlwifi/iwl-mac80211.c | 10 +++++++++-
2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
index 28422c0..6375e5a 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
@@ -862,7 +862,6 @@ static void iwl_bg_run_time_calib_work(struct work_struct *work)

void iwlagn_prepare_restart(struct iwl_priv *priv)
{
- struct iwl_rxon_context *ctx;
bool bt_full_concurrent;
u8 bt_ci_compliance;
u8 bt_load;
@@ -871,8 +870,6 @@ void iwlagn_prepare_restart(struct iwl_priv *priv)

lockdep_assert_held(&priv->mutex);

- for_each_context(priv, ctx)
- ctx->vif = NULL;
priv->is_open = 0;

/*
diff --git a/drivers/net/wireless/iwlwifi/iwl-mac80211.c b/drivers/net/wireless/iwlwifi/iwl-mac80211.c
index 9212ee3..4e8786d 100644
--- a/drivers/net/wireless/iwlwifi/iwl-mac80211.c
+++ b/drivers/net/wireless/iwlwifi/iwl-mac80211.c
@@ -1241,6 +1241,7 @@ static int iwlagn_mac_add_interface(struct ieee80211_hw *hw,
struct iwl_rxon_context *tmp, *ctx = NULL;
int err;
enum nl80211_iftype viftype = ieee80211_vif_type_p2p(vif);
+ bool reset = false;

IWL_DEBUG_MAC80211(priv, "enter: type %d, addr %pM\n",
viftype, vif->addr);
@@ -1262,6 +1263,13 @@ static int iwlagn_mac_add_interface(struct ieee80211_hw *hw,
tmp->interface_modes | tmp->exclusive_interface_modes;

if (tmp->vif) {
+ /* On reset we need to add the same interface again */
+ if (tmp->vif == vif) {
+ reset = true;
+ ctx = tmp;
+ break;
+ }
+
/* check if this busy context is exclusive */
if (tmp->exclusive_interface_modes &
BIT(tmp->vif->type)) {
@@ -1288,7 +1296,7 @@ static int iwlagn_mac_add_interface(struct ieee80211_hw *hw,
ctx->vif = vif;

err = iwl_setup_interface(priv, ctx);
- if (!err)
+ if (!err || reset)
goto out;

ctx->vif = NULL;
--
1.7.1



2012-03-14 06:58:31

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Wed, 2012-03-14 at 07:25 +0100, Stanislaw Gruszka wrote:

> > Have you seen crashes due to this? I have seen the warning in the past
> > but in some recent firmware reset testing I never ran into it again...
>
> I have vif == NULL crashes reports on old RHEL6, that _possibly_ are
> caused by this issue, but I'm not 100% sure, RHEL6 include own vif related
> changes needed by backport.
>
> I'm not able to reproduce vif crashes on force_reset on current
> wireless-testing, but when I transmit data and trigger a reset,
> device stops working. There are some messages [1] (20:00.0 is 6300
> adapter, but on others I tested 5300 and 100 reset does not work too).
>
> So hw reset is completely broken at present, perhaps is time to remove
> watchdog and related infrastructure.

I wish :-)
But last I tried (after refactoring queue stop/wake), I could
successfully recover from hw restart in the middle of aggregation
sessions -- which code did you try with?

johannes


2012-03-26 14:31:50

by Wey-Yi Guy

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Mon, 2012-03-26 at 11:08 +0200, Johannes Berg wrote:
> On Wed, 2012-03-14 at 09:14 +0100, Stanislaw Gruszka wrote:
> > On Wed, Mar 14, 2012 at 08:07:48AM +0100, Stanislaw Gruszka wrote:
> > > I'll update and recheck again.
> >
> > On updated tree (head 035364916f75151b4b91ea53968c6beba7545317) devices
> > stop working here on any forced reset during TX as well. There are
> > messages like below in dmesg:
> >
> > wlan3: dropped data frame to not associated station 00:00:00:00:00:00
> >
> > And one time crash happened too, in:
> >
> > (gdb) l *(iwl_remove_dynamic_key+0x1f0)
> > 0x15e20 is in iwl_remove_dynamic_key (drivers/net/wireless/iwlwifi/iwl-agn-sta.c:1105).
> > 1100 /*
> > 1101 * The device expects GTKs for station interfaces to be
> > 1102 * installed as GTKs for the AP station. If we have no
> > 1103 * station ID, then use the ap_sta_id in that case.
> > 1104 */
> > 1105 if (vif->type == NL80211_IFTYPE_STATION && vif_priv->ctx)
> > 1106 return vif_priv->ctx->ap_sta_id;
> > 1107
> > 1108 return IWL_INVALID_STATION;
> > 1109 }
> >
> > To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
> > and run script [1] on other console.
>
> Ok so I finally got some time to look into this in some detail. It seems
> that the driver assumes a few calls from mac80211 like key settings can
> only happen while the vif pointers are valid, which is correct under
> normal circumstances but due to races with HW reset it's not really true
> there.
>
> We could fix all of those, but it seems your patch is the better
> approach. Wey, can you pick it up?

yes

Wey
>



2012-03-16 15:16:24

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Fri, 2012-03-16 at 15:37 +0100, Stanislaw Gruszka wrote:

> > To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
> > and run script [1] on other console.
>
> So what we do here? Patch fix possible crash on firmware restart.

Still working on it. I am still recovering from being sick and spending
much of the last couple of days in bed.

> We have a few places where ctx->vif is dereferences without a NULL
> check. Perhaps those places should be verified instead. What protects
> to dereference ctx->vif after iwl_mac_remove_interface() ?

Exactly my point -- if a place uses ctx->vif then it's quite possibly a
bug anyway, even not considering restart.

> Regarding restart, this apparently is broken - hung the adapter such the
> module reload is needed. To mitigate the problems You just disable
> watchdog on various adapters.
>
> I do not think restart should be fixed. I think better would be disable
> restart by default (set fw_restart=0) and start to solve real problems :-)

> This is of course not easy, but perhaps could be done using tracing.

I don't think this is an option since it's essentially also a safety-net
or workaround for microcode bugs. Preparing and testing microcode for
release is a rather long process so we can't re-release for all bug
fixes, as nice as that'd be. Sometimes we can't even get a simple fix
developed when the versions have diverged.

> It
> offer a nice feature that allow to enable and disable logging on
> runtime, (tracing_on(), tracing_off() functions). So offer a compile
> option to make IWL_DEBUG use trace_printk(), start tracing on module
> load, disable it on microcode error or queue stuck. All of that should
> help with debugging inimitable problems, as ftrace log data into memory
> and use ring buffer, hence allow to see (lot of) latest actions before
> the problem occurs. I could write the patches, but since you are
> constantly refactoring the iwlwifi driver better if you will do this.
> How about that?

I don't think we can do this, there'd be a ton of data in it (all TX/RX)
and most of that isn't relevant -- quite likely that the relevant data
is out of the buffers already by the time we need it.

johannes


2012-03-14 06:56:50

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Tue, Mar 13, 2012 at 04:15:30PM +0100, Johannes Berg wrote:
> On Tue, 2012-03-13 at 16:10 +0100, Stanislaw Gruszka wrote:
> > ctx->vif is dereferenced in different part of iwlwifi code, so do not
> > nullify it.
> >
> > This should address at least one of the possible reasons of WARNING at
> > iwlagn_mac_remove_interface, and perhaps some random crashes when
> > firmware reset is performed.
>
> I'm not completely convinced --
Me too :-)

> there are parts of the driver that are
> active even when no interface has ever been added.

Yes, sometimes vif is checked and behaviour is different when it is
NULL. I'm not sure how about iwlwifi, because I did not analyse all of
ctx->vif usage, but on iwlegacy we have code which do not check vif
against NULL or do check, but this is not synchronized by il->mutex.

> Have you seen crashes due to this? I have seen the warning in the past
> but in some recent firmware reset testing I never ran into it again...

I have vif == NULL crashes reports on old RHEL6, that _possibly_ are
caused by this issue, but I'm not 100% sure, RHEL6 include own vif related
changes needed by backport.

I'm not able to reproduce vif crashes on force_reset on current
wireless-testing, but when I transmit data and trigger a reset,
device stops working. There are some messages [1] (20:00.0 is 6300
adapter, but on others I tested 5300 and 100 reset does not work too).

So hw reset is completely broken at present, perhaps is time to remove
watchdog and related infrastructure.

Stanislaw

[1]

[ 973.242040] iwlwifi 0000:a0:00.0: On demand firmware reload
[ 973.242130] iwlwifi 0000:40:00.0: On demand firmware reload
[ 973.242317] iwlwifi 0000:20:00.0: On demand firmware reload
[ 973.242571] ieee80211 phy0: Hardware restart was requested
[ 973.249575] ieee80211 phy2: Hardware restart was requested
[ 973.249638] iwlwifi 0000:a0:00.0: L1 Disabled; Enabling L0S
[ 973.249755] ieee80211 phy1: Hardware restart was requested
[ 973.252883] iwlwifi 0000:a0:00.0: Radio type=0x0-0x2-0x0
[ 973.305621] iwlwifi 0000:40:00.0: L1 Disabled; Enabling L0S
[ 973.368360] wlan18: dropped data frame to not associated station 00:00:00:00:00:00
[ 973.368467] iwlwifi 0000:20:00.0: L1 Disabled; Enabling L0S
[ 973.375227] iwlwifi 0000:20:00.0: Radio type=0x0-0x3-0x1
[ 975.467038] iwlwifi 0000:20:00.0: Error sending REPLY_TX_LINK_QUALITY_CMD: time out after 2000ms.
[ 975.467043] iwlwifi 0000:20:00.0: Current CMD queue read_ptr 11 write_ptr 12
[ 975.943046] iwlwifi 0000:20:00.0: Queue 4 stuck for 2000 ms.
[ 975.943050] iwlwifi 0000:20:00.0: Current SW read_ptr 11 write_ptr 13
[ 975.949740] iwlwifi 0000:20:00.0: Current HW read_ptr 13 write_ptr 13
[ 976.450021] iwlwifi 0000:20:00.0: Queue 4 stuck for 2000 ms.
[ 976.450026] iwlwifi 0000:20:00.0: Current SW read_ptr 11 write_ptr 13
[ 976.456709] iwlwifi 0000:20:00.0: Current HW read_ptr 13 write_ptr 13
[ 976.957012] iwlwifi 0000:20:00.0: Queue 4 stuck for 2000 ms.
[ 976.957016] iwlwifi 0000:20:00.0: Current SW read_ptr 11 write_ptr 13
[ 976.963689] iwlwifi 0000:20:00.0: Current HW read_ptr 13 write_ptr 13
[ 977.464014] iwlwifi 0000:20:00.0: Queue 4 stuck for 2000 ms.
[ 977.464020] iwlwifi 0000:20:00.0: Current SW read_ptr 11 write_ptr 13
[ 977.470701] iwlwifi 0000:20:00.0: Current HW read_ptr 13 write_ptr 13
[ 977.471024] iwlwifi 0000:20:00.0: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 2000ms.
[ 977.471028] iwlwifi 0000:20:00.0: Current CMD queue read_ptr 11 write_ptr 13
[ 977.471032] iwlwifi 0000:20:00.0: REPLY_CT_KILL_CONFIG_CMD failed
[ 977.971017] iwlwifi 0000:20:00.0: Queue 4 stuck for 2000 ms.
[ 977.971023] iwlwifi 0000:20:00.0: Current SW read_ptr 11 write_ptr 14
[ 977.977697] iwlwifi 0000:20:00.0: Current HW read_ptr 14 write_ptr 14
[ 978.478020] iwlwifi 0000:20:00.0: Queue 4 stuck for 2000 ms.
[ 978.478024] iwlwifi 0000:20:00.0: Current SW read_ptr 11 write_ptr 14
[ 978.484706] iwlwifi 0000:20:00.0: Current HW read_ptr 14 write_ptr 14
[ 978.484710] iwlwifi 0000:20:00.0: On demand firmware reload
[ 978.485017] ------------[ cut here ]------------
[ 978.485038] WARNING: at drivers/net/wireless/iwlwifi/iwl-mac80211.c:325 iwlagn_mac_start+0x10e/0x110 [iwlwifi]()
[ 978.485041] Hardware name: HP xw8600 Workstation
[ 978.485044] Modules linked in: aes_generic arc4 iwlwifi mac80211 cfg80211 fuse autofs4 cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod uinput sg hp_wmi sparse_keymap serio_raw microcode shpchp i5400_edac edac_core ext4 mbcache jbd2 firewire_ohci firewire_core crc_itu_t sd_mod crc_t10dif sr_mod cdrom mptsas mptscsih mptbase scsi_transport_sas ahci libahci pata_acpi ata_generic ata_piix nouveau ttm drm_kms_helper drm hwmon i2c_core mxm_wmi video wmi [last unloaded: scsi_wait_scan]
[ 978.485137] Pid: 4212, comm: kworker/2:2 Not tainted 3.3.0-rc6-wl+ #7
[ 978.485140] Call Trace:
[ 978.485151] [<ffffffff8105104f>] warn_slowpath_common+0x7f/0xc0
[ 978.485156] [<ffffffff810510aa>] warn_slowpath_null+0x1a/0x20
[ 978.485166] [<ffffffffa0510bfe>] iwlagn_mac_start+0x10e/0x110 [iwlwifi]
[ 978.485196] [<ffffffffa0496377>] ieee80211_reconfig+0x1b7/0xcd0 [mac80211]
[ 978.485203] [<ffffffff814e366e>] ? mutex_unlock+0xe/0x10
[ 978.485217] [<ffffffffa0471b79>] ieee80211_restart_work+0x89/0xb0 [mac80211]
[ 978.485222] [<ffffffff8106ecae>] process_one_work+0x1ae/0x500
[ 978.485226] [<ffffffff8106ec3f>] ? process_one_work+0x13f/0x500
[ 978.485239] [<ffffffffa0471af0>] ? ieee80211_recalc_smps_work+0x50/0x50 [mac80211]
[ 978.485244] [<ffffffff810713db>] worker_thread+0x17b/0x3b0
[ 978.485249] [<ffffffff81071260>] ? manage_workers+0x120/0x120
[ 978.485253] [<ffffffff810768c6>] kthread+0xc6/0xd0
[ 978.485259] [<ffffffff81084d9b>] ? finish_task_switch+0x4b/0xf0
[ 978.485264] [<ffffffff814f01b4>] kernel_thread_helper+0x4/0x10
[ 978.485269] [<ffffffff814e6874>] ? retint_restore_args+0x13/0x13
[ 978.485273] [<ffffffff81076800>] ? __init_kthread_worker+0x70/0x70
[ 978.485277] [<ffffffff814f01b0>] ? gs_change+0x13/0x13
[

2012-03-26 18:52:52

by Wey-Yi Guy

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

Hi John,

On Mon, 2012-03-26 at 14:42 -0400, John W. Linville wrote:
> On Mon, Mar 26, 2012 at 07:30:32AM -0700, Guy, Wey-Yi wrote:
> > On Mon, 2012-03-26 at 11:08 +0200, Johannes Berg wrote:
> > > On Wed, 2012-03-14 at 09:14 +0100, Stanislaw Gruszka wrote:
> > > > On Wed, Mar 14, 2012 at 08:07:48AM +0100, Stanislaw Gruszka wrote:
> > > > > I'll update and recheck again.
> > > >
> > > > On updated tree (head 035364916f75151b4b91ea53968c6beba7545317) devices
> > > > stop working here on any forced reset during TX as well. There are
> > > > messages like below in dmesg:
> > > >
> > > > wlan3: dropped data frame to not associated station 00:00:00:00:00:00
> > > >
> > > > And one time crash happened too, in:
> > > >
> > > > (gdb) l *(iwl_remove_dynamic_key+0x1f0)
> > > > 0x15e20 is in iwl_remove_dynamic_key (drivers/net/wireless/iwlwifi/iwl-agn-sta.c:1105).
> > > > 1100 /*
> > > > 1101 * The device expects GTKs for station interfaces to be
> > > > 1102 * installed as GTKs for the AP station. If we have no
> > > > 1103 * station ID, then use the ap_sta_id in that case.
> > > > 1104 */
> > > > 1105 if (vif->type == NL80211_IFTYPE_STATION && vif_priv->ctx)
> > > > 1106 return vif_priv->ctx->ap_sta_id;
> > > > 1107
> > > > 1108 return IWL_INVALID_STATION;
> > > > 1109 }
> > > >
> > > > To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
> > > > and run script [1] on other console.
> > >
> > > Ok so I finally got some time to look into this in some detail. It seems
> > > that the driver assumes a few calls from mac80211 like key settings can
> > > only happen while the vif pointers are valid, which is correct under
> > > normal circumstances but due to races with HW reset it's not really true
> > > there.
> > >
> > > We could fix all of those, but it seems your patch is the better
> > > approach. Wey, can you pick it up?
> >
> > yes
>
> Should I just take it and try to get it merged for -rc1?

I am taking it into our internal tree for our regression, if it is ok
for you, could you wait a bit, I will push it with all our other
patches.

btw, I have pretty large queue of patches again (~50+), this time I will
break it up and push separated :-)

Thanks
Wey



2012-03-14 08:14:57

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Wed, Mar 14, 2012 at 08:07:48AM +0100, Stanislaw Gruszka wrote:
> I'll update and recheck again.

On updated tree (head 035364916f75151b4b91ea53968c6beba7545317) devices
stop working here on any forced reset during TX as well. There are
messages like below in dmesg:

wlan3: dropped data frame to not associated station 00:00:00:00:00:00

And one time crash happened too, in:

(gdb) l *(iwl_remove_dynamic_key+0x1f0)
0x15e20 is in iwl_remove_dynamic_key (drivers/net/wireless/iwlwifi/iwl-agn-sta.c:1105).
1100 /*
1101 * The device expects GTKs for station interfaces to be
1102 * installed as GTKs for the AP station. If we have no
1103 * station ID, then use the ap_sta_id in that case.
1104 */
1105 if (vif->type == NL80211_IFTYPE_STATION && vif_priv->ctx)
1106 return vif_priv->ctx->ap_sta_id;
1107
1108 return IWL_INVALID_STATION;
1109 }

To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
and run script [1] on other console.

Stanislaw

[1]

#!/bin/bash

mount -t debugfs debugfs /sys/kernel/debug
cd /sys/kernel/debug/ieee80211

while true; do

sleep 20
for i in `ls -X ` ; do echo 1 > $i/iwlwifi/debug/force_reset ; done

done

2012-03-14 07:07:57

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Wed, Mar 14, 2012 at 07:58:24AM +0100, Johannes Berg wrote:
> But last I tried (after refactoring queue stop/wake), I could
> successfully recover from hw restart in the middle of aggregation
> sessions -- which code did you try with?

commit 2c32d37f937092466d7f32e369854581caebe03d
Merge: 8da63e5 1745e44
Author: John W. Linville <[email protected]>
Date: Fri Mar 9 14:41:43 2012 -0500

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next

I'll update and recheck again.

Stanislaw

2012-03-26 09:08:31

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Wed, 2012-03-14 at 09:14 +0100, Stanislaw Gruszka wrote:
> On Wed, Mar 14, 2012 at 08:07:48AM +0100, Stanislaw Gruszka wrote:
> > I'll update and recheck again.
>
> On updated tree (head 035364916f75151b4b91ea53968c6beba7545317) devices
> stop working here on any forced reset during TX as well. There are
> messages like below in dmesg:
>
> wlan3: dropped data frame to not associated station 00:00:00:00:00:00
>
> And one time crash happened too, in:
>
> (gdb) l *(iwl_remove_dynamic_key+0x1f0)
> 0x15e20 is in iwl_remove_dynamic_key (drivers/net/wireless/iwlwifi/iwl-agn-sta.c:1105).
> 1100 /*
> 1101 * The device expects GTKs for station interfaces to be
> 1102 * installed as GTKs for the AP station. If we have no
> 1103 * station ID, then use the ap_sta_id in that case.
> 1104 */
> 1105 if (vif->type == NL80211_IFTYPE_STATION && vif_priv->ctx)
> 1106 return vif_priv->ctx->ap_sta_id;
> 1107
> 1108 return IWL_INVALID_STATION;
> 1109 }
>
> To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
> and run script [1] on other console.

Ok so I finally got some time to look into this in some detail. It seems
that the driver assumes a few calls from mac80211 like key settings can
only happen while the vif pointers are valid, which is correct under
normal circumstances but due to races with HW reset it's not really true
there.

We could fix all of those, but it seems your patch is the better
approach. Wey, can you pick it up?

johannes


2012-03-26 18:47:19

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Mon, Mar 26, 2012 at 07:30:32AM -0700, Guy, Wey-Yi wrote:
> On Mon, 2012-03-26 at 11:08 +0200, Johannes Berg wrote:
> > On Wed, 2012-03-14 at 09:14 +0100, Stanislaw Gruszka wrote:
> > > On Wed, Mar 14, 2012 at 08:07:48AM +0100, Stanislaw Gruszka wrote:
> > > > I'll update and recheck again.
> > >
> > > On updated tree (head 035364916f75151b4b91ea53968c6beba7545317) devices
> > > stop working here on any forced reset during TX as well. There are
> > > messages like below in dmesg:
> > >
> > > wlan3: dropped data frame to not associated station 00:00:00:00:00:00
> > >
> > > And one time crash happened too, in:
> > >
> > > (gdb) l *(iwl_remove_dynamic_key+0x1f0)
> > > 0x15e20 is in iwl_remove_dynamic_key (drivers/net/wireless/iwlwifi/iwl-agn-sta.c:1105).
> > > 1100 /*
> > > 1101 * The device expects GTKs for station interfaces to be
> > > 1102 * installed as GTKs for the AP station. If we have no
> > > 1103 * station ID, then use the ap_sta_id in that case.
> > > 1104 */
> > > 1105 if (vif->type == NL80211_IFTYPE_STATION && vif_priv->ctx)
> > > 1106 return vif_priv->ctx->ap_sta_id;
> > > 1107
> > > 1108 return IWL_INVALID_STATION;
> > > 1109 }
> > >
> > > To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
> > > and run script [1] on other console.
> >
> > Ok so I finally got some time to look into this in some detail. It seems
> > that the driver assumes a few calls from mac80211 like key settings can
> > only happen while the vif pointers are valid, which is correct under
> > normal circumstances but due to races with HW reset it's not really true
> > there.
> >
> > We could fix all of those, but it seems your patch is the better
> > approach. Wey, can you pick it up?
>
> yes

Should I just take it and try to get it merged for -rc1?

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2012-03-13 15:15:34

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Tue, 2012-03-13 at 16:10 +0100, Stanislaw Gruszka wrote:
> ctx->vif is dereferenced in different part of iwlwifi code, so do not
> nullify it.
>
> This should address at least one of the possible reasons of WARNING at
> iwlagn_mac_remove_interface, and perhaps some random crashes when
> firmware reset is performed.

I'm not completely convinced -- there are parts of the driver that are
active even when no interface has ever been added.

Have you seen crashes due to this? I have seen the warning in the past
but in some recent firmware reset testing I never ran into it again...

johannes

> Cc: [email protected]
> Signed-off-by: Stanislaw Gruszka <[email protected]>
> ---
> drivers/net/wireless/iwlwifi/iwl-agn.c | 3 ---
> drivers/net/wireless/iwlwifi/iwl-mac80211.c | 10 +++++++++-
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
> index 28422c0..6375e5a 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
> @@ -862,7 +862,6 @@ static void iwl_bg_run_time_calib_work(struct work_struct *work)
>
> void iwlagn_prepare_restart(struct iwl_priv *priv)
> {
> - struct iwl_rxon_context *ctx;
> bool bt_full_concurrent;
> u8 bt_ci_compliance;
> u8 bt_load;
> @@ -871,8 +870,6 @@ void iwlagn_prepare_restart(struct iwl_priv *priv)
>
> lockdep_assert_held(&priv->mutex);
>
> - for_each_context(priv, ctx)
> - ctx->vif = NULL;
> priv->is_open = 0;
>
> /*
> diff --git a/drivers/net/wireless/iwlwifi/iwl-mac80211.c b/drivers/net/wireless/iwlwifi/iwl-mac80211.c
> index 9212ee3..4e8786d 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-mac80211.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-mac80211.c
> @@ -1241,6 +1241,7 @@ static int iwlagn_mac_add_interface(struct ieee80211_hw *hw,
> struct iwl_rxon_context *tmp, *ctx = NULL;
> int err;
> enum nl80211_iftype viftype = ieee80211_vif_type_p2p(vif);
> + bool reset = false;
>
> IWL_DEBUG_MAC80211(priv, "enter: type %d, addr %pM\n",
> viftype, vif->addr);
> @@ -1262,6 +1263,13 @@ static int iwlagn_mac_add_interface(struct ieee80211_hw *hw,
> tmp->interface_modes | tmp->exclusive_interface_modes;
>
> if (tmp->vif) {
> + /* On reset we need to add the same interface again */
> + if (tmp->vif == vif) {
> + reset = true;
> + ctx = tmp;
> + break;
> + }
> +
> /* check if this busy context is exclusive */
> if (tmp->exclusive_interface_modes &
> BIT(tmp->vif->type)) {
> @@ -1288,7 +1296,7 @@ static int iwlagn_mac_add_interface(struct ieee80211_hw *hw,
> ctx->vif = vif;
>
> err = iwl_setup_interface(priv, ctx);
> - if (!err)
> + if (!err || reset)
> goto out;
>
> ctx->vif = NULL;



2012-03-16 15:08:19

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: do not nulify ctx->vif on reset

On Wed, Mar 14, 2012 at 09:14:24AM +0100, Stanislaw Gruszka wrote:
> On Wed, Mar 14, 2012 at 08:07:48AM +0100, Stanislaw Gruszka wrote:
> > I'll update and recheck again.
>
> On updated tree (head 035364916f75151b4b91ea53968c6beba7545317) devices
> stop working here on any forced reset during TX as well. There are
> messages like below in dmesg:
>
> wlan3: dropped data frame to not associated station 00:00:00:00:00:00
>
> And one time crash happened too, in:
>
> (gdb) l *(iwl_remove_dynamic_key+0x1f0)
> 0x15e20 is in iwl_remove_dynamic_key (drivers/net/wireless/iwlwifi/iwl-agn-sta.c:1105).
> 1100 /*
> 1101 * The device expects GTKs for station interfaces to be
> 1102 * installed as GTKs for the AP station. If we have no
> 1103 * station ID, then use the ap_sta_id in that case.
> 1104 */
> 1105 if (vif->type == NL80211_IFTYPE_STATION && vif_priv->ctx)
> 1106 return vif_priv->ctx->ap_sta_id;
> 1107
> 1108 return IWL_INVALID_STATION;
> 1109 }
>
> To reproduce problems, I'm doing "ping -f 192.168.1.1" on one console
> and run script [1] on other console.

So what we do here? Patch fix possible crash on firmware restart.
We have a few places where ctx->vif is dereferences without a NULL
check. Perhaps those places should be verified instead. What protects
to dereference ctx->vif after iwl_mac_remove_interface() ?

Regarding restart, this apparently is broken - hung the adapter such the
module reload is needed. To mitigate the problems You just disable
watchdog on various adapters.

I do not think restart should be fixed. I think better would be disable
restart by default (set fw_restart=0) and start to solve real problems :-)

This is of course not easy, but perhaps could be done using tracing. It
offer a nice feature that allow to enable and disable logging on
runtime, (tracing_on(), tracing_off() functions). So offer a compile
option to make IWL_DEBUG use trace_printk(), start tracing on module
load, disable it on microcode error or queue stuck. All of that should
help with debugging inimitable problems, as ftrace log data into memory
and use ring buffer, hence allow to see (lot of) latest actions before
the problem occurs. I could write the patches, but since you are
constantly refactoring the iwlwifi driver better if you will do this.
How about that?

Stanislaw