LinuxLists.cc - [PATCH] wl12xx: trigger recovery for failures in sched scan stop

2011-12-21 12:06:35

Subject: [PATCH] wl12xx: trigger recovery for failures in sched scan stop

mac80211 requires that drv_sched_scan_stop
will always succeed. We have a few possible errors
such as OOM, failure to ELP wakeup, fail in the FW command.
Instead of no-op , trigger a recovery which would
mark sched scan as stopped and clear up any bad FW state.

Signed-off-by: Eyal Shapira <[email protected]>
---
drivers/net/wireless/wl12xx/main.c | 4 +++-
drivers/net/wireless/wl12xx/scan.c | 14 +++++++++-----
2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/wl12xx/main.c b/drivers/net/wireless/wl12xx/main.c
index d5f55a1..0d3de3e 100644
--- a/drivers/net/wireless/wl12xx/main.c
+++ b/drivers/net/wireless/wl12xx/main.c
@@ -3137,8 +3137,10 @@ static void wl1271_op_sched_scan_stop(struct ieee80211_hw *hw,
mutex_lock(&wl->mutex);

ret = wl1271_ps_elp_wakeup(wl);
- if (ret < 0)
+ if (ret < 0) {
+ wl12xx_queue_recovery_work(wl);
goto out;
+ }

wl1271_scan_sched_scan_stop(wl);

diff --git a/drivers/net/wireless/wl12xx/scan.c b/drivers/net/wireless/wl12xx/scan.c
index e24111e..108bed4 100644
--- a/drivers/net/wireless/wl12xx/scan.c
+++ b/drivers/net/wireless/wl12xx/scan.c
@@ -739,22 +739,26 @@ void wl1271_scan_sched_scan_stop(struct wl1271 *wl)

wl1271_debug(DEBUG_CMD, "cmd periodic scan stop");

- /* FIXME: what to do if alloc'ing to stop fails? */
stop = kzalloc(sizeof(*stop), GFP_KERNEL);
if (!stop) {
wl1271_error("failed to alloc memory to send sched scan stop");
- return;
+ goto recovery;
}

stop->tag = WL1271_SCAN_DEFAULT_TAG;

ret = wl1271_cmd_send(wl, CMD_STOP_PERIODIC_SCAN, stop,
sizeof(*stop), 0);
+
+ kfree(stop);
+
if (ret < 0) {
wl1271_error("failed to send sched scan stop command");
- goto out_free;
+ goto recovery;
}

-out_free:
- kfree(stop);
+ return;
+
+recovery:
+ wl12xx_queue_recovery_work(wl);
}
--
1.7.4.1

2011-12-23 09:13:14

by Luciano Coelho

[permalink] [raw]

Subject: Re: [PATCH] wl12xx: trigger recovery for failures in sched scan stop

On Fri, 2011-12-23 at 03:11 +0200, Eyal Shapira wrote:
> On Fri, Dec 23, 2011 at 3:07 AM, Eyal Shapira <[email protected]> wrote:
> > On Thu, Dec 22, 2011 at 10:59 AM, Eliad Peller <[email protected]> wrote:
> >> On Wed, Dec 21, 2011 at 2:06 PM, Eyal Shapira <[email protected]> wrote:
> >> > mac80211 requires that drv_sched_scan_stop
> >> > will always succeed. We have a few possible errors
> >> > such as OOM, failure to ELP wakeup, fail in the FW command.
> >> > Instead of no-op , trigger a recovery which would
> >> > mark sched scan as stopped and clear up any bad FW state.
> >> >
> >> > Signed-off-by: Eyal Shapira <[email protected]>
> >> > ---
> >> this patch seems a bit redundant.
> >>
> >> > ret = wl1271_ps_elp_wakeup(wl);
> >> > - if (ret < 0)
> >> > + if (ret < 0) {
> >> > + wl12xx_queue_recovery_work(wl);
> >> > goto out;
> >> > + }
> >> >
> >> we enqueue recovery in wl1271_ps_elp_wakeup() in most error paths.
> >>
> ok. agreed. what about the path of completion error in wl1271_ps_elp_wakeup() ?
> Why not trigger recovery in that case as well ?

If the wake up times out, the firmware has very likely crashed or is
stuck, so recovery is needed.

If the wake up _fails_, on the other hand, things are probably not too
bad. We could probably recover without recovery. :P In most cases this
will generate a cascade of failures and things may go wrong, but in
other cases, such as debugfs, a failure is not bad.

> >>
> >> > - /* FIXME: what to do if alloc'ing to stop fails? */
> >> > stop = kzalloc(sizeof(*stop), GFP_KERNEL);
> >> > if (!stop) {
> >> > wl1271_error("failed to alloc memory to send sched scan stop");
> >> > - return;
> >> > + goto recovery;
> >> > }
> >> not sure if initiating recovery here will really help. it'll probably
> >> fail as well, and good chances we are going to panic soon anyway :)
> >>
> Luca pointed out the same thing so I'm dropping this.

Okay, I will drop it for now and wait for v2 or a different solution.

> The question is what should we do with failing alloc as
> op_sched_scan_stop should be void.

We could change the op_sched_scan_stop to return int instead. At least
in this case there's a good reason for doing it. We don't need to
return the error to the userspace (as it happens now in case of
failure), but we can at least use that to trigger the deallocation in
mac80211.

> Due to other changes in the sched_scan_stop path in mac/cfg80211 it's
> more important to
> call ieee80211_sched_scan_stoppeD() as otherwise allocs done in
> mac80211 won't be freed as well as the userspace
> won't be notified of sched scan stop.

Let's see what comes out of our discussion in the other thread. ;)

> Options:
> 1. Call ieee80211_sched_scan_stopped() for any failure (OOM,
> elp_wakeup, ...). This would free up the allocs in mac80211 and notify
> userspace
> but would leave the FW out of sync with the upper layers. The next
> sched scan initiated would trigger a recovery given that the previous
> sched
> scan wasn't stopped.

This also depends on what happens to the changes in cfg/mac80211.

> 2. No op - Just ignore kzalloc failure. Sched scan can't be stopped in
> this case and the userspace won't get any completion event and alloc
> in
> mac80211 will remain.

I don't think this is an option. It will leak memory, so the OOM will
just get worse. The userspace might have a timeout and retry the stop
after sometime, which would make things even worse.

> 3. Variation of 1 - Call stopped() but propagate to userspace through
> the NL event that we couldn't really stop.
> I don't think that would fly as it's more of an API change to userspace.

Not so good. As I mentioned in the other thread, maybe we could delay
the stop command completion?

> I understand that option #2 is what we're going for lacking a better
> alternative.
> Any other ideas / opinions ?

Let's wait and see what comes out of the cfg/mac80211 discussion.

> >> anyway, the major thing i don't like here is handling the sched_stop
> >> in a different manner than the rest of the commands.

This was exactly one of my concerns when I mentioned privately to Eyal
that this looked a bit weird. I don't see a good reason why
sched_scan_stop would have more dramatic failure consequences than other
commands.

--
Cheers,
Luca.

2011-12-22 08:59:56

by Eliad Peller

[permalink] [raw]

Subject: Re: [PATCH] wl12xx: trigger recovery for failures in sched scan stop

On Wed, Dec 21, 2011 at 2:06 PM, Eyal Shapira <[email protected]> wrote:
> mac80211 requires that drv_sched_scan_stop
> will always succeed. We have a few possible errors
> such as OOM, failure to ELP wakeup, fail in the FW command.
> Instead of no-op , trigger a recovery which would
> mark sched scan as stopped and clear up any bad FW state.
>
> Signed-off-by: Eyal Shapira <[email protected]>
> ---
this patch seems a bit redundant.

> ? ? ? ?ret = wl1271_ps_elp_wakeup(wl);
> - ? ? ? if (ret < 0)
> + ? ? ? if (ret < 0) {
> + ? ? ? ? ? ? ? wl12xx_queue_recovery_work(wl);
> ? ? ? ? ? ? ? ?goto out;
> + ? ? ? }
>
we enqueue recovery in wl1271_ps_elp_wakeup() in most error paths.

> - ? ? ? /* FIXME: what to do if alloc'ing to stop fails? */
> ? ? ? ?stop = kzalloc(sizeof(*stop), GFP_KERNEL);
> ? ? ? ?if (!stop) {
> ? ? ? ? ? ? ? ?wl1271_error("failed to alloc memory to send sched scan stop");
> - ? ? ? ? ? ? ? return;
> + ? ? ? ? ? ? ? goto recovery;
> ? ? ? ?}
not sure if initiating recovery here will really help. it'll probably
fail as well, and good chances we are going to panic soon anyway :)

> ? ? ? ?ret = wl1271_cmd_send(wl, CMD_STOP_PERIODIC_SCAN, stop,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?sizeof(*stop), 0);
> +
> + ? ? ? kfree(stop);
> +
> ? ? ? ?if (ret < 0) {
> ? ? ? ? ? ? ? ?wl1271_error("failed to send sched scan stop command");
> - ? ? ? ? ? ? ? goto out_free;
> + ? ? ? ? ? ? ? goto recovery;
> ? ? ? ?}
>
wl1271_cmd_send enqueues recovery work on error as well.

anyway, the major thing i don't like here is handling the sched_stop
in a different manner than the rest of the commands.

Eliad.

2011-12-23 01:11:30

by Eyal Shapira

[permalink] [raw]

Subject: Re: [PATCH] wl12xx: trigger recovery for failures in sched scan stop

On Fri, Dec 23, 2011 at 3:07 AM, Eyal Shapira <[email protected]> wrote:
>
>
>
> On Thu, Dec 22, 2011 at 10:59 AM, Eliad Peller <[email protected]> wrote:
>>
>> On Wed, Dec 21, 2011 at 2:06 PM, Eyal Shapira <[email protected]> wrote:
>> > mac80211 requires that drv_sched_scan_stop
>> > will always succeed. We have a few possible errors
>> > such as OOM, failure to ELP wakeup, fail in the FW command.
>> > Instead of no-op , trigger a recovery which would
>> > mark sched scan as stopped and clear up any bad FW state.
>> >
>> > Signed-off-by: Eyal Shapira <[email protected]>
>> > ---
>> this patch seems a bit redundant.
>>
>> > ? ? ? ?ret = wl1271_ps_elp_wakeup(wl);
>> > - ? ? ? if (ret < 0)
>> > + ? ? ? if (ret < 0) {
>> > + ? ? ? ? ? ? ? wl12xx_queue_recovery_work(wl);
>> > ? ? ? ? ? ? ? ?goto out;
>> > + ? ? ? }
>> >
>> we enqueue recovery in wl1271_ps_elp_wakeup() in most error paths.
>>
ok. agreed. what about the path of completion error in wl1271_ps_elp_wakeup() ?
Why not trigger recovery in that case as well ?
>
>>
>> > - ? ? ? /* FIXME: what to do if alloc'ing to stop fails? */
>> > ? ? ? ?stop = kzalloc(sizeof(*stop), GFP_KERNEL);
>> > ? ? ? ?if (!stop) {
>> > ? ? ? ? ? ? ? ?wl1271_error("failed to alloc memory to send sched scan stop");
>> > - ? ? ? ? ? ? ? return;
>> > + ? ? ? ? ? ? ? goto recovery;
>> > ? ? ? ?}
>> not sure if initiating recovery here will really help. it'll probably
>> fail as well, and good chances we are going to panic soon anyway :)
>>
Luca pointed out the same thing so I'm dropping this.
The question is what should we do with failing alloc?as
op_sched_scan_stop ?should be void.
Due to other changes in the sched_scan_stop path in mac/cfg80211 it's
more important to
call ieee80211_sched_scan_stoppeD() as otherwise allocs done in
mac80211 won't be freed as well as the userspace
won't be notified of sched scan stop.
Options:
1. Call ieee80211_sched_scan_stopped() for any failure (OOM,
elp_wakeup, ...). This would free up the allocs in mac80211 and notify
userspace
but would leave the FW out of sync with the upper layers. The next
sched scan initiated would trigger a recovery given that the previous
sched
scan wasn't stopped.

2. No op - Just ignore kzalloc failure. Sched scan can't be stopped in
this case and the userspace won't get any completion event and alloc
in
mac80211 will remain.

3. Variation of 1 - Call stopped() but propagate to userspace through
the NL event that we couldn't really stop.
I don't think that would fly as it's more of an API change to userspace.

I understand that option #2 is what we're going for lacking a better
alternative.
Any other ideas / opinions ?

>> > ? ? ? ?ret = wl1271_cmd_send(wl, CMD_STOP_PERIODIC_SCAN, stop,
>> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?sizeof(*stop), 0);
>> > +
>> > + ? ? ? kfree(stop);
>> > +
>> > ? ? ? ?if (ret < 0) {
>> > ? ? ? ? ? ? ? ?wl1271_error("failed to send sched scan stop command");
>> > - ? ? ? ? ? ? ? goto out_free;
>> > + ? ? ? ? ? ? ? goto recovery;
>> > ? ? ? ?}
>> >
>> wl1271_cmd_send enqueues recovery work on error as well.
>>

got it. will be dropped.

>>
>> anyway, the major thing i don't like here is handling the sched_stop
>> in a different manner than the rest of the commands.
>>
>> Eliad.
>
>