2010-11-18 13:19:25

by Juuso Oikarinen

[permalink] [raw]
Subject: [PATCH] wl12xx: Fix kernel crash related to hw recovery and interface shutdown

From: Juuso Oikarinen <[email protected]>

It is possible that the op_remove_interface function is invoked exactly at
the same time has hw recovery is started. In this case it is possible for the
interface to be already removed in the op_remove_interface call, which
currently leads to a kernel warning and a subsequent kernel crash.

Fix this by ignoring the op_remove_interface call if the interface is already
down at that point.

Signed-off-by: Juuso Oikarinen <[email protected]>
---
drivers/net/wireless/wl12xx/main.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/wl12xx/main.c b/drivers/net/wireless/wl12xx/main.c
index 31f0e2f..11b0477 100644
--- a/drivers/net/wireless/wl12xx/main.c
+++ b/drivers/net/wireless/wl12xx/main.c
@@ -1157,10 +1157,16 @@ static void wl1271_op_remove_interface(struct ieee80211_hw *hw,
struct wl1271 *wl = hw->priv;

mutex_lock(&wl->mutex);
- WARN_ON(wl->vif != vif);
- __wl1271_op_remove_interface(wl);
- mutex_unlock(&wl->mutex);
+ /*
+ * wl->vif can be null here if someone shuts down the interface
+ * just when hardware recovery has been started.
+ */
+ if (wl->vif) {
+ WARN_ON(wl->vif != vif);
+ __wl1271_op_remove_interface(wl);
+ }

+ mutex_unlock(&wl->mutex);
cancel_work_sync(&wl->recovery_work);
}

--
1.7.1



2010-11-19 06:58:25

by Tuomas Katila

[permalink] [raw]
Subject: Re: [PATCH] wl12xx: Fix kernel crash related to hw recovery and interface shutdown

On Thu, 2010-11-18 at 14:19 +0100, Oikarinen Juuso (Nokia-MS/Tampere)
wrote:
> From: Juuso Oikarinen <[email protected]>
>
> It is possible that the op_remove_interface function is invoked exactly at
> the same time has hw recovery is started. In this case it is possible for the
> interface to be already removed in the op_remove_interface call, which
> currently leads to a kernel warning and a subsequent kernel crash.
>
> Fix this by ignoring the op_remove_interface call if the interface is already
> down at that point.
>
> Signed-off-by: Juuso Oikarinen <[email protected]>
> ---

Tested-by: Tuomas Katila <[email protected]>

I couldn't see the crash anymore with this patch.

-Tuomas



2010-11-23 05:51:22

by Juuso Oikarinen

[permalink] [raw]
Subject: Re: [PATCH] wl12xx: Fix kernel crash related to hw recovery and interface shutdown

On Mon, 2010-11-22 at 14:52 +0200, Luciano Coelho wrote:
> On Thu, 2010-11-18 at 15:19 +0200, [email protected] wrote:
> > From: Juuso Oikarinen <[email protected]>
> >
> > It is possible that the op_remove_interface function is invoked exactly at
> > the same time has hw recovery is started. In this case it is possible for the
> > interface to be already removed in the op_remove_interface call, which
> > currently leads to a kernel warning and a subsequent kernel crash.
> >
> > Fix this by ignoring the op_remove_interface call if the interface is already
> > down at that point.
> >
> > Signed-off-by: Juuso Oikarinen <[email protected]>
> > ---
>
> [...]
>
> > diff --git a/drivers/net/wireless/wl12xx/main.c b/drivers/net/wireless/wl12xx/main.c
> > index 31f0e2f..11b0477 100644
> > --- a/drivers/net/wireless/wl12xx/main.c
> > +++ b/drivers/net/wireless/wl12xx/main.c
> > @@ -1157,10 +1157,16 @@ static void wl1271_op_remove_interface(struct ieee80211_hw *hw,
> > struct wl1271 *wl = hw->priv;
> >
> > mutex_lock(&wl->mutex);
> > - WARN_ON(wl->vif != vif);
> > - __wl1271_op_remove_interface(wl);
> > - mutex_unlock(&wl->mutex);
> > + /*
> > + * wl->vif can be null here if someone shuts down the interface
> > + * just when hardware recovery has been started.
> > + */
> > + if (wl->vif) {
> > + WARN_ON(wl->vif != vif);
> > + __wl1271_op_remove_interface(wl);
> > + }
>
> Should you still remove the interface if the vif you received is wrong?
> Surely, something is totally wrong if you get a different vif, but maybe
> removing the interface here will just confuse things even more?

Dunno if it would be better to remove or leave unremoved - probably does
not matter. If the vif is wrong, there is some serious bug somewhere,
and probably the result is serious instability anyway.

The WARN_ON is there just to validate the assumption the driver makes
about the vif, and this function call. If there is a bug somewhere, or
the assumption somehow changes, the warning will be a clear indication
of it.

-Juuso

>
> > + mutex_unlock(&wl->mutex);
> > cancel_work_sync(&wl->recovery_work);
> > }
> >
>
>



2010-11-23 08:47:35

by Luciano Coelho

[permalink] [raw]
Subject: Re: [PATCH] wl12xx: Fix kernel crash related to hw recovery and interface shutdown

On Tue, 2010-11-23 at 07:51 +0200, Juuso Oikarinen wrote:
> On Mon, 2010-11-22 at 14:52 +0200, Luciano Coelho wrote:
> > On Thu, 2010-11-18 at 15:19 +0200, [email protected] wrote:
> > > From: Juuso Oikarinen <[email protected]>
> > >
> > > It is possible that the op_remove_interface function is invoked exactly at
> > > the same time has hw recovery is started. In this case it is possible for the
> > > interface to be already removed in the op_remove_interface call, which
> > > currently leads to a kernel warning and a subsequent kernel crash.
> > >
> > > Fix this by ignoring the op_remove_interface call if the interface is already
> > > down at that point.
> > >
> > > Signed-off-by: Juuso Oikarinen <[email protected]>
> > > ---
> >
> > [...]
> >
> > > diff --git a/drivers/net/wireless/wl12xx/main.c b/drivers/net/wireless/wl12xx/main.c
> > > index 31f0e2f..11b0477 100644
> > > --- a/drivers/net/wireless/wl12xx/main.c
> > > +++ b/drivers/net/wireless/wl12xx/main.c
> > > @@ -1157,10 +1157,16 @@ static void wl1271_op_remove_interface(struct ieee80211_hw *hw,
> > > struct wl1271 *wl = hw->priv;
> > >
> > > mutex_lock(&wl->mutex);
> > > - WARN_ON(wl->vif != vif);
> > > - __wl1271_op_remove_interface(wl);
> > > - mutex_unlock(&wl->mutex);
> > > + /*
> > > + * wl->vif can be null here if someone shuts down the interface
> > > + * just when hardware recovery has been started.
> > > + */
> > > + if (wl->vif) {
> > > + WARN_ON(wl->vif != vif);
> > > + __wl1271_op_remove_interface(wl);
> > > + }
> >
> > Should you still remove the interface if the vif you received is wrong?
> > Surely, something is totally wrong if you get a different vif, but maybe
> > removing the interface here will just confuse things even more?
>
> Dunno if it would be better to remove or leave unremoved - probably does
> not matter. If the vif is wrong, there is some serious bug somewhere,
> and probably the result is serious instability anyway.
>
> The WARN_ON is there just to validate the assumption the driver makes
> about the vif, and this function call. If there is a bug somewhere, or
> the assumption somehow changes, the warning will be a clear indication
> of it.

Yep, it's okay like this. And it was like that earlier anyway. I was
just wondering if it would be possible for a similar thing as wl->vif
being NULL (which your patch addresses) happen, such as wl->vif changing
before we had the chance to fully remove the interface. It will
probably not happen and, as you say, the WARN_ON will be a good
indication that something went wrong.

Applied to wl12xx/master. Thanks!


--
Cheers,
Luca.


2010-11-22 12:52:58

by Luciano Coelho

[permalink] [raw]
Subject: Re: [PATCH] wl12xx: Fix kernel crash related to hw recovery and interface shutdown

On Thu, 2010-11-18 at 15:19 +0200, [email protected] wrote:
> From: Juuso Oikarinen <[email protected]>
>
> It is possible that the op_remove_interface function is invoked exactly at
> the same time has hw recovery is started. In this case it is possible for the
> interface to be already removed in the op_remove_interface call, which
> currently leads to a kernel warning and a subsequent kernel crash.
>
> Fix this by ignoring the op_remove_interface call if the interface is already
> down at that point.
>
> Signed-off-by: Juuso Oikarinen <[email protected]>
> ---

[...]

> diff --git a/drivers/net/wireless/wl12xx/main.c b/drivers/net/wireless/wl12xx/main.c
> index 31f0e2f..11b0477 100644
> --- a/drivers/net/wireless/wl12xx/main.c
> +++ b/drivers/net/wireless/wl12xx/main.c
> @@ -1157,10 +1157,16 @@ static void wl1271_op_remove_interface(struct ieee80211_hw *hw,
> struct wl1271 *wl = hw->priv;
>
> mutex_lock(&wl->mutex);
> - WARN_ON(wl->vif != vif);
> - __wl1271_op_remove_interface(wl);
> - mutex_unlock(&wl->mutex);
> + /*
> + * wl->vif can be null here if someone shuts down the interface
> + * just when hardware recovery has been started.
> + */
> + if (wl->vif) {
> + WARN_ON(wl->vif != vif);
> + __wl1271_op_remove_interface(wl);
> + }

Should you still remove the interface if the vif you received is wrong?
Surely, something is totally wrong if you get a different vif, but maybe
removing the interface here will just confuse things even more?


> + mutex_unlock(&wl->mutex);
> cancel_work_sync(&wl->recovery_work);
> }
>


--
Cheers,
Luca.