2017-02-24 13:29:45

by Amitkumar Karwar

[permalink] [raw]
Subject: [PATCH] mwifiex: fix kernel crash after shutdown command timeout

We observed a SHUTDOWN command timeout during reboot stress test due
to a corner case firmware bug. It leads to use-after-free on adapter
structure pointer and crash.

We already have a cancel_work_sync() call in teardown thread. This
issue is fixed by having this call just before mwifiex_remove_card().
At this point no further work will be scheduled.

Signed-off-by: Amitkumar Karwar <[email protected]>
Signed-off-by: Cathy Luo <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +--
drivers/net/wireless/marvell/mwifiex/sdio.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index a0d9180..f31c5ea 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -294,8 +294,6 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
if (!adapter || !adapter->priv_num)
return;

- cancel_work_sync(&card->work);
-
reg = card->pcie.reg;
if (reg)
ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
@@ -312,6 +310,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
}

+ cancel_work_sync(&card->work);
mwifiex_remove_card(adapter);
}

diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index a4b356d..9534b47 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -387,8 +387,6 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
if (!adapter || !adapter->priv_num)
return;

- cancel_work_sync(&card->work);
-
mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num);

ret = mwifiex_sdio_read_fw_status(adapter, &firmware_stat);
@@ -400,6 +398,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
}

+ cancel_work_sync(&card->work);
mwifiex_remove_card(adapter);
}

--
1.9.1


2017-03-14 18:33:13

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: fix kernel crash after shutdown command timeout

Hi Amit,

You managed to CC several other Google folks, but not me, the one who
actually reviewed most of this! You might consider including me in the
future :)

On Fri, Feb 24, 2017 at 06:59:24PM +0530, Amitkumar Karwar wrote:
> We observed a SHUTDOWN command timeout during reboot stress test due
> to a corner case firmware bug. It leads to use-after-free on adapter
> structure pointer and crash.
>
> We already have a cancel_work_sync() call in teardown thread. This
> issue is fixed by having this call just before mwifiex_remove_card().
> At this point no further work will be scheduled.
>
> Signed-off-by: Amitkumar Karwar <[email protected]>
> Signed-off-by: Cathy Luo <[email protected]>

I'm testing this artificially by testing things like this concurrently:

rmmod mwifiex_pcie &
cat /sys/kernel/debug/mwifiex/mlan0/device_dump

I'm using a 4.4-based kernel (plus quite a few backports) at the moment
and I'm having problems (I can retest on upstream if really needed), and
pretty sure this patch is buggy.

> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +--
> drivers/net/wireless/marvell/mwifiex/sdio.c | 3 +--
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index a0d9180..f31c5ea 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -294,8 +294,6 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> if (!adapter || !adapter->priv_num)
> return;
>
> - cancel_work_sync(&card->work);
> -
> reg = card->pcie.reg;
> if (reg)
> ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
> @@ -312,6 +310,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> }
>
> + cancel_work_sync(&card->work);

I don't think we want to move the cancellation to be this far; see the
mwifiex_init_shutdown_fw() above! If I add a msleep(3000) below this,
then run:

rmmod mwifiex_pcie & sleep 0.5; cat /sys/kernel/debug/mwifiex/mlan0/device_dump

I can trigger an abort in mwifiex_pcie_rdwr_firmware(). The problem is
that you still allow a command timeout + firmware dump worker to still
race with the shutdown -- in this case, I think it's
mwifiex_init_shutdown_fw() that's disabling the device.

I think the real solution is to, somewhere before we shutdown the
firmware, *really* prevent any further work to be scheduled to
&card->work. Maybe that means adding another flag so that the worker
will just abort quickly in that case? So it's something like:

card->worker_flags |= DONT_RUN_ANY_MORE;
cancel_work_sync(&card->work);

... (this can be done either above the FIRMWARE_READY_PCIE
check, or else you need to write a different version for
FIRMWARE_READY_PCIE vs. !FIRMWARE_READY_PCIE) ... but definitely
before mwifiex_init_shutdown_fw() ) ...

And in mwifiex_pcie_work():

if (card->worker_flags & DONT_RUN_ANY_MORE)
return;

IOW, NAK to this patch.

Brian

> mwifiex_remove_card(adapter);
> }
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index a4b356d..9534b47 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -387,8 +387,6 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
> if (!adapter || !adapter->priv_num)
> return;
>
> - cancel_work_sync(&card->work);
> -
> mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num);
>
> ret = mwifiex_sdio_read_fw_status(adapter, &firmware_stat);
> @@ -400,6 +398,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
> mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> }
>
> + cancel_work_sync(&card->work);
> mwifiex_remove_card(adapter);
> }
>
> --
> 1.9.1
>

2017-03-15 14:11:03

by Amitkumar Karwar

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: fix kernel crash after shutdown command timeout

> From: Brian Norris [mailto:[email protected]]
> Sent: Wednesday, March 15, 2017 12:03 AM
> To: Amitkumar Karwar
> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> [email protected]; [email protected]
> Subject: [EXT] Re: [PATCH] mwifiex: fix kernel crash after shutdown
> command timeout
>
> On Fri, Feb 24, 2017 at 06:59:24PM +0530, Amitkumar Karwar wrote:
> > We observed a SHUTDOWN command timeout during reboot stress test due
> > to a corner case firmware bug. It leads to use-after-free on adapter
> > structure pointer and crash.
> >
> > We already have a cancel_work_sync() call in teardown thread. This
> > issue is fixed by having this call just before mwifiex_remove_card().
> > At this point no further work will be scheduled.
> >
> > Signed-off-by: Amitkumar Karwar <[email protected]>
> > Signed-off-by: Cathy Luo <[email protected]>
>
> I'm testing this artificially by testing things like this concurrently:
>
> rmmod mwifiex_pcie &
> cat /sys/kernel/debug/mwifiex/mlan0/device_dump
>
> I'm using a 4.4-based kernel (plus quite a few backports) at the moment
> and I'm having problems (I can retest on upstream if really needed),
> and pretty sure this patch is buggy.
>
> > ---
> > drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +--
> > drivers/net/wireless/marvell/mwifiex/sdio.c | 3 +--
> > 2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index a0d9180..f31c5ea 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -294,8 +294,6 @@ static void mwifiex_pcie_remove(struct pci_dev
> *pdev)
> > if (!adapter || !adapter->priv_num)
> > return;
> >
> > - cancel_work_sync(&card->work);
> > -
> > reg = card->pcie.reg;
> > if (reg)
> > ret = mwifiex_read_reg(adapter, reg->fw_status,
> &fw_status); @@
> > -312,6 +310,7 @@ static void mwifiex_pcie_remove(struct pci_dev
> *pdev)
> > mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> > }
> >
> > + cancel_work_sync(&card->work);
>
> I don't think we want to move the cancellation to be this far; see the
> mwifiex_init_shutdown_fw() above! If I add a msleep(3000) below this,
> then run:
>
> rmmod mwifiex_pcie & sleep 0.5; cat
> /sys/kernel/debug/mwifiex/mlan0/device_dump
>
> I can trigger an abort in mwifiex_pcie_rdwr_firmware(). The problem is
> that you still allow a command timeout + firmware dump worker to still
> race with the shutdown -- in this case, I think it's
> mwifiex_init_shutdown_fw() that's disabling the device.
>
> I think the real solution is to, somewhere before we shutdown the
> firmware, *really* prevent any further work to be scheduled to &card-
> >work. Maybe that means adding another flag so that the worker will
> just abort quickly in that case? So it's something like:
>
> card->worker_flags |= DONT_RUN_ANY_MORE;
> cancel_work_sync(&card->work);
>
> ... (this can be done either above the FIRMWARE_READY_PCIE
> check, or else you need to write a different version for
> FIRMWARE_READY_PCIE vs. !FIRMWARE_READY_PCIE) ... but definitely
> before mwifiex_init_shutdown_fw() ) ...
>
> And in mwifiex_pcie_work():
>
> if (card->worker_flags & DONT_RUN_ANY_MORE)
> return;
>

Thanks for the review.
You are right. This can be cleanly fixed with a extra worker flag(DONT_RUN_ANY_MORE)
I will submit updated version with this approach.

Regards,
Amitkumar

2017-03-14 18:36:57

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: fix kernel crash after shutdown command timeout

On Tue, Mar 14, 2017 at 11:33 AM, Brian Norris <[email protected]> wrote:
>
> Hi Amit,
>
> You managed to CC several other Google folks, but not me, the one who
> actually reviewed most of this! You might consider including me in the
> future :)


Oops, in fact you did CC me... my bad. But I usually use my
@chromium.org for this stuff.