From: Xinming Hu <[email protected]>
card->adapter gets initialized during device registration.
As it's not cleared, we may end up accessing invalid memory
in some corner cases. This patch fixes the problem.
Signed-off-by: Xinming Hu <[email protected]>
Signed-off-by: Amitkumar Karwar <[email protected]>
---
v2: Same as v1
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 1 +
drivers/net/wireless/marvell/mwifiex/sdio.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index f1eeb73..ba9e068 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
pci_disable_msi(pdev);
}
}
+ card->adapter = NULL;
}
/* This function initializes the PCI-E host memory space, WCB rings, etc.
diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index 8718950..4cad1c2 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -2066,6 +2066,7 @@ mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
struct sdio_mmc_card *card = adapter->card;
if (adapter->card) {
+ card->adapter = NULL;
sdio_claim_host(card->func);
sdio_disable_func(card->func);
sdio_release_host(card->func);
--
1.9.1
Hi Brian,
> From: [email protected] [mailto:linux-wireless-
> [email protected]] On Behalf Of Brian Norris
> Sent: Wednesday, October 05, 2016 10:00 PM
> To: Amitkumar Karwar
> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> [email protected]; Xinming Hu
> Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
> unregister
>
> Hi,
>
> On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
> > > From: Brian Norris [mailto:[email protected]]
> > > Sent: Wednesday, October 05, 2016 3:28 AM
> > > To: Amitkumar Karwar
> > > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > > [email protected]; [email protected]; Xinming Hu
> > > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during
> > > device unregister
> > >
> > > Hi,
> > >
> > > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
>
> > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
> > > mwifiex_adapter *adapter)
> > > > pci_disable_msi(pdev);
> > > > }
> > > > }
> > > > + card->adapter = NULL;
> > >
> > > I think you have a similar problem here as in patch 2; there is no
> > > locking to protect fields in struct pcie_service_card or struct
> > > sdio_mmc_card below. That problem kind of already exists, except
> > > that you only write the value of card->adapter once at registration
> > > time, so it's not actually unsafe. But now that you're introducing a
> > > second write, you have a problem.
> > >
> > > Brian
> > >
> >
> > We have a global "add_remove_card_sem" semaphore in our code for
> > synchronizing initialization and teardown threads. Ideally "init +
> > teardown/reboot" should not have a race issue with this logic
> >
> > Later there was a loophole introduced in this after using async
> > firmware download API. During initialization, firmware downloads
> > asynchronously in a separate thread where might have released the
> > semaphore. I am working on a patch to fix this.
> >
> > So "card->adapter" doesn't need to have locking. Even if we have two
> > write operations, those two threads can't run simultaneously due to
> > above mentioned logic.
>
> What about writes racing with reads? You have lots of unsynchronized
> cases that read this, although most of them should be halted by now
> (e.g., cmd processing). I was looking at suspend() in particular, which
> I thought you were looking at in this patch series.
Please note that "card->adapter" is used only in pcie.c/sdio.c/usb.c files
Writes won't have race with reads.
1) write 1 --- "card->adapter = adapter;" in mwifiex_register_dev()
This place is at the beginning of initialization.
mwifiex_pcie_probe() -> mwifiex_add_card() -> adapter->if_ops.register_dev()
There is no chance that "card->adapter" is read anywhere at this point. FW is not yet downloaded
2) write 2 ---- "card->adapter = NULL;" in mwifiex_unregister_dev()
This place the end of teardown phase.
Interrupts are disabled and all cleanup is done. We have "card->adapter" NULL checks at entry point of suspend/remove/resume, if they get called after this.
Regards,
Amitkumar
(I think Dmitry noticed the same while I wrote this.)
On Mon, Oct 10, 2016 at 04:47:08PM -0700, Brian Norris wrote:
> [*] The other cases are in error handling cases. I guess I should make
> sure those didn't race too...
Ah, well I think I missed one case:
For the async FW request code path, the callback mwifiex_fw_dpc() can
fail to load FW, and so it unwinds with:
if (adapter->if_ops.unregister_dev)
adapter->if_ops.unregister_dev(adapter);
This all happens after probe() is finished, so we can have a race on
card->adapter being read in suspend() and written in the $subject patch.
Can I revoke my Reviewed-by? (Or make it Reviewed-and-found-wanting-by?)
Brian
Hi Brian,
Thanks for your thorough review.
> From: Brian Norris [mailto:[email protected]]
> Sent: Wednesday, October 05, 2016 2:35 AM
> To: Amitkumar Karwar
> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> [email protected]; [email protected]; Xinming Hu
> Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> resume handlers
>
> Hi,
>
> On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > From: Xinming Hu <[email protected]>
> >
> > We have observed a kernel crash when system immediately suspends after
> > booting. There is a race between suspend and driver initialization
> > paths.
> > This patch adds hw_status checks to fix the problem
> >
> > Signed-off-by: Xinming Hu <[email protected]>
> > Signed-off-by: Amitkumar Karwar <[email protected]>
> > ---
> > v2: Return failure in suspend/resume handler in this scenario.
> > ---
> > drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index ba9e068..fa6bf85 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > *dev)
> >
> > if (pdev) {
> > card = pci_get_drvdata(pdev);
> > - if (!card || !card->adapter) {
> > + if (!card || !card->adapter ||
> > + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
>
> Wait, is there no locking on the 'hw_status' field? That is inherently
> an unsafe race all on its own; you're not guaranteed that this will be
> read/written atomically. And you also aren't guaranteed that writes to
> this happen in the order they appear in the code -- in other words,
> reading this flag doesn't necessarily guarantee that initialization is
> actually complete (even if that's very likely to be true, given that
> it's probably just a single-instruction word-access, and any prior HW
> polling or interrupts likely have done some synchronization and can't be
> reordered).
> actually complete
Here is the brief info on how "hw_status" flag is updated.
1) It gets changed incrementally during initialization.
MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> MWIFIEX_HW_STATUS_READY
2) Status will remain READY once driver+firmware is up and running.
3) Below is status during teardown
MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
As the events occur one after another, we don't expect a race and don't need locking here for the flag. Flag status MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.
In worst case scenario, only first system suspend attempt issued immediately after system boot will be aborted with BUSY error. I think, that should be fine.
Let me know if you have any concerns.
>
> This is probably better than nothing, but it's not very good.
>
> > pr_err("Card or adapter structure is not valid\n");
> > - return 0;
> > + return -EBUSY;
>
> So the above cases all mean that the driver hasn't finished loading,
> right?
>
> !card => can't happen (PCIe probe() would have failed
> mwifiex_add_card()), but fine to check
NULL "card" or "card->adapter" pointers are expected in below cases
1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
2) Race of teardown + suspend
>
> !card->adapter => only happens after patch 1; i.e., when tearing down
> the device and detaching it from the driver
>
> card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
> (i.e., in the process of starting or stopping FW?)
>
> I guess all of those cases make sense to be -EBUSY.
Yes. we can keep -EBUSY for all of the cases.
>
> > }
> > } else {
> > pr_err("PCIE device is not specified\n");
>
> I was going to complain about this branch (!pdev) returning 0, since
> that looked asymmetric. But this case will never happen, actually, since
> to_pci_dev() is just doing struct offset arithmetic on struct device,
> and struct device is never going to be NULL. It probably makes more
> sense to just remove this branch entirely, but that's for another patch.
Thanks for pointing this out. I will create separate patch for this.
>
> > @@ -166,9 +167,10 @@ static int mwifiex_pcie_resume(struct device
> > *dev)
> >
> > if (pdev) {
> > card = pci_get_drvdata(pdev);
> > - if (!card || !card->adapter) {
> > + if (!card || !card->adapter ||
> > + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
>
> Same complaint, except if you've screwed up here, you probably already
> screwed up in suspend().
>
> Brian
>
> > pr_err("Card or adapter structure is not valid\n");
> > - return 0;
> > + return -EBUSY;
> > }
> > } else {
> > pr_err("PCIE device is not specified\n");
> > --
> > 1.9.1
> >
Regards,
Amitkumar
Hi Amit,
On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:[email protected]]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > [email protected]; [email protected]; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > resume handlers
> >
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <[email protected]>
> > >
> > > We have observed a kernel crash when system immediately suspends after
> > > booting. There is a race between suspend and driver initialization
> > > paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <[email protected]>
> > > Signed-off-by: Amitkumar Karwar <[email protected]>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > > drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > > if (pdev) {
> > > card = pci_get_drvdata(pdev);
> > > - if (!card || !card->adapter) {
> > > + if (!card || !card->adapter ||
> > > + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> >
> > Wait, is there no locking on the 'hw_status' field? That is inherently
> > an unsafe race all on its own; you're not guaranteed that this will be
> > read/written atomically. And you also aren't guaranteed that writes to
> > this happen in the order they appear in the code -- in other words,
> > reading this flag doesn't necessarily guarantee that initialization is
> > actually complete (even if that's very likely to be true, given that
> > it's probably just a single-instruction word-access, and any prior HW
> > polling or interrupts likely have done some synchronization and can't be
> > reordered).
> > actually complete
>
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
> MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> MWIFIEX_HW_STATUS_READY
>
> 2) Status will remain READY once driver+firmware is up and running.
>
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
>
> As the events occur one after another, we don't expect a race and
> don't need locking here for the flag. Flag status
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.
It seems like, as with patch 1, you're mostly arguing about the writes
to this variable. But writes race with reads as well; how do you
guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even
partially-written values, if for some reason the compiler decides it
can't read/write this all in one go?
> In worst case scenario, only first system suspend attempt issued
> immediately after system boot will be aborted with BUSY error. I
> think, that should be fine.
(For the record, my concern about -EBUSY is separate from my concern
about the potential race condition.)
> Let me know if you have any concerns.
Sorry, I probably didn't completely flesh out my thought here.
I think I was concerned about a failed initialization causing the system
to never enter suspend again. So specifically: what happens if (e.g.)
the firmware fails to load? AFAICT, the device doesn't actually unbind
itself from the driver, so instead, you have a device in limbo that will
always return -EBUSY in suspend(), and your system can never again enter
suspend. Am I correct? If so, that doesn't sound great.
> > This is probably better than nothing, but it's not very good.
> >
> > > pr_err("Card or adapter structure is not valid\n");
> > > - return 0;
> > > + return -EBUSY;
> >
> > So the above cases all mean that the driver hasn't finished loading,
> > right?
> >
> > !card => can't happen (PCIe probe() would have failed
> > mwifiex_add_card()), but fine to check
>
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>
> >
> > !card->adapter => only happens after patch 1; i.e., when tearing down
> > the device and detaching it from the driver
> >
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
> > (i.e., in the process of starting or stopping FW?)
> >
> > I guess all of those cases make sense to be -EBUSY.
^^ I'm second-guessing my claim here then.
> Yes. we can keep -EBUSY for all of the cases.
Brian
Hi Amit,
On Thu, Oct 06, 2016 at 01:03:02PM +0000, Amitkumar Karwar wrote:
> > From: [email protected] [mailto:linux-wireless-
> > [email protected]] On Behalf Of Brian Norris
> > Sent: Wednesday, October 05, 2016 10:00 PM
> > To: Amitkumar Karwar
> > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > [email protected]; Xinming Hu
> > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
> > unregister
> >
> > On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
> > > > From: Brian Norris [mailto:[email protected]]
> > > > Sent: Wednesday, October 05, 2016 3:28 AM
> > > > To: Amitkumar Karwar
> > > > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > > > [email protected]; [email protected]; Xinming Hu
> > > > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during
> > > > device unregister
> > > >
> > > > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
> >
> > > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
> > > > mwifiex_adapter *adapter)
> > > > > pci_disable_msi(pdev);
> > > > > }
> > > > > }
> > > > > + card->adapter = NULL;
...
> > What about writes racing with reads? You have lots of unsynchronized
> > cases that read this, although most of them should be halted by now
> > (e.g., cmd processing). I was looking at suspend() in particular, which
> > I thought you were looking at in this patch series.
>
> Please note that "card->adapter" is used only in pcie.c/sdio.c/usb.c files
>
> Writes won't have race with reads.
>
> 1) write 1 --- "card->adapter = adapter;" in mwifiex_register_dev()
> This place is at the beginning of initialization.
> mwifiex_pcie_probe() -> mwifiex_add_card() -> adapter->if_ops.register_dev()
> There is no chance that "card->adapter" is read anywhere at this point. FW is not yet downloaded
Sure.
> 2) write 2 ---- "card->adapter = NULL;" in mwifiex_unregister_dev()
> This place the end of teardown phase.
> Interrupts are disabled and all cleanup is done. We have
> "card->adapter" NULL checks at entry point of
> suspend/remove/resume, if they get called after this.
I guess the question boils down to: can driver suspend() race with
mwifiex_unregister_dev() here, then?
And I guess the answer is "no", because unregistration only happens via
PCIe driver remove() -> mwifiex_remove_card() -> adapter->if_ops.unregister_dev()
and I think the device driver core guarantees that suspend() and
remove() won't race.
In that case:
Reviewed-by: Brian Norris <[email protected]>
I'll reply to the latest revision too, since it's identical.
Thanks for the explanations.
Regards,
Brian
Hi,
On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
> From: Xinming Hu <[email protected]>
>
> card->adapter gets initialized during device registration.
> As it's not cleared, we may end up accessing invalid memory
> in some corner cases. This patch fixes the problem.
>
> Signed-off-by: Xinming Hu <[email protected]>
> Signed-off-by: Amitkumar Karwar <[email protected]>
> ---
> v2: Same as v1
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 1 +
> drivers/net/wireless/marvell/mwifiex/sdio.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index f1eeb73..ba9e068 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
> pci_disable_msi(pdev);
> }
> }
> + card->adapter = NULL;
I think you have a similar problem here as in patch 2; there is no
locking to protect fields in struct pcie_service_card or struct
sdio_mmc_card below. That problem kind of already exists, except that
you only write the value of card->adapter once at registration time, so
it's not actually unsafe. But now that you're introducing a second
write, you have a problem.
Brian
> }
>
> /* This function initializes the PCI-E host memory space, WCB rings, etc.
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index 8718950..4cad1c2 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -2066,6 +2066,7 @@ mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
> struct sdio_mmc_card *card = adapter->card;
>
> if (adapter->card) {
> + card->adapter = NULL;
> sdio_claim_host(card->func);
> sdio_disable_func(card->func);
> sdio_release_host(card->func);
> --
> 1.9.1
>
Hi Brain,
> From: Brian Norris [mailto:[email protected]]
> Sent: Wednesday, October 05, 2016 10:12 PM
> To: Amitkumar Karwar
> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> [email protected]; Xinming Hu
> Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> resume handlers
>
> Hi Amit,
>
> On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > > From: Brian Norris [mailto:[email protected]]
> > > Sent: Wednesday, October 05, 2016 2:35 AM
> > > To: Amitkumar Karwar
> > > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > > [email protected]; [email protected]; Xinming Hu
> > > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > > resume handlers
> > >
> > > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > > From: Xinming Hu <[email protected]>
> > > >
> > > > We have observed a kernel crash when system immediately suspends
> > > > after booting. There is a race between suspend and driver
> > > > initialization paths.
> > > > This patch adds hw_status checks to fix the problem
> > > >
> > > > Signed-off-by: Xinming Hu <[email protected]>
> > > > Signed-off-by: Amitkumar Karwar <[email protected]>
> > > > ---
> > > > v2: Return failure in suspend/resume handler in this scenario.
> > > > ---
> > > > drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > index ba9e068..fa6bf85 100644
> > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > > *dev)
> > > >
> > > > if (pdev) {
> > > > card = pci_get_drvdata(pdev);
> > > > - if (!card || !card->adapter) {
> > > > + if (!card || !card->adapter ||
> > > > + card->adapter->hw_status !=
> MWIFIEX_HW_STATUS_READY) {
> > >
> > > Wait, is there no locking on the 'hw_status' field? That is
> > > inherently an unsafe race all on its own; you're not guaranteed that
> > > this will be read/written atomically. And you also aren't guaranteed
> > > that writes to this happen in the order they appear in the code --
> > > in other words, reading this flag doesn't necessarily guarantee that
> > > initialization is actually complete (even if that's very likely to
> > > be true, given that it's probably just a single-instruction
> > > word-access, and any prior HW polling or interrupts likely have done
> > > some synchronization and can't be reordered).
> > > actually complete
> >
> > Here is the brief info on how "hw_status" flag is updated.
> > 1) It gets changed incrementally during initialization.
> > MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE ->
> > MWIFIEX_HW_STATUS_READY
> >
> > 2) Status will remain READY once driver+firmware is up and running.
> >
> > 3) Below is status during teardown
> > MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET ->
> > MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
> >
> > As the events occur one after another, we don't expect a race and
> > don't need locking here for the flag. Flag status
> > MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.
>
> It seems like, as with patch 1, you're mostly arguing about the writes
> to this variable. But writes race with reads as well; how do you
> guarantee that you're not seeing incorrect values of 'hw_status' here in
> suspend() -- e.g., either old or new values of it, or even partially-
> written values, if for some reason the compiler decides it can't
> read/write this all in one go?
Got it. I think, we need to define "hw_status" as atomic variable to resolve the issue. I will create separate patch for this.
>
> > In worst case scenario, only first system suspend attempt issued
> > immediately after system boot will be aborted with BUSY error. I
> > think, that should be fine.
>
> (For the record, my concern about -EBUSY is separate from my concern
> about the potential race condition.)
>
> > Let me know if you have any concerns.
>
> Sorry, I probably didn't completely flesh out my thought here.
>
> I think I was concerned about a failed initialization causing the system
> to never enter suspend again. So specifically: what happens if (e.g.)
> the firmware fails to load? AFAICT, the device doesn't actually unbind
> itself from the driver, so instead, you have a device in limbo that will
> always return -EBUSY in suspend(), and your system can never again enter
> suspend. Am I correct? If so, that doesn't sound great.
You are right. I will add an extra check for this so that both the cases would be handled
Case 1:
Firmware has gone bad or has failed to initialize. We will return zero here, so that it won't block subsequent suspend attempts.
Case 2:
Init or teardown is in process. We will return -EBUSY here. Next suspend attempt would be successful.
I will submit v3 with these changes shortly.
Regards,
Amitkumar
Hi Brian,
On Mon, Oct 10, 2016 at 4:47 PM, Brian Norris <[email protected]> wrote:
> Hi Dmitry,
>
> On Mon, Oct 10, 2016 at 04:43:06PM -0700, Dmitry Torokhov wrote:
>> On Thu, Oct 6, 2016 at 6:03 AM, Amitkumar Karwar <[email protected]> wrote:
>> >> From: [email protected] [mailto:linux-wireless-
>> >> [email protected]] On Behalf Of Brian Norris
>> >> Sent: Wednesday, October 05, 2016 10:00 PM
>> >> To: Amitkumar Karwar
>> >> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
>> >> [email protected]; Xinming Hu
>> >> Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
>> >> unregister
>> >>
>> >> On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
>> >> > > From: Brian Norris [mailto:[email protected]]
>> >> > > Sent: Wednesday, October 05, 2016 3:28 AM
>> >> > > To: Amitkumar Karwar
>> >> > > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
>> >> > > [email protected]; [email protected]; Xinming Hu
>> >> > > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during
>> >> > > device unregister
>> >> > >
>> >> > > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
>> >>
>> >> > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> >> > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> >> > > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
>> >> > > mwifiex_adapter *adapter)
>> >> > > > pci_disable_msi(pdev);
>> >> > > > }
>> >> > > > }
>> >> > > > + card->adapter = NULL;
>> >> > >
>> >> > > I think you have a similar problem here as in patch 2; there is no
>> >> > > locking to protect fields in struct pcie_service_card or struct
>> >> > > sdio_mmc_card below. That problem kind of already exists, except
>> >> > > that you only write the value of card->adapter once at registration
>> >> > > time, so it's not actually unsafe. But now that you're introducing a
>> >> > > second write, you have a problem.
>> >> > >
>> >> > > Brian
>> >> > >
>> >> >
>> >> > We have a global "add_remove_card_sem" semaphore in our code for
>> >> > synchronizing initialization and teardown threads. Ideally "init +
>> >> > teardown/reboot" should not have a race issue with this logic
>> >> >
>> >> > Later there was a loophole introduced in this after using async
>> >> > firmware download API. During initialization, firmware downloads
>> >> > asynchronously in a separate thread where might have released the
>> >> > semaphore. I am working on a patch to fix this.
>> >> >
>> >> > So "card->adapter" doesn't need to have locking. Even if we have two
>> >> > write operations, those two threads can't run simultaneously due to
>> >> > above mentioned logic.
>> >>
>> >> What about writes racing with reads? You have lots of unsynchronized
>> >> cases that read this, although most of them should be halted by now
>> >> (e.g., cmd processing). I was looking at suspend() in particular, which
>> >> I thought you were looking at in this patch series.
>> >
>> > Please note that "card->adapter" is used only in pcie.c/sdio.c/usb.c files
>> >
>> > Writes won't have race with reads.
>> >
>> > 1) write 1 --- "card->adapter = adapter;" in mwifiex_register_dev()
>> > This place is at the beginning of initialization.
>> > mwifiex_pcie_probe() -> mwifiex_add_card() -> adapter->if_ops.register_dev()
>> > There is no chance that "card->adapter" is read anywhere at this point. FW is not yet downloaded
>> >
>> > 2) write 2 ---- "card->adapter = NULL;" in mwifiex_unregister_dev()
>> > This place the end of teardown phase.
>> > Interrupts are disabled and all cleanup is done. We have "card->adapter" NULL checks at entry point of suspend/remove/resume, if they get called after this.
>>
>> How exactly are you preventing ->suspend() from being called *while*
>> you are running mwifiex_unregister_dev()? I.e. user slamming the lid
>> of a laptop and throwing it in their backpack?
>
> As I already commented, isn't this primarily [*] called from the driver
> remove() callback? remove() doesn't race with suspend(), does it?
Nope, there is request_firmware_nowait() path that is asynchronous
with device registration/unregistration and power management. Maybe
the right way is to move the driver to async probing and switch that
request_firmware_nowait() into plain request_firmware(). I haven't
looked that closely.
The wording "we may end up accessing invalid memory in some corner
cases" which is a synonym for "the code is racy" caught my eye, thus
the response on the patch.
Thanks.
--
Dmitry
Hi Dmitry,
On Mon, Oct 10, 2016 at 04:43:06PM -0700, Dmitry Torokhov wrote:
> On Thu, Oct 6, 2016 at 6:03 AM, Amitkumar Karwar <[email protected]> wrote:
> >> From: [email protected] [mailto:linux-wireless-
> >> [email protected]] On Behalf Of Brian Norris
> >> Sent: Wednesday, October 05, 2016 10:00 PM
> >> To: Amitkumar Karwar
> >> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> >> [email protected]; Xinming Hu
> >> Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
> >> unregister
> >>
> >> On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
> >> > > From: Brian Norris [mailto:[email protected]]
> >> > > Sent: Wednesday, October 05, 2016 3:28 AM
> >> > > To: Amitkumar Karwar
> >> > > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> >> > > [email protected]; [email protected]; Xinming Hu
> >> > > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during
> >> > > device unregister
> >> > >
> >> > > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
> >>
> >> > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> >> > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> >> > > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
> >> > > mwifiex_adapter *adapter)
> >> > > > pci_disable_msi(pdev);
> >> > > > }
> >> > > > }
> >> > > > + card->adapter = NULL;
> >> > >
> >> > > I think you have a similar problem here as in patch 2; there is no
> >> > > locking to protect fields in struct pcie_service_card or struct
> >> > > sdio_mmc_card below. That problem kind of already exists, except
> >> > > that you only write the value of card->adapter once at registration
> >> > > time, so it's not actually unsafe. But now that you're introducing a
> >> > > second write, you have a problem.
> >> > >
> >> > > Brian
> >> > >
> >> >
> >> > We have a global "add_remove_card_sem" semaphore in our code for
> >> > synchronizing initialization and teardown threads. Ideally "init +
> >> > teardown/reboot" should not have a race issue with this logic
> >> >
> >> > Later there was a loophole introduced in this after using async
> >> > firmware download API. During initialization, firmware downloads
> >> > asynchronously in a separate thread where might have released the
> >> > semaphore. I am working on a patch to fix this.
> >> >
> >> > So "card->adapter" doesn't need to have locking. Even if we have two
> >> > write operations, those two threads can't run simultaneously due to
> >> > above mentioned logic.
> >>
> >> What about writes racing with reads? You have lots of unsynchronized
> >> cases that read this, although most of them should be halted by now
> >> (e.g., cmd processing). I was looking at suspend() in particular, which
> >> I thought you were looking at in this patch series.
> >
> > Please note that "card->adapter" is used only in pcie.c/sdio.c/usb.c files
> >
> > Writes won't have race with reads.
> >
> > 1) write 1 --- "card->adapter = adapter;" in mwifiex_register_dev()
> > This place is at the beginning of initialization.
> > mwifiex_pcie_probe() -> mwifiex_add_card() -> adapter->if_ops.register_dev()
> > There is no chance that "card->adapter" is read anywhere at this point. FW is not yet downloaded
> >
> > 2) write 2 ---- "card->adapter = NULL;" in mwifiex_unregister_dev()
> > This place the end of teardown phase.
> > Interrupts are disabled and all cleanup is done. We have "card->adapter" NULL checks at entry point of suspend/remove/resume, if they get called after this.
>
> How exactly are you preventing ->suspend() from being called *while*
> you are running mwifiex_unregister_dev()? I.e. user slamming the lid
> of a laptop and throwing it in their backpack?
As I already commented, isn't this primarily [*] called from the driver
remove() callback? remove() doesn't race with suspend(), does it?
[*] The other cases are in error handling cases. I guess I should make
sure those didn't race too...
Brian
On Thu, Oct 6, 2016 at 6:03 AM, Amitkumar Karwar <[email protected]> wrote:
> Hi Brian,
>
>> From: [email protected] [mailto:linux-wireless-
>> [email protected]] On Behalf Of Brian Norris
>> Sent: Wednesday, October 05, 2016 10:00 PM
>> To: Amitkumar Karwar
>> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
>> [email protected]; Xinming Hu
>> Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
>> unregister
>>
>> Hi,
>>
>> On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
>> > > From: Brian Norris [mailto:[email protected]]
>> > > Sent: Wednesday, October 05, 2016 3:28 AM
>> > > To: Amitkumar Karwar
>> > > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
>> > > [email protected]; [email protected]; Xinming Hu
>> > > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during
>> > > device unregister
>> > >
>> > > Hi,
>> > >
>> > > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
>>
>> > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> > > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
>> > > mwifiex_adapter *adapter)
>> > > > pci_disable_msi(pdev);
>> > > > }
>> > > > }
>> > > > + card->adapter = NULL;
>> > >
>> > > I think you have a similar problem here as in patch 2; there is no
>> > > locking to protect fields in struct pcie_service_card or struct
>> > > sdio_mmc_card below. That problem kind of already exists, except
>> > > that you only write the value of card->adapter once at registration
>> > > time, so it's not actually unsafe. But now that you're introducing a
>> > > second write, you have a problem.
>> > >
>> > > Brian
>> > >
>> >
>> > We have a global "add_remove_card_sem" semaphore in our code for
>> > synchronizing initialization and teardown threads. Ideally "init +
>> > teardown/reboot" should not have a race issue with this logic
>> >
>> > Later there was a loophole introduced in this after using async
>> > firmware download API. During initialization, firmware downloads
>> > asynchronously in a separate thread where might have released the
>> > semaphore. I am working on a patch to fix this.
>> >
>> > So "card->adapter" doesn't need to have locking. Even if we have two
>> > write operations, those two threads can't run simultaneously due to
>> > above mentioned logic.
>>
>> What about writes racing with reads? You have lots of unsynchronized
>> cases that read this, although most of them should be halted by now
>> (e.g., cmd processing). I was looking at suspend() in particular, which
>> I thought you were looking at in this patch series.
>
> Please note that "card->adapter" is used only in pcie.c/sdio.c/usb.c files
>
> Writes won't have race with reads.
>
> 1) write 1 --- "card->adapter = adapter;" in mwifiex_register_dev()
> This place is at the beginning of initialization.
> mwifiex_pcie_probe() -> mwifiex_add_card() -> adapter->if_ops.register_dev()
> There is no chance that "card->adapter" is read anywhere at this point. FW is not yet downloaded
>
> 2) write 2 ---- "card->adapter = NULL;" in mwifiex_unregister_dev()
> This place the end of teardown phase.
> Interrupts are disabled and all cleanup is done. We have "card->adapter" NULL checks at entry point of suspend/remove/resume, if they get called after this.
How exactly are you preventing ->suspend() from being called *while*
you are running mwifiex_unregister_dev()? I.e. user slamming the lid
of a laptop and throwing it in their backpack?
Thanks.
--
Dmitry
Hi,
On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:[email protected]]
> > Sent: Wednesday, October 05, 2016 3:28 AM
> > To: Amitkumar Karwar
> > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > [email protected]; [email protected]; Xinming Hu
> > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
> > unregister
> >
> > Hi,
> >
> > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
> > mwifiex_adapter *adapter)
> > > pci_disable_msi(pdev);
> > > }
> > > }
> > > + card->adapter = NULL;
> >
> > I think you have a similar problem here as in patch 2; there is no
> > locking to protect fields in struct pcie_service_card or struct
> > sdio_mmc_card below. That problem kind of already exists, except that
> > you only write the value of card->adapter once at registration time, so
> > it's not actually unsafe. But now that you're introducing a second
> > write, you have a problem.
> >
> > Brian
> >
>
> We have a global "add_remove_card_sem" semaphore in our code for
> synchronizing initialization and teardown threads. Ideally "init +
> teardown/reboot" should not have a race issue with this logic
>
> Later there was a loophole introduced in this after using async
> firmware download API. During initialization, firmware downloads
> asynchronously in a separate thread where might have released the
> semaphore. I am working on a patch to fix this.
>
> So "card->adapter" doesn't need to have locking. Even if we have two
> write operations, those two threads can't run simultaneously due to
> above mentioned logic.
What about writes racing with reads? You have lots of unsynchronized
cases that read this, although most of them should be halted by now
(e.g., cmd processing). I was looking at suspend() in particular, which
I thought you were looking at in this patch series.
Brian
Hi Brian,
> From: Brian Norris [mailto:[email protected]]
> Sent: Wednesday, October 05, 2016 3:28 AM
> To: Amitkumar Karwar
> Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> [email protected]; [email protected]; Xinming Hu
> Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
> unregister
>
> Hi,
>
> On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:
> > From: Xinming Hu <[email protected]>
> >
> > card->adapter gets initialized during device registration.
> > As it's not cleared, we may end up accessing invalid memory in some
> > corner cases. This patch fixes the problem.
> >
> > Signed-off-by: Xinming Hu <[email protected]>
> > Signed-off-by: Amitkumar Karwar <[email protected]>
> > ---
> > v2: Same as v1
> > ---
> > drivers/net/wireless/marvell/mwifiex/pcie.c | 1 +
> > drivers/net/wireless/marvell/mwifiex/sdio.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index f1eeb73..ba9e068 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
> mwifiex_adapter *adapter)
> > pci_disable_msi(pdev);
> > }
> > }
> > + card->adapter = NULL;
>
> I think you have a similar problem here as in patch 2; there is no
> locking to protect fields in struct pcie_service_card or struct
> sdio_mmc_card below. That problem kind of already exists, except that
> you only write the value of card->adapter once at registration time, so
> it's not actually unsafe. But now that you're introducing a second
> write, you have a problem.
>
> Brian
>
We have a global "add_remove_card_sem" semaphore in our code for synchronizing initialization and teardown threads. Ideally "init + teardown/reboot" should not have a race issue with this logic
Later there was a loophole introduced in this after using async firmware download API. During initialization, firmware downloads asynchronously in a separate thread where might have released the semaphore. I am working on a patch to fix this.
So "card->adapter" doesn't need to have locking. Even if we have two write operations, those two threads can't run simultaneously due to above mentioned logic.
Regards,
Amitkumar
Hi Brian
I think your concern is right, we will update patch to handle the case you mentioned below.
If our firmware is dead or we init failure, we should allow the system to suspend.
We have following hardware status supported in driver. We should return -EBUSY only when hardware status is
MWIFIEX_HW_STATUS_INITIALIZING/MWIFIEX_HW_STATUS_INIT_DONE/ MWIFIEX_HW_STATUS_CLOSING.
enum MWIFIEX_HARDWARE_STATUS {
MWIFIEX_HW_STATUS_READY,
MWIFIEX_HW_STATUS_INITIALIZING,
MWIFIEX_HW_STATUS_INIT_DONE,
MWIFIEX_HW_STATUS_RESET,
MWIFIEX_HW_STATUS_CLOSING,
MWIFIEX_HW_STATUS_NOT_READY
};
Amit will help prepare the new patch to handle this.
Regards
Cathy
-----Original Message-----
From: Brian Norris [mailto:[email protected]]
Sent: Wednesday, October 05, 2016 9:42 AM
To: Amitkumar Karwar
Cc: [email protected]; Cathy Luo; Nishant Sarmukadam; [email protected]; Xinming Hu
Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers
Hi Amit,
On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:[email protected]]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: [email protected]; Cathy Luo; Nishant Sarmukadam;
> > [email protected]; [email protected]; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > resume handlers
> >
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <[email protected]>
> > >
> > > We have observed a kernel crash when system immediately suspends
> > > after booting. There is a race between suspend and driver
> > > initialization paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <[email protected]>
> > > Signed-off-by: Amitkumar Karwar <[email protected]>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > > drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > > if (pdev) {
> > > card = pci_get_drvdata(pdev);
> > > - if (!card || !card->adapter) {
> > > + if (!card || !card->adapter ||
> > > + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> >
> > Wait, is there no locking on the 'hw_status' field? That is
> > inherently an unsafe race all on its own; you're not guaranteed that
> > this will be read/written atomically. And you also aren't guaranteed
> > that writes to this happen in the order they appear in the code --
> > in other words, reading this flag doesn't necessarily guarantee that
> > initialization is actually complete (even if that's very likely to
> > be true, given that it's probably just a single-instruction
> > word-access, and any prior HW polling or interrupts likely have done
> > some synchronization and can't be reordered).
> > actually complete
>
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
> MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE ->
> MWIFIEX_HW_STATUS_READY
>
> 2) Status will remain READY once driver+firmware is up and running.
>
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET ->
> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
>
> As the events occur one after another, we don't expect a race and
> don't need locking here for the flag. Flag status
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.
It seems like, as with patch 1, you're mostly arguing about the writes to this variable. But writes race with reads as well; how do you guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even partially-written values, if for some reason the compiler decides it can't read/write this all in one go?
> In worst case scenario, only first system suspend attempt issued
> immediately after system boot will be aborted with BUSY error. I
> think, that should be fine.
(For the record, my concern about -EBUSY is separate from my concern about the potential race condition.)
> Let me know if you have any concerns.
Sorry, I probably didn't completely flesh out my thought here.
I think I was concerned about a failed initialization causing the system to never enter suspend again. So specifically: what happens if (e.g.) the firmware fails to load? AFAICT, the device doesn't actually unbind itself from the driver, so instead, you have a device in limbo that will always return -EBUSY in suspend(), and your system can never again enter suspend. Am I correct? If so, that doesn't sound great.
> > This is probably better than nothing, but it's not very good.
> >
> > > pr_err("Card or adapter structure is not valid\n");
> > > - return 0;
> > > + return -EBUSY;
> >
> > So the above cases all mean that the driver hasn't finished loading,
> > right?
> >
> > !card => can't happen (PCIe probe() would have failed
> > mwifiex_add_card()), but fine to check
>
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>
> >
> > !card->adapter => only happens after patch 1; i.e., when tearing
> > down the device and detaching it from the driver
> >
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not
> > card->adapter->loaded
> > (i.e., in the process of starting or stopping FW?)
> >
> > I guess all of those cases make sense to be -EBUSY.
^^ I'm second-guessing my claim here then.
> Yes. we can keep -EBUSY for all of the cases.
Brian
From: Xinming Hu <[email protected]>
We have observed a kernel crash when system immediately suspends
after booting. There is a race between suspend and driver
initialization paths.
This patch adds hw_status checks to fix the problem
Signed-off-by: Xinming Hu <[email protected]>
Signed-off-by: Amitkumar Karwar <[email protected]>
---
v2: Return failure in suspend/resume handler in this scenario.
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index ba9e068..fa6bf85 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device *dev)
if (pdev) {
card = pci_get_drvdata(pdev);
- if (!card || !card->adapter) {
+ if (!card || !card->adapter ||
+ card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
pr_err("Card or adapter structure is not valid\n");
- return 0;
+ return -EBUSY;
}
} else {
pr_err("PCIE device is not specified\n");
@@ -166,9 +167,10 @@ static int mwifiex_pcie_resume(struct device *dev)
if (pdev) {
card = pci_get_drvdata(pdev);
- if (!card || !card->adapter) {
+ if (!card || !card->adapter ||
+ card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
pr_err("Card or adapter structure is not valid\n");
- return 0;
+ return -EBUSY;
}
} else {
pr_err("PCIE device is not specified\n");
--
1.9.1
Hi,
On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> From: Xinming Hu <[email protected]>
>
> We have observed a kernel crash when system immediately suspends
> after booting. There is a race between suspend and driver
> initialization paths.
> This patch adds hw_status checks to fix the problem
>
> Signed-off-by: Xinming Hu <[email protected]>
> Signed-off-by: Amitkumar Karwar <[email protected]>
> ---
> v2: Return failure in suspend/resume handler in this scenario.
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index ba9e068..fa6bf85 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device *dev)
>
> if (pdev) {
> card = pci_get_drvdata(pdev);
> - if (!card || !card->adapter) {
> + if (!card || !card->adapter ||
> + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
Wait, is there no locking on the 'hw_status' field? That is inherently
an unsafe race all on its own; you're not guaranteed that this will be
read/written atomically. And you also aren't guaranteed that writes to
this happen in the order they appear in the code -- in other words,
reading this flag doesn't necessarily guarantee that initialization is
actually complete (even if that's very likely to be true, given that
it's probably just a single-instruction word-access, and any prior HW
polling or interrupts likely have done some synchronization and can't be
reordered).
This is probably better than nothing, but it's not very good.
> pr_err("Card or adapter structure is not valid\n");
> - return 0;
> + return -EBUSY;
So the above cases all mean that the driver hasn't finished loading,
right?
!card => can't happen (PCIe probe() would have failed
mwifiex_add_card()), but fine to check
!card->adapter => only happens after patch 1; i.e., when tearing down
the device and detaching it from the driver
card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
(i.e., in the process of starting or stopping FW?)
I guess all of those cases make sense to be -EBUSY.
> }
> } else {
> pr_err("PCIE device is not specified\n");
I was going to complain about this branch (!pdev) returning 0, since
that looked asymmetric. But this case will never happen, actually, since
to_pci_dev() is just doing struct offset arithmetic on struct device,
and struct device is never going to be NULL. It probably makes more
sense to just remove this branch entirely, but that's for another patch.
> @@ -166,9 +167,10 @@ static int mwifiex_pcie_resume(struct device *dev)
>
> if (pdev) {
> card = pci_get_drvdata(pdev);
> - if (!card || !card->adapter) {
> + if (!card || !card->adapter ||
> + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
Same complaint, except if you've screwed up here, you probably already
screwed up in suspend().
Brian
> pr_err("Card or adapter structure is not valid\n");
> - return 0;
> + return -EBUSY;
> }
> } else {
> pr_err("PCIE device is not specified\n");
> --
> 1.9.1
>