2013-03-15 22:03:51

by Daniel Drake

[permalink] [raw]
Subject: mwifiex frequent "not allowed while suspended" crash on resume

Hi,

When the XO-4 with 8787 wireless is woken up due to wake-on-WLAN,
mwifiex is often unhappy and reports this:

mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (1) failed: -1
mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (2) failed: -1
mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (3) failed: -1
mwifiex_sdio mmc0:0001:1: mwifiex_write_data_async failed: 0xFFFFFFFF
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
[...]

After this flood of messages, the interface is unusable.

This can be readily reproduced when putting the XO-4 in a loop where
it goes to sleep due to inactivity, but then wakes up due to an
incoming ping. The error is hit within an hour or two.

I think it shows a race where the upper mwifiex layers try to do card
communication before mwifiex_sdio_resume() has done its work - there
is some serialization missing.

This can be reproduced on every single ping-invoked wakeup if you
insert a msleep(1000) inside mwifiex_sdio_resume() e.g. here:

if (!adapter->is_suspended) {
dev_warn(adapter->dev, "device already resumed\n");
return 0;
}

msleep(1000);
adapter->is_suspended = false;


Thanks for any help you can offer.
Daniel


2013-03-19 06:44:42

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex frequent "not allowed while suspended" crash on resume

Hi Daniel,

> Hi,
>
> When the XO-4 with 8787 wireless is woken up due to wake-on-WLAN,
> mwifiex is often unhappy and reports this:
>
> mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
> mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (1) failed: -1

This looks like the same issue with old XO platform where mmc gets timeout.
Do you see "mmc0: Timeout waiting for hardware interrupt" message before mwifiex errors?

> mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
> mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (2) failed: -1
> mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
> mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (3) failed: -1
> mwifiex_sdio mmc0:0001:1: mwifiex_write_data_async failed: 0xFFFFFFFF
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> mwifiex_sdio mmc0:0001:1: mwifiex_sdio_host_to_card: no wr_port available
> [...]
>
> After this flood of messages, the interface is unusable.
>
> This can be readily reproduced when putting the XO-4 in a loop where
> it goes to sleep due to inactivity, but then wakes up due to an
> incoming ping. The error is hit within an hour or two.
>
> I think it shows a race where the upper mwifiex layers try to do card
> communication before mwifiex_sdio_resume() has done its work - there
> is some serialization missing.
>
> This can be reproduced on every single ping-invoked wakeup if you
> insert a msleep(1000) inside mwifiex_sdio_resume() e.g. here:
>
> if (!adapter->is_suspended) {
> dev_warn(adapter->dev, "device already resumed\n");
> return 0;
> }
>
> msleep(1000);
> adapter->is_suspended = false;

WLAN driver expects that the resume handler is called and returned before its interrupt handler is called. Adding a second delay for clearing is_suspended flag almost guarantees the failure of "...not allowed while suspended".

I will think about how to serialize the resume/interrupt sequence.

Thanks,
Bing

>
>
> Thanks for any help you can offer.
> Daniel

2013-03-19 13:07:07

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex frequent "not allowed while suspended" crash on resume

On Tue, Mar 19, 2013 at 12:43 AM, Bing Zhao <[email protected]> wrote:
> Hi Daniel,
>> mwifiex_sdio mmc0:0001:1: mwifiex_write_data_sync: not allowed while suspended
>> mwifiex_sdio mmc0:0001:1: host_to_card, write iomem (1) failed: -1
>
> This looks like the same issue with old XO platform where mmc gets timeout.
> Do you see "mmc0: Timeout waiting for hardware interrupt" message before mwifiex errors?

No. The messages I pasted are the first mwifiex messages produced on resume.

Daniel

2013-04-05 22:13:37

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex frequent "not allowed while suspended" crash on resume

On Tue, Mar 19, 2013 at 12:43 AM, Bing Zhao <[email protected]> wrote:
> I will think about how to serialize the resume/interrupt sequence.

As we haven't heard back from you in a while, and this is blocking
several things on our end, I had a quick look myself.

The problem seems to be TX soon after resume, and there does seem to
be a bit of a lack of synchronization here (i.e. mwifiex_main_process
should maybe avoid doing TX while the interface is suspended).

As a quick hack I added netdev_device_detach() early in suspend, and
netdev_device_attach() late in resume, and that seems to serialize
things. But I don't think its a good fix. Hopefully you can find a
solution that flows better with the mwifiex design.

Thanks
Daniel

2013-04-26 19:02:35

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex frequent "not allowed while suspended" crash on resume

On Thu, Apr 25, 2013 at 6:40 PM, Bing Zhao <[email protected]> wrote:
> No, it's not OK to transmit before the host sleep is cancelled.
> I was under the assumption that no TX packet is from kernel during the time we cancel host sleep. This assumption can be wrong without netif_device_detach, etc.

OK, this is a fair statement to make. But the presence of (not
entirely working) mechanisms in the code such as the is_suspended flag
that seem to work towards the same goal don't really seem to agree.

Maybe we can drop is_suspended and just stop the TX queue as
appropriate (via netif_device_detach or similar). That would leave us
with both a clear design and a working implementation.

Unless I am missing something, is_suspended only really seems to have
what I would regard as a clear and valid use in the USB subdriver.

> Could you try resetting is_suspend flag when an early RX is received?
>
> +++ b/drivers/net/wireless/mwifiex/cmdevt.c
> @@ -1191,6 +1191,7 @@ mwifiex_process_hs_config(struct mwifiex_adapter *adapter)
> adapter->if_ops.wakeup(adapter);
> adapter->hs_activated = false;
> adapter->is_hs_configured = false;
> + adapter->is_suspended = false;
> mwifiex_hs_activated_event(mwifiex_get_priv(adapter,
> MWIFIEX_BSS_ROLE_ANY),
> false);

This seems to avoid the issue: no error messages during resume, and
the interface remains functional.

Thanks
Daniel

2013-04-16 20:59:44

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex frequent "not allowed while suspended" crash on resume

On Fri, Apr 5, 2013 at 5:48 PM, Bing Zhao <[email protected]> wrote:
> Normally TX is blocked until resume handler is called and host sleep handshake between driver and firmware is done.

Where is the code that makes such synchronization happen?

> Does the TX happen before "hs_deactivated" event?

hard_start_xmit is called approximately 1ms after hs_deactivated arrives.

> I'm thinking of releasing IRQ in driver suspend handler and re-claim the IRQ in resume. Of course it needs some changes in sdio_release_irq to skip SDIO_CCCR_IENx disabling, otherwise host may not be woken up by SDIO DAT1.
>
> Attached is the sample code, not tested.

That patch breaks wake-on-LAN, the suspended system simply doesn't
respond to incoming packets.

Daniel

2013-04-25 19:48:34

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex frequent "not allowed while suspended" crash on resume

On Thu, Apr 18, 2013 at 1:20 PM, Bing Zhao <[email protected]> wrote:
> Hi Daniel,
>
>> > Normally TX is blocked until resume handler is called and host sleep handshake between driver and
>> firmware is done.
>>
>> Where is the code that makes such synchronization happen?
>
> If a data packet is received unexpectedly during suspended, driver blocks it and displays a message "not allowed while suspended".

I think we have enough evidence to show that the handling of this
condition is rather broken and breaks future communication with the
card.

I also don't regard such behaviour as "blocking" since when the data
packet fails to be handled, it is not blocked and processed later, it
is discarded.

If blocking is desired (sounds sensible) I would expect the packet not
to reach the mwifiex_sdio layer at all, until it has flagged itself as
ready to continue handling such packets.

> Earlier you mentioned that doing netif attach/detach in suspend/resume would work around this problem.
> It reminders me that previously we set carrier off in suspend so that kernel won't send data packets to driver. That carrier-off code was removed in recent commit 1499d9f to fix a missing ping response issue right after resume. Perhaps it's worth a revisit to this change.

I am confident that this bug also existed before that change, but it
might indeed have made it more likely now.

Regarding the attach/detach possibility, I would say the open question
is: does this issue expose a bug in the mwifiex logic for handling TX
etc? If so, that bug should be fixed. On the other hand, if the
mwifiex logic is *designed* under the idea that no TX will ever happen
at these moments, then putting netif_attach/detatch in the right place
is an OK solution. I have looked at the mwifiex code and history but I
haven't been able to determine what the design here is supposed to be.

>> > Does the TX happen before "hs_deactivated" event?
>>
>> hard_start_xmit is called approximately 1ms after hs_deactivated arrives.
>
> When hs_deactivated event arrives, 'is_suspended' flag must have been cleared already. The tx packets should be allowed to send freely. I don't know why the message "not allowed while suspended" is fired in this case.
> There could be a race condition somewhere though. Could you please send a complete dynamic_debug enabled log showing all the sequence?

I assume the code you are referring to is in mwifiex_sdio_resume():

adapter->is_suspended = false;

/* Disable Host Sleep */
mwifiex_cancel_hs(mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_STA),
MWIFIEX_ASYNC_CMD);

i.e. the cancel_hs() call results in the hs_deactivated event
arriving, at which point we are free to transmit.

The order of events in this code makes me uneasy. You suggested
earlier that is_suspended is the flag that indicates from the
mwifiex_sdio layer "I'm ready to transmit again". But is it really OK
to transmit before the host sleep is cancelled, as the above ordering
suggests?

In this particular case, as you can see in the dynamic debug logs, it
looks like the whole situation is confused here due to the arrival of
an interrupt which causes mwifiex to take an early exit from the host
sleep.

http://dev.laptop.org/~dsd/20130425/mwifiex-earlytx.log

>>
>> > I'm thinking of releasing IRQ in driver suspend handler and re-claim the IRQ in resume. Of course
>> it needs some changes in sdio_release_irq to skip SDIO_CCCR_IENx disabling, otherwise host may not be
>> woken up by SDIO DAT1.
>> >
>> > Attached is the sample code, not tested.
>>
>> That patch breaks wake-on-LAN, the suspended system simply doesn't
>> respond to incoming packets.
>
> Could you confirm that you saw this message on suspend?
>
> "SDIO: KEEP IRQ ON for ..."

No, that message doesn't appear.
I can't quite grasp what you're trying to do in that patch. Where does
mwifiex even set MMC_PM_WAKE_SDIO_IRQ?

Admittedly we are using a slightly different suspend routine that
predates the upstream mwifiex implementation. But I hadn't noticed any
interesting differences until now. We do set MMC_PM_WAKE_SDIO_IRQ,
after your patch it looks like this:

/* Enable the Host Sleep */
hs_actived = mwifiex_enable_hs(adapter);
if (hs_actived) {
sdio_claim_host(card->func);
sdio_release_irq(func);
sdio_release_host(card->func);
pr_info("cmd: suspend with MMC_PM_KEEP_POWER|MMC_PM_WAKE_SDIO_IRQ\n");
ret = sdio_set_host_pm_flags(func,
MMC_PM_KEEP_POWER|
MMC_PM_WAKE_SDIO_IRQ);
}

This one hunk had to be applied by hand, but I thought I stuck to the
same approach as in the patch: release the irq before calling
set_host_pm_flags.

But the logic in the patch suggests that you would expect the
WAKE_SDIO_IRQ flag to be set before sdio_release_irq() is called,
triggering the "KEEP IRQ ON" message that you're looking for.

Thanks
Daniel

2013-04-05 23:49:08

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex frequent "not allowed while suspended" crash on resume

Hi Daniel,

> As we haven't heard back from you in a while, and this is blocking
> several things on our end, I had a quick look myself.
>
> The problem seems to be TX soon after resume, and there does seem to
> be a bit of a lack of synchronization here (i.e. mwifiex_main_process
> should maybe avoid doing TX while the interface is suspended).

Normally TX is blocked until resume handler is called and host sleep handshake between driver and firmware is done. Does the TX happen before "hs_deactivated" event?

>
> As a quick hack I added netdev_device_detach() early in suspend, and
> netdev_device_attach() late in resume, and that seems to serialize
> things. But I don't think its a good fix. Hopefully you can find a
> solution that flows better with the mwifiex design.

I'm thinking of releasing IRQ in driver suspend handler and re-claim the IRQ in resume. Of course it needs some changes in sdio_release_irq to skip SDIO_CCCR_IENx disabling, otherwise host may not be woken up by SDIO DAT1.

Attached is the sample code, not tested.

Thanks,
Bing


Attachments:
release_irq_in_suspend.diff (2.19 kB)
release_irq_in_suspend.diff

2013-04-26 00:41:07

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex frequent "not allowed while suspended" crash on resume

Hi Daniel,

> The order of events in this code makes me uneasy. You suggested
> earlier that is_suspended is the flag that indicates from the
> mwifiex_sdio layer "I'm ready to transmit again". But is it really OK
> to transmit before the host sleep is cancelled, as the above ordering
> suggests?

No, it's not OK to transmit before the host sleep is cancelled.
I was under the assumption that no TX packet is from kernel during the time we cancel host sleep. This assumption can be wrong without netif_device_detach, etc.

>
> In this particular case, as you can see in the dynamic debug logs, it
> looks like the whole situation is confused here due to the arrival of
> an interrupt which causes mwifiex to take an early exit from the host
> sleep.

The driver has the code to work around this case. It is buggy however because it never gets verified with my own platforms.

>
> http://dev.laptop.org/~dsd/20130425/mwifiex-earlytx.log

Could you try resetting is_suspend flag when an early RX is received?

+++ b/drivers/net/wireless/mwifiex/cmdevt.c
@@ -1191,6 +1191,7 @@ mwifiex_process_hs_config(struct mwifiex_adapter *adapter)
adapter->if_ops.wakeup(adapter);
adapter->hs_activated = false;
adapter->is_hs_configured = false;
+ adapter->is_suspended = false;
mwifiex_hs_activated_event(mwifiex_get_priv(adapter,
MWIFIEX_BSS_ROLE_ANY),
false);

> No, that message doesn't appear.
> I can't quite grasp what you're trying to do in that patch. Where does
> mwifiex even set MMC_PM_WAKE_SDIO_IRQ?

I was expecting sdhci driver to set MMC_PM_WAKE_SDIO_IRQ bit.

> But the logic in the patch suggests that you would expect the
> WAKE_SDIO_IRQ flag to be set before sdio_release_irq() is called,
> triggering the "KEEP IRQ ON" message that you're looking for.

That's correct. I'm hoping that this MMC change generates "resume handler first, RX interrupt next" sequence.

Thanks,
Bing

>
> Thanks
> Daniel

2013-04-18 19:23:36

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex frequent "not allowed while suspended" crash on resume

Hi Daniel,

> > Normally TX is blocked until resume handler is called and host sleep handshake between driver and
> firmware is done.
>
> Where is the code that makes such synchronization happen?

If a data packet is received unexpectedly during suspended, driver blocks it and displays a message "not allowed while suspended".

Earlier you mentioned that doing netif attach/detach in suspend/resume would work around this problem.
It reminders me that previously we set carrier off in suspend so that kernel won't send data packets to driver. That carrier-off code was removed in recent commit 1499d9f to fix a missing ping response issue right after resume. Perhaps it's worth a revisit to this change.

>
> > Does the TX happen before "hs_deactivated" event?
>
> hard_start_xmit is called approximately 1ms after hs_deactivated arrives.

When hs_deactivated event arrives, 'is_suspended' flag must have been cleared already. The tx packets should be allowed to send freely. I don't know why the message "not allowed while suspended" is fired in this case.
There could be a race condition somewhere though. Could you please send a complete dynamic_debug enabled log showing all the sequence?

>
> > I'm thinking of releasing IRQ in driver suspend handler and re-claim the IRQ in resume. Of course
> it needs some changes in sdio_release_irq to skip SDIO_CCCR_IENx disabling, otherwise host may not be
> woken up by SDIO DAT1.
> >
> > Attached is the sample code, not tested.
>
> That patch breaks wake-on-LAN, the suspended system simply doesn't
> respond to incoming packets.

Could you confirm that you saw this message on suspend?

"SDIO: KEEP IRQ ON for ..."

Thanks,
Bing

>
> Daniel