2019-07-23 02:50:50

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

As talked about in the thread at:

http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com

...when the Marvell WiFi card tries to reset itself it kills
Bluetooth. It was observed that we could re-init the card properly by
unbinding / rebinding the host controller. It was also observed that
in the downstream Chrome OS codebase the solution used was
mmc_remove_host() / mmc_add_host(), which is similar to the solution
in this series.

So far I've only done testing of this series using the reset test
source that can be simulated via sysfs. Specifically I ran this test:

for i in $(seq 1000); do
echo "LOOP $i --------"
echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset

while true; do
if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
fail=$(( fail + 1 ))
echo "Fail WiFi ${fail}"
if [[ ${fail} == 3 ]]; then
exit 1
fi
else
fail=0
break
fi
done

hciconfig hci0 down
sleep 1
if ! hciconfig hci0 up; then
echo "Fail BT"
exit 1
fi
done

I ran this several times and got several hundred iterations each
before a failure. When I saw failures:

* Once I saw a "Fail BT"; manually resetting the card again fixed it.
I didn't give it time to see if it would have detected this
automatically.
* Once I saw the ping fail because (for some reason) my device only
got an IPv6 address from my router and the IPv4 ping failed. I
changed my script to use 'ping6' to see if that would help.
* Once I saw the ping fail because the higher level network stack
("shill" in my case) seemed to crash. A few minutes later the
system recovered itself automatically. https://crbug.com/984593 if
you want more details.
* Sometimes while I was testing I saw "Fail WiFi 1" indicating a
transitory failure. Usually this was an association failure, but in
one case I saw the device do "Firmware wakeup failed" after I
triggered the reset. This caused the driver to trigger a re-reset
of itself which eventually recovered things. This was good because
it was an actual test of the normal reset flow (not the one
triggered via sysfs).

Changes in v2:
- s/routnine/routine (Brian Norris, Matthias Kaehlcke).
- s/contining/containing (Matthias Kaehlcke).
- Add Matthias Reviewed-by tag.
- Removed clear_bit() calls and old comment (Brian Norris).
- Explicit CC of Andreas Fenkart.
- Explicit CC of Brian Norris.
- Add "Fixes" pointing at the commit Brian talked about.
- Add Brian's Reviewed-by tag.

Douglas Anderson (2):
mmc: core: Add sdio_trigger_replug() API
mwifiex: Make use of the new sdio_trigger_replug() API to reset

drivers/mmc/core/core.c | 28 +++++++++++++++++++--
drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
include/linux/mmc/host.h | 15 ++++++++++-
include/linux/mmc/sdio_func.h | 2 ++
5 files changed, 63 insertions(+), 18 deletions(-)

--
2.22.0.657.g960e92d24f-goog


2019-07-23 02:52:34

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v2 2/2] mwifiex: Make use of the new sdio_trigger_replug() API to reset

As described in the patch ("mmc: core: Add sdio_trigger_replug()
API"), the current mwifiex_sdio_card_reset() is broken in the cases
where we're running Bluetooth on a second SDIO func on the same card
as WiFi. The problem goes away if we just use the
sdio_trigger_replug() API call.

NOTE: Even though with this new solution there is less of a reason to
do our work from a workqueue (the unplug / plug mechanism we're using
is possible for a human to perform at any time so the stack is
supposed to handle it without it needing to be called from a special
context), we still need a workqueue because the Marvell reset function
could called from a context where sleeping is invalid and thus we
can't claim the host. One example is Marvell's wakeup_timer_fn().

Cc: Andreas Fenkart <[email protected]>
Cc: Brian Norris <[email protected]>
Fixes: b4336a282db8 ("mwifiex: sdio: reset adapter using mmc_hw_reset")
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---

Changes in v2:
- Removed clear_bit() calls and old comment (Brian Norris).
- Explicit CC of Andreas Fenkart.
- Explicit CC of Brian Norris.
- Add "Fixes" pointing at the commit Brian talked about.
- Add Brian's Reviewed-by tag.

drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index 24c041dad9f6..7ec5068f6ffd 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -2218,24 +2218,10 @@ static void mwifiex_sdio_card_reset_work(struct mwifiex_adapter *adapter)
{
struct sdio_mmc_card *card = adapter->card;
struct sdio_func *func = card->func;
- int ret;
-
- mwifiex_shutdown_sw(adapter);

- /* power cycle the adapter */
sdio_claim_host(func);
- mmc_hw_reset(func->card->host);
+ sdio_trigger_replug(func);
sdio_release_host(func);
-
- /* Previous save_adapter won't be valid after this. We will cancel
- * pending work requests.
- */
- clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags);
- clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags);
-
- ret = mwifiex_reinit_sw(adapter);
- if (ret)
- dev_err(&func->dev, "reinit failed: %d\n", ret);
}

/* This function read/write firmware */
--
2.22.0.657.g960e92d24f-goog

2019-07-23 02:52:39

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v2 1/2] mmc: core: Add sdio_trigger_replug() API

When using Marvell WiFi SDIO cards, it is not uncommon for Linux WiFi
driver to fully lose the communication channel to the firmware running
on the card. Presumably the firmware on the card has a bug or two in
it and occasionally crashes.

The Marvell WiFi driver attempts to recover from this problem.
Specifically the driver has the function mwifiex_sdio_card_reset()
which is called when communcation problems are found. That function
attempts to reset the state of things by utilizing the mmc_hw_reset()
function.

The current solution is a bit complex because the Marvell WiFi driver
needs to manually deinit and reinit the WiFi driver around the reset
call. This means it's going through a bunch of code paths that aren't
normally tested. However, complexity isn't our only problem. The
other (bigger) problem is that Marvell WiFi cards are often combo
WiFi/Bluetooth cards and Bluetooth runs on a second SDIO func. While
the WiFi driver knows that it should re-init its own state around the
mmc_hw_reset() call there is no good way to inform the Bluetooth
driver. That means that in Linux today when you reset the Marvell
WiFi driver you lose all Bluetooth communication. Doh!

One way to fix the above problems is to leverage a more standard way
to reset the Marvell WiFi card where we go through the same code paths
as card unplug and the card plug. In this patch we introduce a new
API call for doing just that: sdio_trigger_replug(). This API call
will trigger an unplug of the SDIO card followed by a plug of the
card. As part of this the card will be nicely reset.

Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Matthias Kaehlcke <[email protected]>
---

Changes in v2:
- s/routnine/routine (Brian Norris, Matthias Kaehlcke).
- s/contining/containing (Matthias Kaehlcke).
- Add Matthias Reviewed-by tag.

drivers/mmc/core/core.c | 28 ++++++++++++++++++++++++++--
drivers/mmc/core/sdio_io.c | 20 ++++++++++++++++++++
include/linux/mmc/host.h | 15 ++++++++++++++-
include/linux/mmc/sdio_func.h | 2 ++
4 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 221127324709..5da365b1fdb4 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2161,6 +2161,12 @@ int mmc_sw_reset(struct mmc_host *host)
}
EXPORT_SYMBOL(mmc_sw_reset);

+void mmc_trigger_replug(struct mmc_host *host)
+{
+ host->trigger_replug_state = MMC_REPLUG_STATE_UNPLUG;
+ _mmc_detect_change(host, 0, false);
+}
+
static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
{
host->f_init = freq;
@@ -2214,6 +2220,11 @@ int _mmc_detect_card_removed(struct mmc_host *host)
if (!host->card || mmc_card_removed(host->card))
return 1;

+ if (host->trigger_replug_state == MMC_REPLUG_STATE_UNPLUG) {
+ mmc_card_set_removed(host->card);
+ return 1;
+ }
+
ret = host->bus_ops->alive(host);

/*
@@ -2326,8 +2337,21 @@ void mmc_rescan(struct work_struct *work)
mmc_bus_put(host);

mmc_claim_host(host);
- if (mmc_card_is_removable(host) && host->ops->get_cd &&
- host->ops->get_cd(host) == 0) {
+
+ /*
+ * Move through the state machine if we're triggering an unplug
+ * followed by a re-plug.
+ */
+ if (host->trigger_replug_state == MMC_REPLUG_STATE_UNPLUG) {
+ host->trigger_replug_state = MMC_REPLUG_STATE_PLUG;
+ _mmc_detect_change(host, 0, false);
+ } else if (host->trigger_replug_state == MMC_REPLUG_STATE_PLUG) {
+ host->trigger_replug_state = MMC_REPLUG_STATE_NONE;
+ }
+
+ if (host->trigger_replug_state == MMC_REPLUG_STATE_PLUG ||
+ (mmc_card_is_removable(host) && host->ops->get_cd &&
+ host->ops->get_cd(host) == 0)) {
mmc_power_off(host);
mmc_release_host(host);
goto out;
diff --git a/drivers/mmc/core/sdio_io.c b/drivers/mmc/core/sdio_io.c
index 2ba00acf64e6..9b96267ac855 100644
--- a/drivers/mmc/core/sdio_io.c
+++ b/drivers/mmc/core/sdio_io.c
@@ -811,3 +811,23 @@ void sdio_retune_release(struct sdio_func *func)
mmc_retune_release(func->card->host);
}
EXPORT_SYMBOL_GPL(sdio_retune_release);
+
+/**
+ * sdio_trigger_replug - trigger an "unplug" + "plug" of the card
+ * @func: SDIO function attached to host
+ *
+ * When you call this function we will schedule events that will
+ * make it look like the card containing the given SDIO func was
+ * unplugged and then re-plugged-in. This is as close as possible
+ * to a full reset of the card that can be achieved.
+ *
+ * NOTE: routine will temporarily make the card look as if it is
+ * removable even if it is marked non-removable.
+ *
+ * This function should be called while the host is claimed.
+ */
+void sdio_trigger_replug(struct sdio_func *func)
+{
+ mmc_trigger_replug(func->card->host);
+}
+EXPORT_SYMBOL(sdio_trigger_replug);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 4a351cb7f20f..40f21b3e6aaf 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -407,6 +407,12 @@ struct mmc_host {

bool trigger_card_event; /* card_event necessary */

+ /* state machine for triggering unplug/replug */
+#define MMC_REPLUG_STATE_NONE 0 /* not doing unplug/replug */
+#define MMC_REPLUG_STATE_UNPLUG 1 /* do unplug next */
+#define MMC_REPLUG_STATE_PLUG 2 /* do plug next */
+ u8 trigger_replug_state;
+
struct mmc_card *card; /* device attached to this host */

wait_queue_head_t wq;
@@ -527,7 +533,12 @@ int mmc_regulator_get_supply(struct mmc_host *mmc);

static inline int mmc_card_is_removable(struct mmc_host *host)
{
- return !(host->caps & MMC_CAP_NONREMOVABLE);
+ /*
+ * A non-removable card briefly looks removable if code has forced
+ * a re-plug of the card.
+ */
+ return host->trigger_replug_state != MMC_REPLUG_STATE_NONE ||
+ !(host->caps & MMC_CAP_NONREMOVABLE);
}

static inline int mmc_card_keep_power(struct mmc_host *host)
@@ -580,4 +591,6 @@ static inline enum dma_data_direction mmc_get_dma_dir(struct mmc_data *data)
int mmc_send_tuning(struct mmc_host *host, u32 opcode, int *cmd_error);
int mmc_abort_tuning(struct mmc_host *host, u32 opcode);

+void mmc_trigger_replug(struct mmc_host *host);
+
#endif /* LINUX_MMC_HOST_H */
diff --git a/include/linux/mmc/sdio_func.h b/include/linux/mmc/sdio_func.h
index 5a177f7a83c3..0d6c73768ae3 100644
--- a/include/linux/mmc/sdio_func.h
+++ b/include/linux/mmc/sdio_func.h
@@ -173,4 +173,6 @@ extern void sdio_retune_crc_enable(struct sdio_func *func);
extern void sdio_retune_hold_now(struct sdio_func *func);
extern void sdio_retune_release(struct sdio_func *func);

+extern void sdio_trigger_replug(struct sdio_func *func);
+
#endif /* LINUX_MMC_SDIO_FUNC_H */
--
2.22.0.657.g960e92d24f-goog

2019-07-24 20:23:34

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mwifiex: Make use of the new sdio_trigger_replug() API to reset

Hi,

On Wed, Jul 24, 2019 at 4:35 AM Kalle Valo <[email protected]> wrote:
>
> Douglas Anderson <[email protected]> wrote:
>
> > As described in the patch ("mmc: core: Add sdio_trigger_replug()
> > API"), the current mwifiex_sdio_card_reset() is broken in the cases
> > where we're running Bluetooth on a second SDIO func on the same card
> > as WiFi. The problem goes away if we just use the
> > sdio_trigger_replug() API call.
> >
> > NOTE: Even though with this new solution there is less of a reason to
> > do our work from a workqueue (the unplug / plug mechanism we're using
> > is possible for a human to perform at any time so the stack is
> > supposed to handle it without it needing to be called from a special
> > context), we still need a workqueue because the Marvell reset function
> > could called from a context where sleeping is invalid and thus we
> > can't claim the host. One example is Marvell's wakeup_timer_fn().
> >
> > Cc: Andreas Fenkart <[email protected]>
> > Cc: Brian Norris <[email protected]>
> > Fixes: b4336a282db8 ("mwifiex: sdio: reset adapter using mmc_hw_reset")
> > Signed-off-by: Douglas Anderson <[email protected]>
> > Reviewed-by: Brian Norris <[email protected]>
>
> I assume this is going via some other tree so I'm dropping this from my
> queue. If I should apply this please resend once the dependency is in
> wireless-drivers-next.
>
> Patch set to Not Applicable.

Thanks. For now I'll assume that Ulf will pick it up if/when he is
happy with patch #1 in this series. Would you be willing to provide
your Ack on this patch to make it clear to Ulf you're OK with that?

-Doug

2019-07-25 06:20:25

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mwifiex: Make use of the new sdio_trigger_replug() API to reset

Doug Anderson <[email protected]> writes:

> Hi,
>
> On Wed, Jul 24, 2019 at 4:35 AM Kalle Valo <[email protected]> wrote:
>>
>> Douglas Anderson <[email protected]> wrote:
>>
>> > As described in the patch ("mmc: core: Add sdio_trigger_replug()
>> > API"), the current mwifiex_sdio_card_reset() is broken in the cases
>> > where we're running Bluetooth on a second SDIO func on the same card
>> > as WiFi. The problem goes away if we just use the
>> > sdio_trigger_replug() API call.
>> >
>> > NOTE: Even though with this new solution there is less of a reason to
>> > do our work from a workqueue (the unplug / plug mechanism we're using
>> > is possible for a human to perform at any time so the stack is
>> > supposed to handle it without it needing to be called from a special
>> > context), we still need a workqueue because the Marvell reset function
>> > could called from a context where sleeping is invalid and thus we
>> > can't claim the host. One example is Marvell's wakeup_timer_fn().
>> >
>> > Cc: Andreas Fenkart <[email protected]>
>> > Cc: Brian Norris <[email protected]>
>> > Fixes: b4336a282db8 ("mwifiex: sdio: reset adapter using mmc_hw_reset")
>> > Signed-off-by: Douglas Anderson <[email protected]>
>> > Reviewed-by: Brian Norris <[email protected]>
>>
>> I assume this is going via some other tree so I'm dropping this from my
>> queue. If I should apply this please resend once the dependency is in
>> wireless-drivers-next.
>>
>> Patch set to Not Applicable.
>
> Thanks. For now I'll assume that Ulf will pick it up if/when he is
> happy with patch #1 in this series. Would you be willing to provide
> your Ack on this patch to make it clear to Ulf you're OK with that?

Sure, I was planning to do that already in my previous email but forgot.

Acked-by: Kalle Valo <[email protected]>

--
Kalle Valo

2019-07-30 11:09:51

by Andreas Fenkart

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

Hi Douglas,

Am Mo., 22. Juli 2019 um 21:41 Uhr schrieb Douglas Anderson
<[email protected]>:
>
> As talked about in the thread at:
>
> http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
>
> ...when the Marvell WiFi card tries to reset itself it kills
> Bluetooth. It was observed that we could re-init the card properly by
> unbinding / rebinding the host controller. It was also observed that
> in the downstream Chrome OS codebase the solution used was
> mmc_remove_host() / mmc_add_host(), which is similar to the solution
> in this series.
>
> So far I've only done testing of this series using the reset test
> source that can be simulated via sysfs. Specifically I ran this test:
>
> for i in $(seq 1000); do
> echo "LOOP $i --------"
> echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
>
> while true; do
> if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> fail=$(( fail + 1 ))
> echo "Fail WiFi ${fail}"
> if [[ ${fail} == 3 ]]; then
> exit 1
> fi
> else
> fail=0
> break
> fi
> done
>
> hciconfig hci0 down
> sleep 1
> if ! hciconfig hci0 up; then
> echo "Fail BT"
> exit 1
> fi
> done
>
> I ran this several times and got several hundred iterations each
> before a failure. When I saw failures:
>
> * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> I didn't give it time to see if it would have detected this
> automatically.
> * Once I saw the ping fail because (for some reason) my device only
> got an IPv6 address from my router and the IPv4 ping failed. I
> changed my script to use 'ping6' to see if that would help.
> * Once I saw the ping fail because the higher level network stack
> ("shill" in my case) seemed to crash. A few minutes later the
> system recovered itself automatically. https://crbug.com/984593 if
> you want more details.
> * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> transitory failure. Usually this was an association failure, but in
> one case I saw the device do "Firmware wakeup failed" after I
> triggered the reset. This caused the driver to trigger a re-reset
> of itself which eventually recovered things. This was good because
> it was an actual test of the normal reset flow (not the one
> triggered via sysfs).

This error triggers something. I remember that when I was working on
suspend-to-ram feature, we had problems to wake up the firmware
reliable. I found this patch in one of my old 3.13 tree

the missing bit -- ugly hack to force cmd52 before cmd53.
---
drivers/mmc/host/omap_hsmmc.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
index fb24a006080f..710d0bdf39e5 100644
--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ -2372,6 +2372,12 @@ static int omap_hsmmc_suspend(struct device *dev)
if (host->flags & HSMMC_SWAKEUP_QUIRK)
disable_irq(host->gpio_sdio_irq);

+ /*
+ * force a polling cycle after resume.
+ * will issue cmd52, not cmd53 straight away
+ */
+ omap_hsmmc_enable_sdio_irq(host->mmc, false);
+
if (host->dbclk)
clk_disable_unprepare(host->dbclk);

>
> Changes in v2:
> - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> - s/contining/containing (Matthias Kaehlcke).
> - Add Matthias Reviewed-by tag.
> - Removed clear_bit() calls and old comment (Brian Norris).
> - Explicit CC of Andreas Fenkart.
> - Explicit CC of Brian Norris.
> - Add "Fixes" pointing at the commit Brian talked about.
> - Add Brian's Reviewed-by tag.
>
> Douglas Anderson (2):
> mmc: core: Add sdio_trigger_replug() API
> mwifiex: Make use of the new sdio_trigger_replug() API to reset
>
> drivers/mmc/core/core.c | 28 +++++++++++++++++++--
> drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
> drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> include/linux/mmc/host.h | 15 ++++++++++-
> include/linux/mmc/sdio_func.h | 2 ++
> 5 files changed, 63 insertions(+), 18 deletions(-)
>
> --
> 2.22.0.657.g960e92d24f-goog
>

2019-07-30 17:00:59

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

Hi,

On Tue, Jul 30, 2019 at 1:47 AM Andreas Fenkart <[email protected]> wrote:
>
> > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > transitory failure. Usually this was an association failure, but in
> > one case I saw the device do "Firmware wakeup failed" after I
> > triggered the reset. This caused the driver to trigger a re-reset
> > of itself which eventually recovered things. This was good because
> > it was an actual test of the normal reset flow (not the one
> > triggered via sysfs).
>
> This error triggers something. I remember that when I was working on
> suspend-to-ram feature, we had problems to wake up the firmware
> reliable. I found this patch in one of my old 3.13 tree
>
> the missing bit -- ugly hack to force cmd52 before cmd53.

Thanks for the reference! At the moment I'm not terribly worried
about this particular failure case (compared to other failure modes)
because it's rare and it self-heals.

...my best guess, though, is that the problem isn't exactly the same.
The "Firmware wakeup failed" is a pretty generic error message, kind
of like "something went wrong" and not all instances of this message
will have the same root cause.

I actually dealt with a few suspend/resume issues around mwifiex
recently though. If you ever uprev, you might be interested in:

b82d6c1f8f82 mwifiex: Make resume actually do something useful again
on SDIO cards
83293386bc95 mmc: core: Prevent processing SDIO IRQs when the card is suspended

-Doug

2019-09-11 22:30:28

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

Hi,

On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <[email protected]> wrote:
>
> On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <[email protected]> wrote:
> >
> > As talked about in the thread at:
> >
> > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> >
> > ...when the Marvell WiFi card tries to reset itself it kills
> > Bluetooth. It was observed that we could re-init the card properly by
> > unbinding / rebinding the host controller. It was also observed that
> > in the downstream Chrome OS codebase the solution used was
> > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > in this series.
> >
> > So far I've only done testing of this series using the reset test
> > source that can be simulated via sysfs. Specifically I ran this test:
> >
> > for i in $(seq 1000); do
> > echo "LOOP $i --------"
> > echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> >
> > while true; do
> > if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > fail=$(( fail + 1 ))
> > echo "Fail WiFi ${fail}"
> > if [[ ${fail} == 3 ]]; then
> > exit 1
> > fi
> > else
> > fail=0
> > break
> > fi
> > done
> >
> > hciconfig hci0 down
> > sleep 1
> > if ! hciconfig hci0 up; then
> > echo "Fail BT"
> > exit 1
> > fi
> > done
> >
> > I ran this several times and got several hundred iterations each
> > before a failure. When I saw failures:
> >
> > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > I didn't give it time to see if it would have detected this
> > automatically.
> > * Once I saw the ping fail because (for some reason) my device only
> > got an IPv6 address from my router and the IPv4 ping failed. I
> > changed my script to use 'ping6' to see if that would help.
> > * Once I saw the ping fail because the higher level network stack
> > ("shill" in my case) seemed to crash. A few minutes later the
> > system recovered itself automatically. https://crbug.com/984593 if
> > you want more details.
> > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > transitory failure. Usually this was an association failure, but in
> > one case I saw the device do "Firmware wakeup failed" after I
> > triggered the reset. This caused the driver to trigger a re-reset
> > of itself which eventually recovered things. This was good because
> > it was an actual test of the normal reset flow (not the one
> > triggered via sysfs).
> >
> > Changes in v2:
> > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > - s/contining/containing (Matthias Kaehlcke).
> > - Add Matthias Reviewed-by tag.
> > - Removed clear_bit() calls and old comment (Brian Norris).
> > - Explicit CC of Andreas Fenkart.
> > - Explicit CC of Brian Norris.
> > - Add "Fixes" pointing at the commit Brian talked about.
> > - Add Brian's Reviewed-by tag.
> >
> > Douglas Anderson (2):
> > mmc: core: Add sdio_trigger_replug() API
> > mwifiex: Make use of the new sdio_trigger_replug() API to reset
> >
> > drivers/mmc/core/core.c | 28 +++++++++++++++++++--
> > drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
> > drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > include/linux/mmc/host.h | 15 ++++++++++-
> > include/linux/mmc/sdio_func.h | 2 ++
> > 5 files changed, 63 insertions(+), 18 deletions(-)
> >
>
> Doug, thanks for sending this!
>
> As you know, I have been working on additional changes for SDIO
> suspend/resume (still WIP and not ready for sharing) and this series
> is related.
>
> The thing is, that even during system suspend/resume, synchronizations
> are needed between the different layers (mmc host, mmc core and
> sdio-funcs), which is common to the problem you want to solve.
>
> That said, I need to scratch my head a bit more before I can provide
> you some feedback on $subject series. Moreover, it's vacation period
> at my side so things are moving a bit slower. Please be patient.

I had kinda forgotten about this series after we landed it locally in
Chrome OS, but I realized that it never got resolved upstream. Any
chance your head has been sufficiently scratched and you're now happy
with $subject series? ;-)

-Doug

2019-09-16 10:42:42

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

On Wed, 11 Sep 2019 at 23:26, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <[email protected]> wrote:
> >
> > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <[email protected]> wrote:
> > >
> > > As talked about in the thread at:
> > >
> > > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> > >
> > > ...when the Marvell WiFi card tries to reset itself it kills
> > > Bluetooth. It was observed that we could re-init the card properly by
> > > unbinding / rebinding the host controller. It was also observed that
> > > in the downstream Chrome OS codebase the solution used was
> > > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > > in this series.
> > >
> > > So far I've only done testing of this series using the reset test
> > > source that can be simulated via sysfs. Specifically I ran this test:
> > >
> > > for i in $(seq 1000); do
> > > echo "LOOP $i --------"
> > > echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> > >
> > > while true; do
> > > if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > > fail=$(( fail + 1 ))
> > > echo "Fail WiFi ${fail}"
> > > if [[ ${fail} == 3 ]]; then
> > > exit 1
> > > fi
> > > else
> > > fail=0
> > > break
> > > fi
> > > done
> > >
> > > hciconfig hci0 down
> > > sleep 1
> > > if ! hciconfig hci0 up; then
> > > echo "Fail BT"
> > > exit 1
> > > fi
> > > done
> > >
> > > I ran this several times and got several hundred iterations each
> > > before a failure. When I saw failures:
> > >
> > > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > > I didn't give it time to see if it would have detected this
> > > automatically.
> > > * Once I saw the ping fail because (for some reason) my device only
> > > got an IPv6 address from my router and the IPv4 ping failed. I
> > > changed my script to use 'ping6' to see if that would help.
> > > * Once I saw the ping fail because the higher level network stack
> > > ("shill" in my case) seemed to crash. A few minutes later the
> > > system recovered itself automatically. https://crbug.com/984593 if
> > > you want more details.
> > > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > > transitory failure. Usually this was an association failure, but in
> > > one case I saw the device do "Firmware wakeup failed" after I
> > > triggered the reset. This caused the driver to trigger a re-reset
> > > of itself which eventually recovered things. This was good because
> > > it was an actual test of the normal reset flow (not the one
> > > triggered via sysfs).
> > >
> > > Changes in v2:
> > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > - s/contining/containing (Matthias Kaehlcke).
> > > - Add Matthias Reviewed-by tag.
> > > - Removed clear_bit() calls and old comment (Brian Norris).
> > > - Explicit CC of Andreas Fenkart.
> > > - Explicit CC of Brian Norris.
> > > - Add "Fixes" pointing at the commit Brian talked about.
> > > - Add Brian's Reviewed-by tag.
> > >
> > > Douglas Anderson (2):
> > > mmc: core: Add sdio_trigger_replug() API
> > > mwifiex: Make use of the new sdio_trigger_replug() API to reset
> > >
> > > drivers/mmc/core/core.c | 28 +++++++++++++++++++--
> > > drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
> > > drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > > include/linux/mmc/host.h | 15 ++++++++++-
> > > include/linux/mmc/sdio_func.h | 2 ++
> > > 5 files changed, 63 insertions(+), 18 deletions(-)
> > >
> >
> > Doug, thanks for sending this!
> >
> > As you know, I have been working on additional changes for SDIO
> > suspend/resume (still WIP and not ready for sharing) and this series
> > is related.
> >
> > The thing is, that even during system suspend/resume, synchronizations
> > are needed between the different layers (mmc host, mmc core and
> > sdio-funcs), which is common to the problem you want to solve.
> >
> > That said, I need to scratch my head a bit more before I can provide
> > you some feedback on $subject series. Moreover, it's vacation period
> > at my side so things are moving a bit slower. Please be patient.
>
> I had kinda forgotten about this series after we landed it locally in
> Chrome OS, but I realized that it never got resolved upstream. Any
> chance your head has been sufficiently scratched and you're now happy
> with $subject series? ;-)

It's still on my TODO list. Apologize for the delay, but I still need
more time to look into it, possibly later this week.

In any case, let's make sure we get this problem resolved for v5.5.

Kind regards
Uffe

2019-10-07 23:42:54

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

Hi,

On Mon, Sep 16, 2019 at 2:25 AM Ulf Hansson <[email protected]> wrote:
>
> On Wed, 11 Sep 2019 at 23:26, Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <[email protected]> wrote:
> > >
> > > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <[email protected]> wrote:
> > > >
> > > > As talked about in the thread at:
> > > >
> > > > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> > > >
> > > > ...when the Marvell WiFi card tries to reset itself it kills
> > > > Bluetooth. It was observed that we could re-init the card properly by
> > > > unbinding / rebinding the host controller. It was also observed that
> > > > in the downstream Chrome OS codebase the solution used was
> > > > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > > > in this series.
> > > >
> > > > So far I've only done testing of this series using the reset test
> > > > source that can be simulated via sysfs. Specifically I ran this test:
> > > >
> > > > for i in $(seq 1000); do
> > > > echo "LOOP $i --------"
> > > > echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> > > >
> > > > while true; do
> > > > if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > > > fail=$(( fail + 1 ))
> > > > echo "Fail WiFi ${fail}"
> > > > if [[ ${fail} == 3 ]]; then
> > > > exit 1
> > > > fi
> > > > else
> > > > fail=0
> > > > break
> > > > fi
> > > > done
> > > >
> > > > hciconfig hci0 down
> > > > sleep 1
> > > > if ! hciconfig hci0 up; then
> > > > echo "Fail BT"
> > > > exit 1
> > > > fi
> > > > done
> > > >
> > > > I ran this several times and got several hundred iterations each
> > > > before a failure. When I saw failures:
> > > >
> > > > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > > > I didn't give it time to see if it would have detected this
> > > > automatically.
> > > > * Once I saw the ping fail because (for some reason) my device only
> > > > got an IPv6 address from my router and the IPv4 ping failed. I
> > > > changed my script to use 'ping6' to see if that would help.
> > > > * Once I saw the ping fail because the higher level network stack
> > > > ("shill" in my case) seemed to crash. A few minutes later the
> > > > system recovered itself automatically. https://crbug.com/984593 if
> > > > you want more details.
> > > > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > > > transitory failure. Usually this was an association failure, but in
> > > > one case I saw the device do "Firmware wakeup failed" after I
> > > > triggered the reset. This caused the driver to trigger a re-reset
> > > > of itself which eventually recovered things. This was good because
> > > > it was an actual test of the normal reset flow (not the one
> > > > triggered via sysfs).
> > > >
> > > > Changes in v2:
> > > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > > - s/contining/containing (Matthias Kaehlcke).
> > > > - Add Matthias Reviewed-by tag.
> > > > - Removed clear_bit() calls and old comment (Brian Norris).
> > > > - Explicit CC of Andreas Fenkart.
> > > > - Explicit CC of Brian Norris.
> > > > - Add "Fixes" pointing at the commit Brian talked about.
> > > > - Add Brian's Reviewed-by tag.
> > > >
> > > > Douglas Anderson (2):
> > > > mmc: core: Add sdio_trigger_replug() API
> > > > mwifiex: Make use of the new sdio_trigger_replug() API to reset
> > > >
> > > > drivers/mmc/core/core.c | 28 +++++++++++++++++++--
> > > > drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
> > > > drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > > > include/linux/mmc/host.h | 15 ++++++++++-
> > > > include/linux/mmc/sdio_func.h | 2 ++
> > > > 5 files changed, 63 insertions(+), 18 deletions(-)
> > > >
> > >
> > > Doug, thanks for sending this!
> > >
> > > As you know, I have been working on additional changes for SDIO
> > > suspend/resume (still WIP and not ready for sharing) and this series
> > > is related.
> > >
> > > The thing is, that even during system suspend/resume, synchronizations
> > > are needed between the different layers (mmc host, mmc core and
> > > sdio-funcs), which is common to the problem you want to solve.
> > >
> > > That said, I need to scratch my head a bit more before I can provide
> > > you some feedback on $subject series. Moreover, it's vacation period
> > > at my side so things are moving a bit slower. Please be patient.
> >
> > I had kinda forgotten about this series after we landed it locally in
> > Chrome OS, but I realized that it never got resolved upstream. Any
> > chance your head has been sufficiently scratched and you're now happy
> > with $subject series? ;-)
>
> It's still on my TODO list. Apologize for the delay, but I still need
> more time to look into it, possibly later this week.
>
> In any case, let's make sure we get this problem resolved for v5.5.

Hi Ulf. It's your friendly pest, Doug, here to ask how things are going. :-P


-Doug

2019-10-08 11:51:13

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

On Tue, 8 Oct 2019 at 01:39, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Mon, Sep 16, 2019 at 2:25 AM Ulf Hansson <[email protected]> wrote:
> >
> > On Wed, 11 Sep 2019 at 23:26, Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <[email protected]> wrote:
> > > > >
> > > > > As talked about in the thread at:
> > > > >
> > > > > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> > > > >
> > > > > ...when the Marvell WiFi card tries to reset itself it kills
> > > > > Bluetooth. It was observed that we could re-init the card properly by
> > > > > unbinding / rebinding the host controller. It was also observed that
> > > > > in the downstream Chrome OS codebase the solution used was
> > > > > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > > > > in this series.
> > > > >
> > > > > So far I've only done testing of this series using the reset test
> > > > > source that can be simulated via sysfs. Specifically I ran this test:
> > > > >
> > > > > for i in $(seq 1000); do
> > > > > echo "LOOP $i --------"
> > > > > echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> > > > >
> > > > > while true; do
> > > > > if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > > > > fail=$(( fail + 1 ))
> > > > > echo "Fail WiFi ${fail}"
> > > > > if [[ ${fail} == 3 ]]; then
> > > > > exit 1
> > > > > fi
> > > > > else
> > > > > fail=0
> > > > > break
> > > > > fi
> > > > > done
> > > > >
> > > > > hciconfig hci0 down
> > > > > sleep 1
> > > > > if ! hciconfig hci0 up; then
> > > > > echo "Fail BT"
> > > > > exit 1
> > > > > fi
> > > > > done
> > > > >
> > > > > I ran this several times and got several hundred iterations each
> > > > > before a failure. When I saw failures:
> > > > >
> > > > > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > > > > I didn't give it time to see if it would have detected this
> > > > > automatically.
> > > > > * Once I saw the ping fail because (for some reason) my device only
> > > > > got an IPv6 address from my router and the IPv4 ping failed. I
> > > > > changed my script to use 'ping6' to see if that would help.
> > > > > * Once I saw the ping fail because the higher level network stack
> > > > > ("shill" in my case) seemed to crash. A few minutes later the
> > > > > system recovered itself automatically. https://crbug.com/984593 if
> > > > > you want more details.
> > > > > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > > > > transitory failure. Usually this was an association failure, but in
> > > > > one case I saw the device do "Firmware wakeup failed" after I
> > > > > triggered the reset. This caused the driver to trigger a re-reset
> > > > > of itself which eventually recovered things. This was good because
> > > > > it was an actual test of the normal reset flow (not the one
> > > > > triggered via sysfs).
> > > > >
> > > > > Changes in v2:
> > > > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > > > - s/contining/containing (Matthias Kaehlcke).
> > > > > - Add Matthias Reviewed-by tag.
> > > > > - Removed clear_bit() calls and old comment (Brian Norris).
> > > > > - Explicit CC of Andreas Fenkart.
> > > > > - Explicit CC of Brian Norris.
> > > > > - Add "Fixes" pointing at the commit Brian talked about.
> > > > > - Add Brian's Reviewed-by tag.
> > > > >
> > > > > Douglas Anderson (2):
> > > > > mmc: core: Add sdio_trigger_replug() API
> > > > > mwifiex: Make use of the new sdio_trigger_replug() API to reset
> > > > >
> > > > > drivers/mmc/core/core.c | 28 +++++++++++++++++++--
> > > > > drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
> > > > > drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > > > > include/linux/mmc/host.h | 15 ++++++++++-
> > > > > include/linux/mmc/sdio_func.h | 2 ++
> > > > > 5 files changed, 63 insertions(+), 18 deletions(-)
> > > > >
> > > >
> > > > Doug, thanks for sending this!
> > > >
> > > > As you know, I have been working on additional changes for SDIO
> > > > suspend/resume (still WIP and not ready for sharing) and this series
> > > > is related.
> > > >
> > > > The thing is, that even during system suspend/resume, synchronizations
> > > > are needed between the different layers (mmc host, mmc core and
> > > > sdio-funcs), which is common to the problem you want to solve.
> > > >
> > > > That said, I need to scratch my head a bit more before I can provide
> > > > you some feedback on $subject series. Moreover, it's vacation period
> > > > at my side so things are moving a bit slower. Please be patient.
> > >
> > > I had kinda forgotten about this series after we landed it locally in
> > > Chrome OS, but I realized that it never got resolved upstream. Any
> > > chance your head has been sufficiently scratched and you're now happy
> > > with $subject series? ;-)
> >
> > It's still on my TODO list. Apologize for the delay, but I still need
> > more time to look into it, possibly later this week.
> >
> > In any case, let's make sure we get this problem resolved for v5.5.
>
> Hi Ulf. It's your friendly pest, Doug, here to ask how things are going. :-P

:-)

The series on the top of my "things to review" list. I will definitely
provide you with some feedback then next days or so.

Again, sorry for the delay!

Kind regards
Uffe

2019-10-10 14:13:57

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mmc: core: Add sdio_trigger_replug() API

On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <[email protected]> wrote:
>
> When using Marvell WiFi SDIO cards, it is not uncommon for Linux WiFi
> driver to fully lose the communication channel to the firmware running
> on the card. Presumably the firmware on the card has a bug or two in
> it and occasionally crashes.
>
> The Marvell WiFi driver attempts to recover from this problem.
> Specifically the driver has the function mwifiex_sdio_card_reset()
> which is called when communcation problems are found. That function
> attempts to reset the state of things by utilizing the mmc_hw_reset()
> function.
>
> The current solution is a bit complex because the Marvell WiFi driver
> needs to manually deinit and reinit the WiFi driver around the reset
> call. This means it's going through a bunch of code paths that aren't
> normally tested. However, complexity isn't our only problem. The
> other (bigger) problem is that Marvell WiFi cards are often combo
> WiFi/Bluetooth cards and Bluetooth runs on a second SDIO func. While
> the WiFi driver knows that it should re-init its own state around the
> mmc_hw_reset() call there is no good way to inform the Bluetooth
> driver. That means that in Linux today when you reset the Marvell
> WiFi driver you lose all Bluetooth communication. Doh!

Thanks for a nice description to the problem!

In principle it makes mmc_hw_reset() quite questionable to use for
SDIO func drivers, at all. However, let's consider that for later.

>
> One way to fix the above problems is to leverage a more standard way
> to reset the Marvell WiFi card where we go through the same code paths
> as card unplug and the card plug. In this patch we introduce a new
> API call for doing just that: sdio_trigger_replug(). This API call
> will trigger an unplug of the SDIO card followed by a plug of the
> card. As part of this the card will be nicely reset.

I have been thinking back and forth on this, exploring various
options, perhaps adding some callbacks that the core could invoke to
inform the SDIO func drivers of what is going on.

Although, in the end this boils done to complexity and I think your
approach is simply the most superior in regards to this. However, I
think there is a few things that we can do to even further simply your
approach, let me comment on the code below.

>
> Signed-off-by: Douglas Anderson <[email protected]>
> Reviewed-by: Matthias Kaehlcke <[email protected]>
> ---
>
> Changes in v2:
> - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> - s/contining/containing (Matthias Kaehlcke).
> - Add Matthias Reviewed-by tag.
>
> drivers/mmc/core/core.c | 28 ++++++++++++++++++++++++++--
> drivers/mmc/core/sdio_io.c | 20 ++++++++++++++++++++
> include/linux/mmc/host.h | 15 ++++++++++++++-
> include/linux/mmc/sdio_func.h | 2 ++
> 4 files changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 221127324709..5da365b1fdb4 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2161,6 +2161,12 @@ int mmc_sw_reset(struct mmc_host *host)
> }
> EXPORT_SYMBOL(mmc_sw_reset);
>
> +void mmc_trigger_replug(struct mmc_host *host)
> +{
> + host->trigger_replug_state = MMC_REPLUG_STATE_UNPLUG;
> + _mmc_detect_change(host, 0, false);
> +}
> +
> static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
> {
> host->f_init = freq;
> @@ -2214,6 +2220,11 @@ int _mmc_detect_card_removed(struct mmc_host *host)
> if (!host->card || mmc_card_removed(host->card))
> return 1;
>
> + if (host->trigger_replug_state == MMC_REPLUG_STATE_UNPLUG) {
> + mmc_card_set_removed(host->card);
> + return 1;

Do you really need to set state of the card to "removed"?

If I understand correctly, what you need is to allow mmc_rescan() to
run a second time, in particular for non removable cards.

In that path, mmc_rescan should find the card being non-functional,
thus it should remove it and then try to re-initialize it again. Etc.

Do you want me to send a patch to show you what I mean!?

[...]

Kind regards
Uffe

2019-10-18 10:20:34

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mmc: core: Add sdio_trigger_replug() API

On Thu, 17 Oct 2019 at 02:22, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Thu, Oct 10, 2019 at 7:11 AM Ulf Hansson <[email protected]> wrote:
> >
> > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <[email protected]> wrote:
> > >
> > > When using Marvell WiFi SDIO cards, it is not uncommon for Linux WiFi
> > > driver to fully lose the communication channel to the firmware running
> > > on the card. Presumably the firmware on the card has a bug or two in
> > > it and occasionally crashes.
> > >
> > > The Marvell WiFi driver attempts to recover from this problem.
> > > Specifically the driver has the function mwifiex_sdio_card_reset()
> > > which is called when communcation problems are found. That function
> > > attempts to reset the state of things by utilizing the mmc_hw_reset()
> > > function.
> > >
> > > The current solution is a bit complex because the Marvell WiFi driver
> > > needs to manually deinit and reinit the WiFi driver around the reset
> > > call. This means it's going through a bunch of code paths that aren't
> > > normally tested. However, complexity isn't our only problem. The
> > > other (bigger) problem is that Marvell WiFi cards are often combo
> > > WiFi/Bluetooth cards and Bluetooth runs on a second SDIO func. While
> > > the WiFi driver knows that it should re-init its own state around the
> > > mmc_hw_reset() call there is no good way to inform the Bluetooth
> > > driver. That means that in Linux today when you reset the Marvell
> > > WiFi driver you lose all Bluetooth communication. Doh!
> >
> > Thanks for a nice description to the problem!
> >
> > In principle it makes mmc_hw_reset() quite questionable to use for
> > SDIO func drivers, at all. However, let's consider that for later.
>
> Yeah, unless you somehow knew that your card would only have one function.
>
>
> > > One way to fix the above problems is to leverage a more standard way
> > > to reset the Marvell WiFi card where we go through the same code paths
> > > as card unplug and the card plug. In this patch we introduce a new
> > > API call for doing just that: sdio_trigger_replug(). This API call
> > > will trigger an unplug of the SDIO card followed by a plug of the
> > > card. As part of this the card will be nicely reset.
> >
> > I have been thinking back and forth on this, exploring various
> > options, perhaps adding some callbacks that the core could invoke to
> > inform the SDIO func drivers of what is going on.
> >
> > Although, in the end this boils done to complexity and I think your
> > approach is simply the most superior in regards to this. However, I
> > think there is a few things that we can do to even further simply your
> > approach, let me comment on the code below.
>
> Right. Unplugging / re-plugging is sorta gross / inelegant, but it is
> definitely simpler and nice that it doesn't add so many new code
> paths. For cases where you're just trying to re-init things with a
> hammer it works pretty well.
>
>
> > > Signed-off-by: Douglas Anderson <[email protected]>
> > > Reviewed-by: Matthias Kaehlcke <[email protected]>
> > > ---
> > >
> > > Changes in v2:
> > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > - s/contining/containing (Matthias Kaehlcke).
> > > - Add Matthias Reviewed-by tag.
> > >
> > > drivers/mmc/core/core.c | 28 ++++++++++++++++++++++++++--
> > > drivers/mmc/core/sdio_io.c | 20 ++++++++++++++++++++
> > > include/linux/mmc/host.h | 15 ++++++++++++++-
> > > include/linux/mmc/sdio_func.h | 2 ++
> > > 4 files changed, 62 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > index 221127324709..5da365b1fdb4 100644
> > > --- a/drivers/mmc/core/core.c
> > > +++ b/drivers/mmc/core/core.c
> > > @@ -2161,6 +2161,12 @@ int mmc_sw_reset(struct mmc_host *host)
> > > }
> > > EXPORT_SYMBOL(mmc_sw_reset);
> > >
> > > +void mmc_trigger_replug(struct mmc_host *host)
> > > +{
> > > + host->trigger_replug_state = MMC_REPLUG_STATE_UNPLUG;
> > > + _mmc_detect_change(host, 0, false);
> > > +}
> > > +
> > > static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
> > > {
> > > host->f_init = freq;
> > > @@ -2214,6 +2220,11 @@ int _mmc_detect_card_removed(struct mmc_host *host)
> > > if (!host->card || mmc_card_removed(host->card))
> > > return 1;
> > >
> > > + if (host->trigger_replug_state == MMC_REPLUG_STATE_UNPLUG) {
> > > + mmc_card_set_removed(host->card);
> > > + return 1;
> >
> > Do you really need to set state of the card to "removed"?
> >
> > If I understand correctly, what you need is to allow mmc_rescan() to
> > run a second time, in particular for non removable cards.
> >
> > In that path, mmc_rescan should find the card being non-functional,
> > thus it should remove it and then try to re-initialize it again. Etc.
> >
> > Do you want me to send a patch to show you what I mean!?
>
> If you don't mind, that would probably be easiest. I've totally
> swapped out all of the implementation details of this from my brain
> now, but if I saw a patch from you it would be easy for me to analyze
> it and test it.

Alright, I think I owe you that because of my slow review pase. :-)

Patches are coming soon!

Kind regards
Uffe