2020-04-10 21:33:31

by Martin Blumenstingl

[permalink] [raw]
Subject: [PATCH 1/1] mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY

The Meson SDIO controller uses the DAT0 lane for hardware busy
detection. Set MMC_CAP_WAIT_WHILE_BUSY accordingly. This fixes
the following error observed with Linux 5.7 (pre-rc-1):
mmc1: Card stuck being busy! __mmc_poll_for_busy
blk_update_request: I/O error, dev mmcblk1, sector 17111080 op
0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0

Signed-off-by: Martin Blumenstingl <[email protected]>
---
I'm sending this as RFC because I'm not sure if this is a proper fix.
It "fixes" the issue for me but I want the MMC maintainers to double-
check this.


drivers/mmc/host/meson-mx-sdio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/host/meson-mx-sdio.c b/drivers/mmc/host/meson-mx-sdio.c
index 8b038e7b2cd3..fe02130237a8 100644
--- a/drivers/mmc/host/meson-mx-sdio.c
+++ b/drivers/mmc/host/meson-mx-sdio.c
@@ -570,7 +570,7 @@ static int meson_mx_mmc_add_host(struct meson_mx_mmc_host *host)
mmc->f_max = clk_round_rate(host->cfg_div_clk,
clk_get_rate(host->parent_clk));

- mmc->caps |= MMC_CAP_ERASE | MMC_CAP_CMD23;
+ mmc->caps |= MMC_CAP_ERASE | MMC_CAP_CMD23 | MMC_CAP_WAIT_WHILE_BUSY;
mmc->ops = &meson_mx_mmc_ops;

ret = mmc_of_parse(mmc);
--
2.26.0


2020-04-15 23:58:35

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 1/1] mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY

On Fri, 10 Apr 2020 at 23:30, Martin Blumenstingl
<[email protected]> wrote:
>
> The Meson SDIO controller uses the DAT0 lane for hardware busy
> detection. Set MMC_CAP_WAIT_WHILE_BUSY accordingly. This fixes
> the following error observed with Linux 5.7 (pre-rc-1):
> mmc1: Card stuck being busy! __mmc_poll_for_busy
> blk_update_request: I/O error, dev mmcblk1, sector 17111080 op
> 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
>
> Signed-off-by: Martin Blumenstingl <[email protected]>
> ---
> I'm sending this as RFC because I'm not sure if this is a proper fix.
> It "fixes" the issue for me but I want the MMC maintainers to double-
> check this.

Thanks for sending this! I assume it's a regression and caused by one
of my patches that went in for 5.7. Probably this one:
0d84c3e6a5b2 mmc: core: Convert to mmc_poll_for_busy() for erase/trim/discard

Now, even if enabling MMC_CAP_WAIT_WHILE_BUSY seems like the correct
thing to do, I suggest we really try understand why it works, so we
don't overlook some other issue that needs to be fixed.

Would you be willing to try a few debug patches, according to the below?

First, can you double check so the original polling with CMD13 is
still okay, by trying the below minor change. The intent is to force
polling with CMD13 for the erase/discard operation.

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index baa6314f69b4..bbf1dff0ae2a 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -452,7 +452,7 @@ static int mmc_busy_status(struct mmc_card *card,
bool retry_crc_err,
u32 status = 0;
int err;

- if (host->ops->card_busy) {
+ if (host->ops->card_busy && busy_cmd != MMC_BUSY_ERASE) {
*busy = host->ops->card_busy(host);
return 0;
}
--

Second, if the above works, it looks like the polling with
->card_busy() isn't really working for meson-mx-sdio.c, together with
erase/discard. To narrow down that problem, I suggest to try with a
longer erase/discard timeout in a retry fashion, while using
->card_busy(). Along the lines of the below:

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 4c5de6d37ac7..240e52fcdf2d 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1746,6 +1746,11 @@ static int mmc_do_erase(struct mmc_card *card,
unsigned int from,

/* Let's poll to find out when the erase operation completes. */
err = mmc_poll_for_busy(card, busy_timeout, MMC_BUSY_ERASE);
+ if (err) {
+ pr_err("%s: Erase poll failed err=%d timeout_ms=%u - retry!\n",
+ mmc_hostname(host), err, busy_timeout);
+ err = mmc_poll_for_busy(card, 30000, MMC_BUSY_ERASE);
+ }

out:
mmc_retune_release(card->host);
--

Let's see what this gives us...

Kind regards
Uffe

>
>
> drivers/mmc/host/meson-mx-sdio.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/meson-mx-sdio.c b/drivers/mmc/host/meson-mx-sdio.c
> index 8b038e7b2cd3..fe02130237a8 100644
> --- a/drivers/mmc/host/meson-mx-sdio.c
> +++ b/drivers/mmc/host/meson-mx-sdio.c
> @@ -570,7 +570,7 @@ static int meson_mx_mmc_add_host(struct meson_mx_mmc_host *host)
> mmc->f_max = clk_round_rate(host->cfg_div_clk,
> clk_get_rate(host->parent_clk));
>
> - mmc->caps |= MMC_CAP_ERASE | MMC_CAP_CMD23;
> + mmc->caps |= MMC_CAP_ERASE | MMC_CAP_CMD23 | MMC_CAP_WAIT_WHILE_BUSY;
> mmc->ops = &meson_mx_mmc_ops;
>
> ret = mmc_of_parse(mmc);
> --
> 2.26.0
>

2020-04-16 01:12:45

by Martin Blumenstingl

[permalink] [raw]
Subject: Re: [PATCH 1/1] mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY

Hi Ulf,

thank you very much for taking the time to look into this!

On Wed, Apr 15, 2020 at 2:57 PM Ulf Hansson <[email protected]> wrote:
[...]
> Thanks for sending this! I assume it's a regression and caused by one
> of my patches that went in for 5.7. Probably this one:
> 0d84c3e6a5b2 mmc: core: Convert to mmc_poll_for_busy() for erase/trim/discard
indeed, I only observed this with 5.7-rc1-ish, before everything was
working fine

> Now, even if enabling MMC_CAP_WAIT_WHILE_BUSY seems like the correct
> thing to do, I suggest we really try understand why it works, so we
> don't overlook some other issue that needs to be fixed.
great, that's why I'm seeking for help!

> Would you be willing to try a few debug patches, according to the below?
sure
while reading your suggestions I went back to the vendor driver and
observed that they don't implement card_busy for this controller
Thus I added the following line to meson_mx_mmc_card_busy for all of
your tests to see what the controller sees in terms of our card busy
implementation:
dev_info(mmc_dev(host->mmc), "%s read IRQC = 0x%08x\n",
__func__, irqc);

> First, can you double check so the original polling with CMD13 is
> still okay, by trying the below minor change. The intent is to force
> polling with CMD13 for the erase/discard operation.
I have tried this one and it seems to work around the problem (before
I reverted my change and dropped MMC_CAP_WAIT_WHILE_BUSY from
mmc->caps)
also I did not see meson_mx_mmc_card_busy being invoked (not even
once, but I don't know if that's expected)

[...]
> Second, if the above works, it looks like the polling with
> ->card_busy() isn't really working for meson-mx-sdio.c, together with
> erase/discard. To narrow down that problem, I suggest to try with a
> longer erase/discard timeout in a retry fashion, while using
> ->card_busy(). Along the lines of the below:
I have tried this one as well (before I reverted the earlier CMD13
patch) and with MMC_CAP_WAIT_WHILE_BUSY unset in mmc->caps
This doesn't seem to work around the issue - kernel log extract attached.
Also I'm seeing that the the current meson_mx_mmc_card_busy
implementation returns that the card is busy.
example: 0x1f001f10 & 0x3c00 = 0x1c00. the busy logic in the driver
is: !!0x1c00 = 1

My conclusion is:
- meson_mx_mmc_card_busy is not working and should be removed (because
I don't know how to make it work). it probably never worked but we
didn't notice until a recent change
- set MMC_CAP_WAIT_WHILE_BUSY as per my initial patch
- use Fixes: ed80a13bb4c4c9 ("mmc: meson-mx-sdio: Add a driver for the
Amlogic Meson8 and Meson8b SoCs")

Does this make sense?
Also please let me know if you want me to try something else


Martin


Attachments:
erase-poll-retry.txt (7.53 kB)

2020-04-16 11:29:32

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 1/1] mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY

On Wed, 15 Apr 2020 at 23:24, Martin Blumenstingl
<[email protected]> wrote:
>
> Hi Ulf,
>
> thank you very much for taking the time to look into this!
>
> On Wed, Apr 15, 2020 at 2:57 PM Ulf Hansson <[email protected]> wrote:
> [...]
> > Thanks for sending this! I assume it's a regression and caused by one
> > of my patches that went in for 5.7. Probably this one:
> > 0d84c3e6a5b2 mmc: core: Convert to mmc_poll_for_busy() for erase/trim/discard
> indeed, I only observed this with 5.7-rc1-ish, before everything was
> working fine
>
> > Now, even if enabling MMC_CAP_WAIT_WHILE_BUSY seems like the correct
> > thing to do, I suggest we really try understand why it works, so we
> > don't overlook some other issue that needs to be fixed.
> great, that's why I'm seeking for help!
>
> > Would you be willing to try a few debug patches, according to the below?
> sure
> while reading your suggestions I went back to the vendor driver and
> observed that they don't implement card_busy for this controller
> Thus I added the following line to meson_mx_mmc_card_busy for all of
> your tests to see what the controller sees in terms of our card busy
> implementation:
> dev_info(mmc_dev(host->mmc), "%s read IRQC = 0x%08x\n",
> __func__, irqc);
>
> > First, can you double check so the original polling with CMD13 is
> > still okay, by trying the below minor change. The intent is to force
> > polling with CMD13 for the erase/discard operation.
> I have tried this one and it seems to work around the problem (before
> I reverted my change and dropped MMC_CAP_WAIT_WHILE_BUSY from
> mmc->caps)
> also I did not see meson_mx_mmc_card_busy being invoked (not even
> once, but I don't know if that's expected)

For eMMC it should be used quite frequently, as CMD6 is sent quite
often, during initialization for example (see mmc_switch() and
__mmc_switch()).

For SD cards, it's being used for erase/trim/discard and while
changing to UHS-I speed modes (1.8V I/O voltage, see
mmc_set_uhs_voltage(). The latter also requires your host driver to
implement the ->start_signal_voltage_switch() host ops, which isn't
the case (yet!?)

For SDIO cards it's being used in-between requests to make sure the
SDIO card is ready for the next command (see __mmc_start_request())

>
> [...]
> > Second, if the above works, it looks like the polling with
> > ->card_busy() isn't really working for meson-mx-sdio.c, together with
> > erase/discard. To narrow down that problem, I suggest to try with a
> > longer erase/discard timeout in a retry fashion, while using
> > ->card_busy(). Along the lines of the below:
> I have tried this one as well (before I reverted the earlier CMD13
> patch) and with MMC_CAP_WAIT_WHILE_BUSY unset in mmc->caps
> This doesn't seem to work around the issue - kernel log extract attached.
> Also I'm seeing that the the current meson_mx_mmc_card_busy
> implementation returns that the card is busy.
> example: 0x1f001f10 & 0x3c00 = 0x1c00. the busy logic in the driver
> is: !!0x1c00 = 1
>
> My conclusion is:
> - meson_mx_mmc_card_busy is not working and should be removed (because
> I don't know how to make it work). it probably never worked but we
> didn't notice until a recent change

I see.

Depending on what your driver plans to support for the future, see
above, you may need to come back to this in future.

> - set MMC_CAP_WAIT_WHILE_BUSY as per my initial patch
> - use Fixes: ed80a13bb4c4c9 ("mmc: meson-mx-sdio: Add a driver for the
> Amlogic Meson8 and Meson8b SoCs")
>
> Does this make sense?

Yes, I think so.

> Also please let me know if you want me to try something else

I would also suggest adding a patch that removes the ->card_busy() ops
from the meson driver - and that should probably also carry the same
fixes tag as above. Just to make sure the callback doesn't get used in
some other circumstances, when going forward.

Kind regards
Uffe

2020-04-16 20:42:48

by Martin Blumenstingl

[permalink] [raw]
Subject: Re: [PATCH 1/1] mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY

Hi Ulf,

On Thu, Apr 16, 2020 at 1:26 PM Ulf Hansson <[email protected]> wrote:
[...]
> > > First, can you double check so the original polling with CMD13 is
> > > still okay, by trying the below minor change. The intent is to force
> > > polling with CMD13 for the erase/discard operation.
> > I have tried this one and it seems to work around the problem (before
> > I reverted my change and dropped MMC_CAP_WAIT_WHILE_BUSY from
> > mmc->caps)
> > also I did not see meson_mx_mmc_card_busy being invoked (not even
> > once, but I don't know if that's expected)
>
> For eMMC it should be used quite frequently, as CMD6 is sent quite
> often, during initialization for example (see mmc_switch() and
> __mmc_switch()).
I only tested the meson-mx-sdio driver with eMMC once (a long time
ago) and it did not work.
...maybe this is the reason ;)

> For SD cards, it's being used for erase/trim/discard and while
> changing to UHS-I speed modes (1.8V I/O voltage, see
> mmc_set_uhs_voltage(). The latter also requires your host driver to
> implement the ->start_signal_voltage_switch() host ops, which isn't
> the case (yet!?)
SD cards and SDIO cards are the main use-case for this driver.
I don't know of any board which connects this controller to a card
with 1.8V (or 1.8V/3.3V configurable) I/O voltage. This is why I
didn't care about ->start_signal_voltage_switch() so far

[...]
> > > Second, if the above works, it looks like the polling with
> > > ->card_busy() isn't really working for meson-mx-sdio.c, together with
> > > erase/discard. To narrow down that problem, I suggest to try with a
> > > longer erase/discard timeout in a retry fashion, while using
> > > ->card_busy(). Along the lines of the below:
> > I have tried this one as well (before I reverted the earlier CMD13
> > patch) and with MMC_CAP_WAIT_WHILE_BUSY unset in mmc->caps
> > This doesn't seem to work around the issue - kernel log extract attached.
> > Also I'm seeing that the the current meson_mx_mmc_card_busy
> > implementation returns that the card is busy.
> > example: 0x1f001f10 & 0x3c00 = 0x1c00. the busy logic in the driver
> > is: !!0x1c00 = 1
> >
> > My conclusion is:
> > - meson_mx_mmc_card_busy is not working and should be removed (because
> > I don't know how to make it work). it probably never worked but we
> > didn't notice until a recent change
>
> I see.
>
> Depending on what your driver plans to support for the future, see
> above, you may need to come back to this in future.
I'll let future Martin deal with that - he can add it back as needed ;-)
current Martin has his doubts that it'll be needed (see above)

> > - set MMC_CAP_WAIT_WHILE_BUSY as per my initial patch
> > - use Fixes: ed80a13bb4c4c9 ("mmc: meson-mx-sdio: Add a driver for the
> > Amlogic Meson8 and Meson8b SoCs")
> >
> > Does this make sense?
>
> Yes, I think so.
thank you for double-checking!

> > Also please let me know if you want me to try something else
>
> I would also suggest adding a patch that removes the ->card_busy() ops
> from the meson driver - and that should probably also carry the same
> fixes tag as above. Just to make sure the callback doesn't get used in
> some other circumstances, when going forward.
agreed, I will send a v2 later which adds the Fixes tag, a bit more
description and an additional patch to remove ->card_busy()


Thank you again very much for the insights!
Have a great day,
Martin