2022-02-18 00:18:53

by H. Nikolaus Schaller

[permalink] [raw]
Subject: [BUG] mmc: core: adjust polling interval for CMD1

Hi,
Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
with our disto kernel based on v5.17-rc1:

>> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
>> Black, but didn't check the logs.
>> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
>> traffic there were messages like
>>
>> [ 662.529584] mmc1: error -110 doing runtime resume
>> [ 669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
>>
>> [ 739.076072] mmc1: Timeout waiting for hardware interrupt.
>> [ 739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [ 739.231053] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00003101
>> [ 739.316472] mmc1: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000400
>> [ 739.401937] mmc1: sdhci: Argument: 0x00342d30 | Trn mode: 0x00000023
>> [ 739.487439] mmc1: sdhci: Present: 0x01f70000 | Host ctl: 0x00000000
>> [ 739.573007] mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
>> [ 739.658609] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00003c07
>> [ 739.744224] mmc1: sdhci: Timeout: 0x00000007 | Int stat: 0x00000002
>> [ 739.829896] mmc1: sdhci: Int enab: 0x027f000b | Sig enab: 0x027f000b
>> [ 739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>> [ 740.001394] mmc1: sdhci: Caps: 0x07e10080 | Caps_1: 0x00000000
>> [ 740.087208] mmc1: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
>> [ 740.173051] mmc1: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000
>> [ 740.258928] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
>> [ 740.344854] mmc1: sdhci: Host ctl2: 0x00000000
>> [ 740.402796] mmc1: sdhci: ============================================
>>
>> and finally IO errors and a corrupted filesystem.
>>
>> 5.17.0-rc4-letux+ shows the same behaviour.

I checked with my BeagleBoard Black (am3358) and can confirm this observation.
It happens only with the integrated eMMC but not with the µSD connected to
the other mmc interface.

A git bisect found:

76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
Author: Huijin Park <[email protected]>
Date: Thu Nov 4 15:32:31 2021 +0900

mmc: core: adjust polling interval for CMD1

In mmc_send_op_cond(), loops are continuously performed at the same
interval of 10 ms. However the behaviour is not good for some eMMC
which can be out from a busy state earlier than 10 ms if normal.

Rather than fixing about the interval time in mmc_send_op_cond(),
let's instead convert into using the common __mmc_poll_for_busy().

The reason for adjusting the interval time is that it is important
to reduce the eMMC initialization time, especially in devices that
use eMMC as rootfs.

Test log(eMMC:KLM8G1GETF-B041):

before: 12 ms (0.311016 - 0.298729)
[ 0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
[ 0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
[ 0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
[ 0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007

after: 2 ms (0.301270 - 0.298762)
[ 0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
[ 0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
[ 0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
[ 0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007

Signed-off-by: Huijin Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

Reverting this makes v5.17-rc[1-4] work.

Any suggestions or fixes?

BR and thanks,
Nikolaus Schaller

Reported-by: [email protected]


2022-03-01 15:50:37

by Ulf Hansson

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <[email protected]> wrote:
>
> Hi,
> Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
> with our disto kernel based on v5.17-rc1:
>
> >> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
> >> Black, but didn't check the logs.
> >> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
> >> traffic there were messages like
> >>
> >> [ 662.529584] mmc1: error -110 doing runtime resume
> >> [ 669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >>
> >> [ 739.076072] mmc1: Timeout waiting for hardware interrupt.
> >> [ 739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> >> [ 739.231053] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00003101
> >> [ 739.316472] mmc1: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000400
> >> [ 739.401937] mmc1: sdhci: Argument: 0x00342d30 | Trn mode: 0x00000023
> >> [ 739.487439] mmc1: sdhci: Present: 0x01f70000 | Host ctl: 0x00000000
> >> [ 739.573007] mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
> >> [ 739.658609] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00003c07
> >> [ 739.744224] mmc1: sdhci: Timeout: 0x00000007 | Int stat: 0x00000002
> >> [ 739.829896] mmc1: sdhci: Int enab: 0x027f000b | Sig enab: 0x027f000b
> >> [ 739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> >> [ 740.001394] mmc1: sdhci: Caps: 0x07e10080 | Caps_1: 0x00000000
> >> [ 740.087208] mmc1: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> >> [ 740.173051] mmc1: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000
> >> [ 740.258928] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
> >> [ 740.344854] mmc1: sdhci: Host ctl2: 0x00000000
> >> [ 740.402796] mmc1: sdhci: ============================================
> >>
> >> and finally IO errors and a corrupted filesystem.
> >>
> >> 5.17.0-rc4-letux+ shows the same behaviour.
>
> I checked with my BeagleBoard Black (am3358) and can confirm this observation.
> It happens only with the integrated eMMC but not with the µSD connected to
> the other mmc interface.
>
> A git bisect found:
>
> 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
> commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> Author: Huijin Park <[email protected]>
> Date: Thu Nov 4 15:32:31 2021 +0900
>
> mmc: core: adjust polling interval for CMD1
>
> In mmc_send_op_cond(), loops are continuously performed at the same
> interval of 10 ms. However the behaviour is not good for some eMMC
> which can be out from a busy state earlier than 10 ms if normal.
>
> Rather than fixing about the interval time in mmc_send_op_cond(),
> let's instead convert into using the common __mmc_poll_for_busy().
>
> The reason for adjusting the interval time is that it is important
> to reduce the eMMC initialization time, especially in devices that
> use eMMC as rootfs.
>
> Test log(eMMC:KLM8G1GETF-B041):
>
> before: 12 ms (0.311016 - 0.298729)
> [ 0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
> [ 0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> [ 0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> [ 0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007
>
> after: 2 ms (0.301270 - 0.298762)
> [ 0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
> [ 0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> [ 0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
> [ 0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
> [ 0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
> [ 0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
> [ 0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> [ 0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007
>
> Signed-off-by: Huijin Park <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Ulf Hansson <[email protected]>
>
> Reverting this makes v5.17-rc[1-4] work.
>
> Any suggestions or fixes?
>
> BR and thanks,
> Nikolaus Schaller
>
> Reported-by: [email protected]
>

Thanks for reporting and bisecting!

What changed with the offending commit is two things:

1) We are sending the CMD1 more frequently, initially in the loop in
the __mmc_poll_for_busy. Step by step, we increase the polling period.
2) We may end up using a slightly shorter total timeout for polling
time, compared to what we used before the offending commit.

Hopefully the problem is related to 2), in which case I think the
below patch should help. Can you please give it a try?

Kind regards
Uffe

From: Ulf Hansson <[email protected]>
Date: Tue, 1 Mar 2022 14:24:21 +0100
Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND

It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
a bit too small due to recent changes. Therefore, let's extend it to 2s,
which is probably more inline with its previous value, to fix the reported
timeout problems.

While at it, let's add a define for the timeout value, rather than using
a hard-coded value for it.

Reported-by: Jean Rene Dawin <[email protected]>
Reported-by: H. Nikolaus Schaller <[email protected]>
Cc: Huijin Park <[email protected]>
Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
Signed-off-by: Ulf Hansson <[email protected]>
---
drivers/mmc/core/mmc_ops.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index d63d1c735335..1f57174b3cf3 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -21,6 +21,7 @@

#define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
#define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
+#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */

static const u8 tuning_blk_pattern_4bit[] = {
0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
@@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

- err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
+ err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
+ &__mmc_send_op_cond_cb, &cb_data);
if (err)
return err;

--
2.25.1

2022-03-02 10:11:29

by Jean Rene Dawin

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

Ulf Hansson wrote on Tue 1/03/22 14:38:
> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <[email protected]> wrote:
> >
> > Hi,
> > Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
> > with our disto kernel based on v5.17-rc1:
> >
> > >> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
> > >> Black, but didn't check the logs.
> > >> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
> > >> traffic there were messages like
> > >>
> > >> [ 662.529584] mmc1: error -110 doing runtime resume
> > >> [ 669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
> > >>
> > >> [ 739.076072] mmc1: Timeout waiting for hardware interrupt.
> > >> [ 739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> > >> [ 739.231053] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00003101
> > >> [ 739.316472] mmc1: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000400
> > >> [ 739.401937] mmc1: sdhci: Argument: 0x00342d30 | Trn mode: 0x00000023
> > >> [ 739.487439] mmc1: sdhci: Present: 0x01f70000 | Host ctl: 0x00000000
> > >> [ 739.573007] mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
> > >> [ 739.658609] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00003c07
> > >> [ 739.744224] mmc1: sdhci: Timeout: 0x00000007 | Int stat: 0x00000002
> > >> [ 739.829896] mmc1: sdhci: Int enab: 0x027f000b | Sig enab: 0x027f000b
> > >> [ 739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> > >> [ 740.001394] mmc1: sdhci: Caps: 0x07e10080 | Caps_1: 0x00000000
> > >> [ 740.087208] mmc1: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> > >> [ 740.173051] mmc1: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000
> > >> [ 740.258928] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
> > >> [ 740.344854] mmc1: sdhci: Host ctl2: 0x00000000
> > >> [ 740.402796] mmc1: sdhci: ============================================
> > >>
> > >> and finally IO errors and a corrupted filesystem.
> > >>
> > >> 5.17.0-rc4-letux+ shows the same behaviour.
> >
> > I checked with my BeagleBoard Black (am3358) and can confirm this observation.
> > It happens only with the integrated eMMC but not with the ?SD connected to
> > the other mmc interface.
> >
> > A git bisect found:
> >
> > 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
> > commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> > Author: Huijin Park <[email protected]>
> > Date: Thu Nov 4 15:32:31 2021 +0900
> >
> > mmc: core: adjust polling interval for CMD1
> >
> > In mmc_send_op_cond(), loops are continuously performed at the same
> > interval of 10 ms. However the behaviour is not good for some eMMC
> > which can be out from a busy state earlier than 10 ms if normal.
> >
> > Rather than fixing about the interval time in mmc_send_op_cond(),
> > let's instead convert into using the common __mmc_poll_for_busy().
> >
> > The reason for adjusting the interval time is that it is important
> > to reduce the eMMC initialization time, especially in devices that
> > use eMMC as rootfs.
> >
> > Test log(eMMC:KLM8G1GETF-B041):
> >
> > before: 12 ms (0.311016 - 0.298729)
> > [ 0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
> > [ 0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> > [ 0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> > [ 0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007
> >
> > after: 2 ms (0.301270 - 0.298762)
> > [ 0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
> > [ 0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> > [ 0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
> > [ 0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
> > [ 0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
> > [ 0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
> > [ 0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> > [ 0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007
> >
> > Signed-off-by: Huijin Park <[email protected]>
> > Link: https://lore.kernel.org/r/[email protected]
> > Signed-off-by: Ulf Hansson <[email protected]>
> >
> > Reverting this makes v5.17-rc[1-4] work.
> >
> > Any suggestions or fixes?
> >
> > BR and thanks,
> > Nikolaus Schaller
> >
> > Reported-by: [email protected]
> >
>
> Thanks for reporting and bisecting!
>
> What changed with the offending commit is two things:
>
> 1) We are sending the CMD1 more frequently, initially in the loop in
> the __mmc_poll_for_busy. Step by step, we increase the polling period.
> 2) We may end up using a slightly shorter total timeout for polling
> time, compared to what we used before the offending commit.
>
> Hopefully the problem is related to 2), in which case I think the
> below patch should help. Can you please give it a try?
>
> Kind regards
> Uffe
>
> From: Ulf Hansson <[email protected]>
> Date: Tue, 1 Mar 2022 14:24:21 +0100
> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
>
> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> which is probably more inline with its previous value, to fix the reported
> timeout problems.
>
> While at it, let's add a define for the timeout value, rather than using
> a hard-coded value for it.
>
> Reported-by: Jean Rene Dawin <[email protected]>
> Reported-by: H. Nikolaus Schaller <[email protected]>
> Cc: Huijin Park <[email protected]>
> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
> drivers/mmc/core/mmc_ops.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..1f57174b3cf3 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
>
> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
>
> static const u8 tuning_blk_pattern_4bit[] = {
> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>
> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> + &__mmc_send_op_cond_cb, &cb_data);
> if (err)
> return err;
>
> --
> 2.25.1

Hi,

thanks. But testing with this patch still gives the same errors:

[ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
[ 52.273380] mmc1: error -110 doing runtime resume

and the system gets stuck eventually.

Regards,
Jean Rene Dawin

2022-03-02 11:01:53

by H. Nikolaus Schaller

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

Hi,

> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <[email protected]>:
>
> Ulf Hansson wrote on Tue 1/03/22 14:38:
>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <[email protected]> wrote:
>>>
>>
>> From: Ulf Hansson <[email protected]>
>> Date: Tue, 1 Mar 2022 14:24:21 +0100
>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
>>
>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
>> which is probably more inline with its previous value, to fix the reported
>> timeout problems.
>>
>> While at it, let's add a define for the timeout value, rather than using
>> a hard-coded value for it.
>>
>> Reported-by: Jean Rene Dawin <[email protected]>
>> Reported-by: H. Nikolaus Schaller <[email protected]>
>> Cc: Huijin Park <[email protected]>
>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
>> Signed-off-by: Ulf Hansson <[email protected]>
>> ---
>> drivers/mmc/core/mmc_ops.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>> index d63d1c735335..1f57174b3cf3 100644
>> --- a/drivers/mmc/core/mmc_ops.c
>> +++ b/drivers/mmc/core/mmc_ops.c
>> @@ -21,6 +21,7 @@
>>
>> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
>> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
>> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
>>
>> static const u8 tuning_blk_pattern_4bit[] = {
>> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
>> ocr, u32 *rocr)
>> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>>
>> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
>> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
>> + &__mmc_send_op_cond_cb, &cb_data);
>> if (err)
>> return err;
>>
>> --
>> 2.25.1
>
> Hi,
>
> thanks. But testing with this patch still gives the same errors:
>
> [ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> [ 52.273380] mmc1: error -110 doing runtime resume
>
> and the system gets stuck eventually.

Same result from my tests.

BR and thanks,
Nikolaus

2022-03-03 11:33:20

by Ulf Hansson

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <[email protected]> wrote:
>
> Hi,
>
> > Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <[email protected]>:
> >
> > Ulf Hansson wrote on Tue 1/03/22 14:38:
> >> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <[email protected]> wrote:
> >>>
> >>
> >> From: Ulf Hansson <[email protected]>
> >> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>
> >> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >> which is probably more inline with its previous value, to fix the reported
> >> timeout problems.
> >>
> >> While at it, let's add a define for the timeout value, rather than using
> >> a hard-coded value for it.
> >>
> >> Reported-by: Jean Rene Dawin <[email protected]>
> >> Reported-by: H. Nikolaus Schaller <[email protected]>
> >> Cc: Huijin Park <[email protected]>
> >> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >> Signed-off-by: Ulf Hansson <[email protected]>
> >> ---
> >> drivers/mmc/core/mmc_ops.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >> index d63d1c735335..1f57174b3cf3 100644
> >> --- a/drivers/mmc/core/mmc_ops.c
> >> +++ b/drivers/mmc/core/mmc_ops.c
> >> @@ -21,6 +21,7 @@
> >>
> >> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> >> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> >> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
> >>
> >> static const u8 tuning_blk_pattern_4bit[] = {
> >> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >> ocr, u32 *rocr)
> >> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>
> >> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >> + &__mmc_send_op_cond_cb, &cb_data);
> >> if (err)
> >> return err;
> >>
> >> --
> >> 2.25.1
> >
> > Hi,
> >
> > thanks. But testing with this patch still gives the same errors:
> >
> > [ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> > [ 52.273380] mmc1: error -110 doing runtime resume
> >
> > and the system gets stuck eventually.
>
> Same result from my tests.
>
> BR and thanks,
> Nikolaus

Alright, it starts to sound like we might need a revert (or at least a
way to restore the previous behaviour) - even if that would just paper
over the real problem. The real problem is more likely related to how
the host driver manages restoring of the power to the card, which
happens when runtime resuming it.

In any case, just to make sure the loop timeout itself isn't the
problem, can you run the below debug patch please? The intent is to
figure out how long the worst case timeout is, when it's working with
CMD1. As soon as the timeout exceeds the worst case, there is a
message printed to the log.

The below patch is based upon that the offending commit has been reverted.

Kind regards
Uffe

From: Ulf Hansson <[email protected]>
Date: Thu, 3 Mar 2022 11:00:04 +0100
Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop

Signed-off-by: Ulf Hansson <[email protected]>
---
drivers/mmc/core/mmc_ops.c | 11 +++++++++++
include/linux/mmc/host.h | 1 +
2 files changed, 12 insertions(+)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 9946733a34c6..3f03d9e8a3a4 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
{
struct mmc_command cmd = {};
int i, err = 0;
+ s64 cmd1_ms;
+ ktime_t time_start;

cmd.opcode = MMC_SEND_OP_COND;
cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

+ time_start = ktime_get();
+
for (i = 100; i; i--) {
err = mmc_wait_for_cmd(host, &cmd, 0);
if (err)
@@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
cmd.arg = cmd.resp[0] | BIT(30);
}

+ cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
+ if (!err && cmd1_ms > host->cmd1_ms) {
+ pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
+ mmc_hostname(host), cmd1_ms, i);
+ host->cmd1_ms = cmd1_ms;
+ }
+
if (rocr && !mmc_host_is_spi(host))
*rocr = cmd.resp[0];

diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 7afb57cab00b..c2ca3bb05620 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -301,6 +301,7 @@ struct mmc_host {
unsigned int f_min;
unsigned int f_max;
unsigned int f_init;
+ s64 cmd1_ms;
u32 ocr_avail;
u32 ocr_avail_sdio; /* SDIO-specific OCR */
u32 ocr_avail_sd; /* SD-specific OCR */
--
2.25.1

2022-03-03 13:40:42

by Huijin Park

[permalink] [raw]
Subject: RE: [BUG] mmc: core: adjust polling interval for CMD1

Hi, sorry for the late reply.
I guess the problem occurs depending on the eMMC model.
Because I tested again and there was no problem.
The eMMC model used for the test are as follows.
(THGBMJG6C1LBAIL, KLM8G1GETF-B041)
Anyway I guess the cause is interval time.
I wrote a debug patch assuming that the reason was that some mmc models
could not process CMD1 delivered at short intervals.
I copied the polling function and adds interval minimum time parameter.
I set the minimum time to 1ms. You can adjust it.
Please test if there is no problem with the debug patch.

>Hi,
>
>> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <[email protected]>:
>>
>> Ulf Hansson wrote on Tue 1/03/22 14:38:
>>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <[email protected]> wrote:
>>>>
>>>
>>> From: Ulf Hansson <[email protected]>
>>> Date: Tue, 1 Mar 2022 14:24:21 +0100
>>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
>>>
>>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
>>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
>>> which is probably more inline with its previous value, to fix the reported
>>> timeout problems.
>>>
>>> While at it, let's add a define for the timeout value, rather than using
>>> a hard-coded value for it.
>>>
>>> Reported-by: Jean Rene Dawin <[email protected]>
>>> Reported-by: H. Nikolaus Schaller <[email protected]>
>>> Cc: Huijin Park <[email protected]>
>>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
>>> Signed-off-by: Ulf Hansson <[email protected]>
>>> ---
>>> drivers/mmc/core/mmc_ops.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>>> index d63d1c735335..1f57174b3cf3 100644
>>> --- a/drivers/mmc/core/mmc_ops.c
>>> +++ b/drivers/mmc/core/mmc_ops.c
>>> @@ -21,6 +21,7 @@
>>>
>>> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
>>> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
>>> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
>>>
>>> static const u8 tuning_blk_pattern_4bit[] = {
>>> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
>>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
>>> ocr, u32 *rocr)
>>> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>>> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>>>
>>> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
>>> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
>>> + &__mmc_send_op_cond_cb, &cb_data);
>>> if (err)
>>> return err;
>>>
>>> --
>>> 2.25.1
>>
>> Hi,
>>
>> thanks. But testing with this patch still gives the same errors:
>>
>> [ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
>> [ 52.273380] mmc1: error -110 doing runtime resume
>>
>> and the system gets stuck eventually.
>
>Same result from my tests.
>
>BR and thanks,
>Nikolaus


From c2458cb998dd8e275fefba52dd2532beb153c82d Mon Sep 17 00:00:00 2001
From: Huijin Park <[email protected]>
Date: Thu, 3 Mar 2022 20:43:22 +0900
Subject: [PATCH] mmc: core: extend timeout and set min time for op_cond

This patch extends timeout to 2s and sets minimal interval time for
checking op_cond stuck problem.

Signed-off-by: Huijin Park <[email protected]>

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index d63d1c735335..ccad6379d183 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -21,6 +21,7 @@

#define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
#define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
+#define MMC_OP_COND_TIMEOUT_MS ( 2 * 1000) /* 2s */

static const u8 tuning_blk_pattern_4bit[] = {
0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
@@ -179,6 +180,47 @@ int mmc_go_idle(struct mmc_host *host)
return err;
}

+static int ____mmc_poll_for_busy(struct mmc_host *host, unsigned int udelay_min,
+ unsigned int timeout_ms,
+ int (*busy_cb)(void *cb_data, bool *busy),
+ void *cb_data)
+{
+ int err;
+ unsigned long timeout;
+ unsigned int udelay = udelay_min, udelay_max = 32768;
+ bool expired = false;
+ bool busy = false;
+
+ timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;
+ do {
+ /*
+ * Due to the possibility of being preempted while polling,
+ * check the expiration time first.
+ */
+ expired = time_after(jiffies, timeout);
+
+ err = (*busy_cb)(cb_data, &busy);
+ if (err)
+ return err;
+
+ /* Timeout if the device still remains busy. */
+ if (expired && busy) {
+ pr_err("%s: Card stuck being busy! %s\n",
+ mmc_hostname(host), __func__);
+ return -ETIMEDOUT;
+ }
+
+ /* Throttle the polling rate to avoid hogging the CPU. */
+ if (busy) {
+ usleep_range(udelay, udelay * 2);
+ if (udelay < udelay_max)
+ udelay *= 2;
+ }
+ } while (busy);
+
+ return 0;
+}
+
static int __mmc_send_op_cond_cb(void *cb_data, bool *busy)
{
struct mmc_op_cond_busy_data *data = cb_data;
@@ -232,7 +274,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32 ocr, u32 *rocr)
cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

- err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
+ err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
+ &__mmc_send_op_cond_cb, &cb_data);
if (err)
return err;

--
2.17.1

2022-03-04 10:50:30

by Jean Rene Dawin

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

Huijin Park wrote on Thu 3/03/22 21:16:
> Hi, sorry for the late reply.
> I guess the problem occurs depending on the eMMC model.
> Because I tested again and there was no problem.
> The eMMC model used for the test are as follows.
> (THGBMJG6C1LBAIL, KLM8G1GETF-B041)
> Anyway I guess the cause is interval time.
> I wrote a debug patch assuming that the reason was that some mmc models
> could not process CMD1 delivered at short intervals.
> I copied the polling function and adds interval minimum time parameter.
> I set the minimum time to 1ms. You can adjust it.
> Please test if there is no problem with the debug patch.
>
> >Hi,
> >
> >> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <[email protected]>:
> >>
> >> Ulf Hansson wrote on Tue 1/03/22 14:38:
> >>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <[email protected]> wrote:
> >>>>
> >>>
> >>> From: Ulf Hansson <[email protected]>
> >>> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>>
> >>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >>> which is probably more inline with its previous value, to fix the reported
> >>> timeout problems.
> >>>
> >>> While at it, let's add a define for the timeout value, rather than using
> >>> a hard-coded value for it.
> >>>
> >>> Reported-by: Jean Rene Dawin <[email protected]>
> >>> Reported-by: H. Nikolaus Schaller <[email protected]>
> >>> Cc: Huijin Park <[email protected]>
> >>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >>> Signed-off-by: Ulf Hansson <[email protected]>
> >>> ---
> >>> drivers/mmc/core/mmc_ops.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >>> index d63d1c735335..1f57174b3cf3 100644
> >>> --- a/drivers/mmc/core/mmc_ops.c
> >>> +++ b/drivers/mmc/core/mmc_ops.c
> >>> @@ -21,6 +21,7 @@
> >>>
> >>> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> >>> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> >>> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
> >>>
> >>> static const u8 tuning_blk_pattern_4bit[] = {
> >>> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >>> ocr, u32 *rocr)
> >>> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >>> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>>
> >>> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >>> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >>> + &__mmc_send_op_cond_cb, &cb_data);
> >>> if (err)
> >>> return err;
> >>>
> >>> --
> >>> 2.25.1
> >>
> >> Hi,
> >>
> >> thanks. But testing with this patch still gives the same errors:
> >>
> >> [ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >> [ 52.273380] mmc1: error -110 doing runtime resume
> >>
> >> and the system gets stuck eventually.
> >
> >Same result from my tests.
> >
> >BR and thanks,
> >Nikolaus
>
>
> From c2458cb998dd8e275fefba52dd2532beb153c82d Mon Sep 17 00:00:00 2001
> From: Huijin Park <[email protected]>
> Date: Thu, 3 Mar 2022 20:43:22 +0900
> Subject: [PATCH] mmc: core: extend timeout and set min time for op_cond
>
> This patch extends timeout to 2s and sets minimal interval time for
> checking op_cond stuck problem.
>
> Signed-off-by: Huijin Park <[email protected]>
>
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..ccad6379d183 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
>
> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS ( 2 * 1000) /* 2s */
>
> static const u8 tuning_blk_pattern_4bit[] = {
> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -179,6 +180,47 @@ int mmc_go_idle(struct mmc_host *host)
> return err;
> }
>
> +static int ____mmc_poll_for_busy(struct mmc_host *host, unsigned int udelay_min,
> + unsigned int timeout_ms,
> + int (*busy_cb)(void *cb_data, bool *busy),
> + void *cb_data)
> +{
> + int err;
> + unsigned long timeout;
> + unsigned int udelay = udelay_min, udelay_max = 32768;
> + bool expired = false;
> + bool busy = false;
> +
> + timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;
> + do {
> + /*
> + * Due to the possibility of being preempted while polling,
> + * check the expiration time first.
> + */
> + expired = time_after(jiffies, timeout);
> +
> + err = (*busy_cb)(cb_data, &busy);
> + if (err)
> + return err;
> +
> + /* Timeout if the device still remains busy. */
> + if (expired && busy) {
> + pr_err("%s: Card stuck being busy! %s\n",
> + mmc_hostname(host), __func__);
> + return -ETIMEDOUT;
> + }
> +
> + /* Throttle the polling rate to avoid hogging the CPU. */
> + if (busy) {
> + usleep_range(udelay, udelay * 2);
> + if (udelay < udelay_max)
> + udelay *= 2;
> + }
> + } while (busy);
> +
> + return 0;
> +}
> +
> static int __mmc_send_op_cond_cb(void *cb_data, bool *busy)
> {
> struct mmc_op_cond_busy_data *data = cb_data;
> @@ -232,7 +274,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32 ocr, u32 *rocr)
> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>
> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> + err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
> + &__mmc_send_op_cond_cb, &cb_data);
> if (err)
> return err;
>
> --
> 2.17.1
>

Hi,

thanks for the patch.
I started with a value of 10 ms :
err = ____mmc_poll_for_busy(host, 10, MMC_OP_COND_TIMEOUT_MS,
&__mmc_send_op_cond_cb, &cb_data);

and the result was agian

[ 23.349932] mmc1: Card stuck being busy! __mmc_poll_for_busy
[ 23.355936] mmc1: error -110 doing runtime resume

Same with 100 ms and 500 ms.

The messages seem to come from __mmc_poll_for_busy and not ____mmc_poll_for_busy.
Yet, changing the value in the call of ____mmc_poll_for_busy to 1000 ms changed the behaviour:

err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
&__mmc_send_op_cond_cb, &cb_data);

With this I didn't get any "stuck" errors and the board seemed stable.

Regards,
Jean Rene Dawin

2022-03-04 13:10:04

by Ulf Hansson

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

On Fri, 4 Mar 2022 at 10:28, Jean Rene Dawin
<[email protected]> wrote:
>
> Ulf Hansson wrote on Thu 3/03/22 11:40:
> > On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <[email protected]> wrote:
> >
> > Alright, it starts to sound like we might need a revert (or at least a
> > way to restore the previous behaviour) - even if that would just paper
> > over the real problem. The real problem is more likely related to how
> > the host driver manages restoring of the power to the card, which
> > happens when runtime resuming it.
> >
> > In any case, just to make sure the loop timeout itself isn't the
> > problem, can you run the below debug patch please? The intent is to
> > figure out how long the worst case timeout is, when it's working with
> > CMD1. As soon as the timeout exceeds the worst case, there is a
> > message printed to the log.
> >
> > The below patch is based upon that the offending commit has been reverted.
> >
> > Kind regards
> > Uffe
> >
> > From: Ulf Hansson <[email protected]>
> > Date: Thu, 3 Mar 2022 11:00:04 +0100
> > Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop
> >
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> > drivers/mmc/core/mmc_ops.c | 11 +++++++++++
> > include/linux/mmc/host.h | 1 +
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> > index 9946733a34c6..3f03d9e8a3a4 100644
> > --- a/drivers/mmc/core/mmc_ops.c
> > +++ b/drivers/mmc/core/mmc_ops.c
> > @@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> > ocr, u32 *rocr)
> > {
> > struct mmc_command cmd = {};
> > int i, err = 0;
> > + s64 cmd1_ms;
> > + ktime_t time_start;
> >
> > cmd.opcode = MMC_SEND_OP_COND;
> > cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> > cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >
> > + time_start = ktime_get();
> > +
> > for (i = 100; i; i--) {
> > err = mmc_wait_for_cmd(host, &cmd, 0);
> > if (err)
> > @@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> > ocr, u32 *rocr)
> > cmd.arg = cmd.resp[0] | BIT(30);
> > }
> >
> > + cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
> > + if (!err && cmd1_ms > host->cmd1_ms) {
> > + pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
> > + mmc_hostname(host), cmd1_ms, i);
> > + host->cmd1_ms = cmd1_ms;
> > + }
> > +
> > if (rocr && !mmc_host_is_spi(host))
> > *rocr = cmd.resp[0];
> >
> > diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
> > index 7afb57cab00b..c2ca3bb05620 100644
> > --- a/include/linux/mmc/host.h
> > +++ b/include/linux/mmc/host.h
> > @@ -301,6 +301,7 @@ struct mmc_host {
> > unsigned int f_min;
> > unsigned int f_max;
> > unsigned int f_init;
> > + s64 cmd1_ms;
> > u32 ocr_avail;
> > u32 ocr_avail_sdio; /* SDIO-specific OCR */
> > u32 ocr_avail_sd; /* SD-specific OCR */
> > --
> > 2.25.1
>
> Hi,
>
> thanks. With this patch applied with reverted 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> I don't get the "stuck" errors in __mmc_poll_for_busy and only one message of
>
> [ 1.095194] mmc1: CMD1 timeout increased to 64 ms, loop=98

Okay, so for some reason polling doesn't work, if we poll too
frequently. Clearly the 1s total timeout should be sufficient.

I will prepare a patch that restores the old behaviour and post it
within a few minutes.

Kind regards
Uffe

2022-03-04 15:23:23

by Jean Rene Dawin

[permalink] [raw]
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

Ulf Hansson wrote on Thu 3/03/22 11:40:
> On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <[email protected]> wrote:
>
> Alright, it starts to sound like we might need a revert (or at least a
> way to restore the previous behaviour) - even if that would just paper
> over the real problem. The real problem is more likely related to how
> the host driver manages restoring of the power to the card, which
> happens when runtime resuming it.
>
> In any case, just to make sure the loop timeout itself isn't the
> problem, can you run the below debug patch please? The intent is to
> figure out how long the worst case timeout is, when it's working with
> CMD1. As soon as the timeout exceeds the worst case, there is a
> message printed to the log.
>
> The below patch is based upon that the offending commit has been reverted.
>
> Kind regards
> Uffe
>
> From: Ulf Hansson <[email protected]>
> Date: Thu, 3 Mar 2022 11:00:04 +0100
> Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
> drivers/mmc/core/mmc_ops.c | 11 +++++++++++
> include/linux/mmc/host.h | 1 +
> 2 files changed, 12 insertions(+)
>
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index 9946733a34c6..3f03d9e8a3a4 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
> {
> struct mmc_command cmd = {};
> int i, err = 0;
> + s64 cmd1_ms;
> + ktime_t time_start;
>
> cmd.opcode = MMC_SEND_OP_COND;
> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>
> + time_start = ktime_get();
> +
> for (i = 100; i; i--) {
> err = mmc_wait_for_cmd(host, &cmd, 0);
> if (err)
> @@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
> cmd.arg = cmd.resp[0] | BIT(30);
> }
>
> + cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
> + if (!err && cmd1_ms > host->cmd1_ms) {
> + pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
> + mmc_hostname(host), cmd1_ms, i);
> + host->cmd1_ms = cmd1_ms;
> + }
> +
> if (rocr && !mmc_host_is_spi(host))
> *rocr = cmd.resp[0];
>
> diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
> index 7afb57cab00b..c2ca3bb05620 100644
> --- a/include/linux/mmc/host.h
> +++ b/include/linux/mmc/host.h
> @@ -301,6 +301,7 @@ struct mmc_host {
> unsigned int f_min;
> unsigned int f_max;
> unsigned int f_init;
> + s64 cmd1_ms;
> u32 ocr_avail;
> u32 ocr_avail_sdio; /* SDIO-specific OCR */
> u32 ocr_avail_sd; /* SD-specific OCR */
> --
> 2.25.1

Hi,

thanks. With this patch applied with reverted 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
I don't get the "stuck" errors in __mmc_poll_for_busy and only one message of

[ 1.095194] mmc1: CMD1 timeout increased to 64 ms, loop=98

Regards,
Jean Rene Dawin