2023-11-03 08:48:39

by Adrian Hunter

Subject: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery

Hi

Some issues have been found with CQE error recovery. Here are some fixes.

As of V2, the alternative implementation for the patch from Kornel Dulęba:

https://lore.kernel.org/linux-mmc/[email protected]/T/#u

is now included; see patch 6 ("mmc: cqhci: Fix task clearing in CQE error
recovery").

Please also note ->post_disable() seems to be missing from
cqhci_recovery_start(). It would be good if ->post_disable()
users could check if this needs attention.


Changes in V2:

mmc: cqhci: Fix task clearing in CQE error recovery
New patch

mmc: cqhci: Warn of halt or task clear failure
Add fixes and stable tags


Adrian Hunter (6):
mmc: block: Do not lose cache flush during CQE error recovery
mmc: cqhci: Increase recovery halt timeout
mmc: block: Be sure to wait while busy in CQE error recovery
mmc: block: Retry commands in CQE error recovery
mmc: cqhci: Warn of halt or task clear failure
mmc: cqhci: Fix task clearing in CQE error recovery

drivers/mmc/core/block.c | 2 ++
drivers/mmc/core/core.c | 9 +++++++--
drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++----------------------
3 files changed, 31 insertions(+), 24 deletions(-)


Regards
Adrian


2023-11-03 08:48:40

by Adrian Hunter

Subject: [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery

The STOP command does not guarantee waiting while the card is busy, but
the subsequent MMC_CMDQ_TASK_MGMT command to discard the queue will fail
if the card is busy, so be sure to wait by calling mmc_poll_for_busy().

Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/core/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 3d3e0ca52614..befde2bd26d3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
mmc_wait_for_cmd(host, &cmd, 0);

+ mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
+
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = MMC_CMDQ_TASK_MGMT;
cmd.arg = 1; /* Discard entire queue */
--
2.34.1

2023-11-03 08:49:24

by Adrian Hunter

Subject: [PATCH V2 6/6] mmc: cqhci: Fix task clearing in CQE error recovery

If a task completion notification (TCN) is received when there is no
outstanding task, the cqhci driver issues a "spurious TCN" warning. This
was observed to happen right after CQE error recovery.

When an error interrupt is received, the driver runs recovery logic:
it halts the controller, clears all pending tasks, and then re-enables
it. On some platforms, such as Intel Jasper Lake, a stale task completion
event was observed even though the CQHCI_CLEAR_ALL_TASKS bit had been set.

This results in either:
a) a spurious task completion event for an empty slot, or
b) corrupted data being passed up the stack, as a result of premature
completion of a newly added task.

Rather than add a quirk for affected controllers, ensure tasks are cleared
by toggling CQHCI_ENABLE, which would happen anyway if
cqhci_clear_all_tasks() timed out. This is simpler and should be safe and
effective for all controllers.

Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: [email protected]
Reported-by: Kornel Dulęba <[email protected]>
Tested-by: Kornel Dulęba <[email protected]>
Co-developed-by: Kornel Dulęba <[email protected]>
Signed-off-by: Kornel Dulęba <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/host/cqhci-core.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index 948799a0980c..41e94cd14109 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -1075,28 +1075,28 @@ static void cqhci_recovery_finish(struct mmc_host *mmc)

ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);

- if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
- ok = false;
-
/*
* The specification contradicts itself, by saying that tasks cannot be
* cleared if CQHCI does not halt, but if CQHCI does not halt, it should
* be disabled/re-enabled, but not to disable before clearing tasks.
* Have a go anyway.
*/
- if (!ok) {
- pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc));
- cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
- cqcfg &= ~CQHCI_ENABLE;
- cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
- cqcfg |= CQHCI_ENABLE;
- cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
- /* Be sure that there are no tasks */
- ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
- if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
- ok = false;
- WARN_ON(!ok);
- }
+ if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
+ ok = false;
+
+ /* Disable to make sure tasks really are cleared */
+ cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+ cqcfg &= ~CQHCI_ENABLE;
+ cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+ cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+ cqcfg |= CQHCI_ENABLE;
+ cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+ cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
+
+ if (!ok)
+ cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT);

cqhci_recover_mrqs(cq_host);

--
2.34.1

2023-11-03 08:49:35

by Adrian Hunter

Subject: [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure

A correctly operating controller should successfully halt and clear tasks.
Failure may result in errors elsewhere, so promote messages from debug to
warnings.

Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/host/cqhci-core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index 15f5a069af1f..948799a0980c 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
ret = cqhci_tasks_cleared(cq_host);

if (!ret)
- pr_debug("%s: cqhci: Failed to clear tasks\n",
- mmc_hostname(mmc));
+ pr_warn("%s: cqhci: Failed to clear tasks\n",
+ mmc_hostname(mmc));

return ret;
}
@@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
ret = cqhci_halted(cq_host);

if (!ret)
- pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
+ pr_warn("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));

return ret;
}
--
2.34.1

2023-11-03 08:49:38

by Adrian Hunter

Subject: [PATCH V2 4/6] mmc: block: Retry commands in CQE error recovery

It is important that the MMC_CMDQ_TASK_MGMT command to discard the queue
succeeds, because otherwise a subsequent reset might fail to flush the
cache first. Retry it, and also retry the preceding STOP command.

Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/core/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index befde2bd26d3..a8c17b4cd737 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -551,7 +551,7 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
- mmc_wait_for_cmd(host, &cmd, 0);
+ mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);

mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);

@@ -561,10 +561,13 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
- err = mmc_wait_for_cmd(host, &cmd, 0);
+ err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);

host->cqe_ops->cqe_recovery_finish(host);

+ if (err)
+ err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
+
mmc_retune_release(host);

return err;
--
2.34.1

2023-11-03 10:11:03

by Avri Altman

Subject: RE: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery

Is the double "recovery" in the subject intentional?

Thanks,
Avri

> Hi
>
> Some issues have been found with CQE error recovery. Here are some fixes.
>
> As of V2, the alternative implementation for the patch from Kornel Dulęba:
>
> https://lore.kernel.org/linux-mmc/e7c12e07-7540-47ea-8891-[email protected]/T/#u
>
> is now included; see patch 6 ("mmc: cqhci: Fix task clearing in CQE error
> recovery").
>
> Please also note ->post_disable() seems to be missing from
> cqhci_recovery_start(). It would be good if ->post_disable() users could
> check if this needs attention.
>
>
> Changes in V2:
>
> mmc: cqhci: Fix task clearing in CQE error recovery
> New patch
>
> mmc: cqhci: Warn of halt or task clear failure
> Add fixes and stable tags
>
>
> Adrian Hunter (6):
> mmc: block: Do not lose cache flush during CQE error recovery
> mmc: cqhci: Increase recovery halt timeout
> mmc: block: Be sure to wait while busy in CQE error recovery
> mmc: block: Retry commands in CQE error recovery
> mmc: cqhci: Warn of halt or task clear failure
> mmc: cqhci: Fix task clearing in CQE error recovery
>
> drivers/mmc/core/block.c | 2 ++
> drivers/mmc/core/core.c | 9 +++++++--
> drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++----------------------
> 3 files changed, 31 insertions(+), 24 deletions(-)
>
>
> Regards
> Adrian

2023-11-03 11:58:49

by Avri Altman

Subject: RE: [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure

> A correctly operating controller should successfully halt and clear tasks.
> Failure may result in errors elsewhere, so promote messages from debug to
> warnings.
>
> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
> Cc: [email protected]
> Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Avri Altman <[email protected]>

> ---
> drivers/mmc/host/cqhci-core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> index 15f5a069af1f..948799a0980c 100644
> --- a/drivers/mmc/host/cqhci-core.c
> +++ b/drivers/mmc/host/cqhci-core.c
> @@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
> ret = cqhci_tasks_cleared(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to clear tasks\n",
> - mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to clear tasks\n",
> + mmc_hostname(mmc));
>
> return ret;
> }
> @@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
> ret = cqhci_halted(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
>
> return ret;
> }
> --
> 2.34.1

2023-11-06 06:35:53

by Adrian Hunter

Subject: Re: [PATCH V2 3/6] mmc: block: Be sure to wait while busy in CQE error recovery

On 3/11/23 12:48, Avri Altman wrote:
>> The STOP command does not guarantee waiting while the card is busy, but
>> the subsequent MMC_CMDQ_TASK_MGMT command to discard the queue will fail
>> if the card is busy, so be sure to wait by calling mmc_poll_for_busy().
> Doesn't the Task Discard Sequence expect you to check CQDPT[i]==1
> before sending MMC_CMDQ_TASK_MGMT to discard task id i?

We do not clear individual tasks. Instead, MMC_CMDQ_TASK_MGMT is sent
with the op-code to "discard entire queue", which works even if the
queue is empty. Refer to JESD84-B51A, 6.6.39.6 CMDQ_TASK_MGMT and
Table 43 (Task Management op-codes).

>
> Thanks,
> Avri
>
>>
>> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
>> Cc: [email protected]
>> Signed-off-by: Adrian Hunter <[email protected]>
>> ---
>> drivers/mmc/core/core.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>> index 3d3e0ca52614..befde2bd26d3 100644
>> --- a/drivers/mmc/core/core.c
>> +++ b/drivers/mmc/core/core.c
>> @@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
>> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
>> mmc_wait_for_cmd(host, &cmd, 0);
>>
>> + mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
>> +
>> memset(&cmd, 0, sizeof(cmd));
>> cmd.opcode = MMC_CMDQ_TASK_MGMT;
>> cmd.arg = 1; /* Discard entire queue */
>> --
>> 2.34.1
>

2023-11-06 06:39:18

by Adrian Hunter

Subject: Re: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery

On 3/11/23 12:10, Avri Altman wrote:
> Is the double "recovery" in the subject intentional?

No, must be an echo in here

>
> Thanks,
> Avri
>
>> Hi
>>
>> Some issues have been found with CQE error recovery. Here are some fixes.
>>
>> As of V2, the alternative implementation for the patch from Kornel Dulęba:
>>
>> https://lore.kernel.org/linux-mmc/e7c12e07-7540-47ea-8891-[email protected]/T/#u
>>
>> is now included; see patch 6 ("mmc: cqhci: Fix task clearing in CQE error
>> recovery").
>>
>> Please also note ->post_disable() seems to be missing from
>> cqhci_recovery_start(). It would be good if ->post_disable() users could
>> check if this needs attention.
>>
>>
>> Changes in V2:
>>
>> mmc: cqhci: Fix task clearing in CQE error recovery
>> New patch
>>
>> mmc: cqhci: Warn of halt or task clear failure
>> Add fixes and stable tags
>>
>>
>> Adrian Hunter (6):
>> mmc: block: Do not lose cache flush during CQE error recovery
>> mmc: cqhci: Increase recovery halt timeout
>> mmc: block: Be sure to wait while busy in CQE error recovery
>> mmc: block: Retry commands in CQE error recovery
>> mmc: cqhci: Warn of halt or task clear failure
>> mmc: cqhci: Fix task clearing in CQE error recovery
>>
>> drivers/mmc/core/block.c | 2 ++
>> drivers/mmc/core/core.c | 9 +++++++--
>> drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++----------------------
>> 3 files changed, 31 insertions(+), 24 deletions(-)
>>
>>
>> Regards
>> Adrian

2023-11-06 07:38:06

by Avri Altman

Subject: RE: [PATCH V2 4/6] mmc: block: Retry commands in CQE error recovery

> It is important that the MMC_CMDQ_TASK_MGMT command to discard the queue
> succeeds, because otherwise a subsequent reset might fail to flush the
> cache first. Retry it, and also retry the preceding STOP command.
>
> Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
> Cc: [email protected]
> Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Avri Altman <[email protected]>

> ---
> drivers/mmc/core/core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index befde2bd26d3..a8c17b4cd737 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -551,7 +551,7 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
> cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> - mmc_wait_for_cmd(host, &cmd, 0);
> + mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
>
> mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
>
> @@ -561,10 +561,13 @@ int mmc_cqe_recovery(struct mmc_host *host)
> cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
> cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
> cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
> - err = mmc_wait_for_cmd(host, &cmd, 0);
> + err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
>
> host->cqe_ops->cqe_recovery_finish(host);
>
> + if (err)
> + err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
> +
> mmc_retune_release(host);
>
> return err;
> --
> 2.34.1

2023-11-06 07:38:50

by Avri Altman

Subject: RE: [PATCH V2 5/6] mmc: cqhci: Warn of halt or task clear failure

> A correctly operating controller should successfully halt and clear tasks.
> Failure may result in errors elsewhere, so promote messages from debug to
> warnings.
>
> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
> Cc: [email protected]
> Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Avri Altman <[email protected]>

> ---
> drivers/mmc/host/cqhci-core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> index 15f5a069af1f..948799a0980c 100644
> --- a/drivers/mmc/host/cqhci-core.c
> +++ b/drivers/mmc/host/cqhci-core.c
> @@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
> ret = cqhci_tasks_cleared(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to clear tasks\n",
> - mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to clear tasks\n",
> + mmc_hostname(mmc));
>
> return ret;
> }
> @@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
> ret = cqhci_halted(cq_host);
>
> if (!ret)
> - pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
> + pr_warn("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
>
> return ret;
> }
> --
> 2.34.1

2023-11-15 15:53:38

by Ulf Hansson

Subject: Re: [PATCH V2 0/6] mmc: block: Fixes for CQE error recovery recovery

On Fri, 3 Nov 2023 at 09:48, Adrian Hunter <[email protected]> wrote:
>
> Hi
>
> Some issues have been found with CQE error recovery. Here are some fixes.
>
> As of V2, the alternative implementation for the patch from Kornel Dulęba:
>
> https://lore.kernel.org/linux-mmc/[email protected]/T/#u
>
> is now included; see patch 6 ("mmc: cqhci: Fix task clearing in CQE error
> recovery").
>
> Please also note ->post_disable() seems to be missing from
> cqhci_recovery_start(). It would be good if ->post_disable()
> users could check if this needs attention.
>
>
> Changes in V2:
>
> mmc: cqhci: Fix task clearing in CQE error recovery
> New patch
>
> mmc: cqhci: Warn of halt or task clear failure
> Add fixes and stable tags
>
>
> Adrian Hunter (6):
> mmc: block: Do not lose cache flush during CQE error recovery
> mmc: cqhci: Increase recovery halt timeout
> mmc: block: Be sure to wait while busy in CQE error recovery
> mmc: block: Retry commands in CQE error recovery
> mmc: cqhci: Warn of halt or task clear failure
> mmc: cqhci: Fix task clearing in CQE error recovery
>
> drivers/mmc/core/block.c | 2 ++
> drivers/mmc/core/core.c | 9 +++++++--
> drivers/mmc/host/cqhci-core.c | 44 +++++++++++++++++++++----------------------
> 3 files changed, 31 insertions(+), 24 deletions(-)
>
>

Applied for fixes, thanks!

Kind regards
Uffe