2023-11-01 07:38:52

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 0/5] mmc: block: Fixes for CQE error recovery recovery

Hi

Some issues have been found with CQE error recovery. Here are some fixes.

There is also a patch from Kornel Dulęba:

https://lore.kernel.org/linux-mmc/[email protected]/T/#u

which is still under review.

Please also note ->post_disable() seems to be missing from
cqhci_recovery_start(). It would be good if ->post_disable()
users could check if this needs attention.


Adrian Hunter (5):
mmc: block: Do not lose cache flush during CQE error recovery
mmc: cqhci: Increase recovery halt timeout
mmc: block: Be sure to wait while busy in CQE error recovery
mmc: block: Retry commands in CQE error recovery
mmc: cqhci: Warn of halt or task clear failure

drivers/mmc/core/block.c | 2 ++
drivers/mmc/core/core.c | 9 +++++++--
drivers/mmc/host/cqhci-core.c | 12 ++++++------
3 files changed, 15 insertions(+), 8 deletions(-)


Regards
Adrian


2023-11-01 07:39:01

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 1/5] mmc: block: Do not lose cache flush during CQE error recovery

During CQE error recovery, error-free data commands get requeued if there
is any data left to transfer, but non-data commands are completed even
though they have not been processed. Requeue them instead.

Note the only non-data command is cache flush, which would have resulted in
a cache flush being lost if it was queued at the time of CQE recovery.

Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/core/block.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 3a8f27c3e310..4a32b756b7d8 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1482,6 +1482,8 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
blk_mq_requeue_request(req, true);
else
__blk_mq_end_request(req, BLK_STS_OK);
+ } else if (mq->in_recovery) {
+ blk_mq_requeue_request(req, true);
} else {
blk_mq_end_request(req, BLK_STS_OK);
}
--
2.34.1

2023-11-01 07:39:11

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 2/5] mmc: cqhci: Increase recovery halt timeout

Failing to halt complicates the recovery. Additionally, unless the card or
controller are stuck, which is expected to be very rare, then the halt
should succeed, so it is better to wait. Set a large timeout.

Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/host/cqhci-core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index b3d7d6d8d654..15f5a069af1f 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -984,10 +984,10 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
/*
* After halting we expect to be able to use the command line. We interpret the
* failure to halt to mean the data lines might still be in use (and the upper
- * layers will need to send a STOP command), so we set the timeout based on a
- * generous command timeout.
+ * layers will need to send a STOP command), however failing to halt complicates
+ * the recovery, so set a timeout that would reasonably allow I/O to complete.
*/
-#define CQHCI_START_HALT_TIMEOUT 5
+#define CQHCI_START_HALT_TIMEOUT 500

static void cqhci_recovery_start(struct mmc_host *mmc)
{
--
2.34.1

2023-11-01 07:39:17

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 3/5] mmc: block: Be sure to wait while busy in CQE error recovery

STOP command does not guarantee to wait while busy, but subsequent command
MMC_CMDQ_TASK_MGMT to discard the queue will fail if the card is busy, so
be sure to wait by employing mmc_poll_for_busy().

Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/core/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 3d3e0ca52614..befde2bd26d3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -553,6 +553,8 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
mmc_wait_for_cmd(host, &cmd, 0);

+ mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);
+
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = MMC_CMDQ_TASK_MGMT;
cmd.arg = 1; /* Discard entire queue */
--
2.34.1

2023-11-01 07:40:47

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 4/5] mmc: block: Retry commands in CQE error recovery

It is important that MMC_CMDQ_TASK_MGMT command to discard the queue is
successful because otherwise a subsequent reset might fail to flush the
cache first. Retry it and the previous STOP command.

Fixes: 72a5af554df8 ("mmc: core: Add support for handling CQE requests")
Cc: [email protected]
Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/core/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index befde2bd26d3..a8c17b4cd737 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -551,7 +551,7 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
- mmc_wait_for_cmd(host, &cmd, 0);
+ mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);

mmc_poll_for_busy(host->card, MMC_CQE_RECOVERY_TIMEOUT, true, MMC_BUSY_IO);

@@ -561,10 +561,13 @@ int mmc_cqe_recovery(struct mmc_host *host)
cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
cmd.flags &= ~MMC_RSP_CRC; /* Ignore CRC */
cmd.busy_timeout = MMC_CQE_RECOVERY_TIMEOUT;
- err = mmc_wait_for_cmd(host, &cmd, 0);
+ err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);

host->cqe_ops->cqe_recovery_finish(host);

+ if (err)
+ err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
+
mmc_retune_release(host);

return err;
--
2.34.1

2023-11-01 07:40:49

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 5/5] mmc: cqhci: Warn of halt or task clear failure

A correctly operating controller should successfully halt and clear tasks.
Failure may result in errors elsewhere, so promote messages from debug to
warnings.

Signed-off-by: Adrian Hunter <[email protected]>
---
drivers/mmc/host/cqhci-core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index 15f5a069af1f..948799a0980c 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -942,8 +942,8 @@ static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
ret = cqhci_tasks_cleared(cq_host);

if (!ret)
- pr_debug("%s: cqhci: Failed to clear tasks\n",
- mmc_hostname(mmc));
+ pr_warn("%s: cqhci: Failed to clear tasks\n",
+ mmc_hostname(mmc));

return ret;
}
@@ -976,7 +976,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
ret = cqhci_halted(cq_host);

if (!ret)
- pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
+ pr_warn("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));

return ret;
}
--
2.34.1