2009-04-08 09:56:21

by Wolfgang Mües

[permalink] [raw]
Subject: [PATCH] mmc_spi: do propper retry managment in the block layer

From: Wolfgang Muees <[email protected]>

o This patch adds a propper retry managment for reading
and writing data blocks for mmc and mmc_spi. Blocks are
retransmitted 2 times in case of a transmission failure.
This patch was tested with induced transmission errors
by ESD pulses (and survived).

Signed-off-by: Wolfgang Muees <[email protected]>

---

diff -uprN 2_6_29_rc7_patch_wearout_speedup/drivers/mmc/card/block.c 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c
--- 2_6_29_rc7_patch_wearout_speedup/drivers/mmc/card/block.c 2009-03-09 17:10:55.000000000 +0100
+++ 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c 2009-04-08 11:19:55.000000000 +0200
@@ -230,12 +230,14 @@ static int mmc_blk_issue_rq(struct mmc_q
struct mmc_card *card = md->queue.card;
struct mmc_blk_request brq;
int ret = 1, disable_multi = 0;
+ int error_retry_count = 3;

mmc_claim_host(card->host);

do {
struct mmc_command cmd;
u32 readcmd, writecmd, status = 0;
+ int error;

memset(&brq, 0, sizeof(struct mmc_blk_request));
brq.mrq.cmd = &brq.cmd;
@@ -252,9 +254,9 @@ static int mmc_blk_issue_rq(struct mmc_q
brq.data.blocks = req->nr_sectors;

/*
- * After a read error, we redo the request one sector at a time
- * in order to accurately determine which sectors can be read
- * successfully.
+ * After a transmission error, we redo the request one sector
+ * at a time in order to accurately determine which sectors
+ * can be read/written successfully.
*/
if (disable_multi && brq.data.blocks > 1)
brq.data.blocks = 1;
@@ -262,10 +264,12 @@ static int mmc_blk_issue_rq(struct mmc_q
if (brq.data.blocks > 1) {
/* SPI multiblock writes terminate using a special
* token, not a STOP_TRANSMISSION request.
- */
- if (!mmc_host_is_spi(card->host)
- || rq_data_dir(req) == READ)
- brq.mrq.stop = &brq.stop;
+ * Here, this request is set for all types of
+ * hosts, so that we can use it in the lower
+ * layers if the data transfer stage has failed
+ * and the card is not able to accept the token.
+ */
+ brq.mrq.stop = &brq.stop;
readcmd = MMC_READ_MULTIPLE_BLOCK;
writecmd = MMC_WRITE_MULTIPLE_BLOCK;
} else {
@@ -312,20 +316,44 @@ static int mmc_blk_issue_rq(struct mmc_q

mmc_queue_bounce_post(mq);

- /*
- * Check for errors here, but don't jump to cmd_err
- * until later as we need to wait for the card to leave
- * programming mode even when things go wrong.
+ /* Check for all sort of errors which might be
+ * a transmission fault.
*/
- if (brq.cmd.error || brq.data.error || brq.stop.error) {
- if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
- /* Redo read one sector at a time */
- printk(KERN_WARNING "%s: retrying using single "
- "block read\n", req->rq_disk->disk_name);
- disable_multi = 1;
- continue;
- }
+ error = brq.cmd.error;
+ if (!error)
+ error = brq.data.error;
+ if (!error)
+ error = brq.stop.error;
+ /* Do retries for all sort of transmission errors */
+ switch (error) {
+ /* Card has not understand command. As we do only send
+ * valid commands, this must be a transmission error.
+ */
+ case -EPROTO:
+ /* CRC error */
+ case -EILSEQ:
+ /* Timeout, no answer. If we repeat the command,
+ * we do an overall slowdown and have good chances to
+ * complete the transfer.
+ */
+ case -ETIMEDOUT:
+ /* Invalid response. This is most likely a transmission
+ * error from card to host.
+ */
+ case -EINVAL:
+ error_retry_count -= 1;
+ disable_multi = 1;
+ printk(KERN_WARNING "%s: transmission error %d\n",
+ req->rq_disk->disk_name, error);
status = get_card_status(card, req);
+ if (error_retry_count > 0)
+ continue;
+ break;
+ case 0: /* no error: reset retry count for next block. */
+ error_retry_count = 3;
+ break;
+ default: /* unknown error: do error handling below */
+ break;
}

if (brq.cmd.error) {
@@ -384,18 +412,15 @@ static int mmc_blk_issue_rq(struct mmc_q
}

if (brq.cmd.error || brq.stop.error || brq.data.error) {
- if (rq_data_dir(req) == READ) {
- /*
- * After an error, we redo I/O one sector at a
- * time, so we only reach here after trying to
- * read a single sector.
- */
- spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, -EIO, brq.data.blksz);
- spin_unlock_irq(&md->lock);
- continue;
- }
- goto cmd_err;
+ /*
+ * After an error, we redo I/O one sector at a
+ * time, so we only reach here after trying to
+ * read/write a single sector.
+ */
+ spin_lock_irq(&md->lock);
+ ret = __blk_end_request(req, -EIO, brq.data.blksz);
+ spin_unlock_irq(&md->lock);
+ continue;
}

/*
diff -uprN 2_6_29_rc7_patch_wearout_speedup/drivers/mmc/host/mmc_spi.c 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c
--- 2_6_29_rc7_patch_wearout_speedup/drivers/mmc/host/mmc_spi.c 2009-04-08 11:11:20.000000000 +0200
+++ 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c 2009-04-08 11:25:04.000000000 +0200
@@ -742,22 +742,24 @@ mmc_spi_writeblock(struct mmc_spi_host *
if (status != 0) {
dev_dbg(&spi->dev, "write error %02x (%d)\n",
scratch->status[0], status);
- return status;
+ } else {
+ t->tx_buf += t->len;
+ if (host->dma_dev)
+ t->tx_dma += t->len;
}

- t->tx_buf += t->len;
- if (host->dma_dev)
- t->tx_dma += t->len;
-
/* Return when not busy. If we didn't collect that status yet,
* we'll need some more I/O.
*/
for (i = 4; i < sizeof(scratch->status); i++) {
/* card is non-busy if the most recent bit is 1 */
if (scratch->status[i] & 0x01)
- return 0;
+ return status;
}
- return mmc_spi_wait_unbusy(host, timeout);
+ i = mmc_spi_wait_unbusy(host, timeout);
+ if (!status)
+ status = i;
+ return status;
}

/*
@@ -1086,7 +1088,15 @@ static void mmc_spi_request(struct mmc_h
status = mmc_spi_command_send(host, mrq, mrq->cmd, mrq->data != NULL);
if (status == 0 && mrq->data) {
mmc_spi_data_do(host, mrq->cmd, mrq->data, mrq->data->blksz);
- if (mrq->stop)
+ /* filter-out the stop command for multiblock writes,
+ * only if the data stage has no transmission error.
+ * If the data stage has a transmission error, send the
+ * STOP command because there is a great chance that the
+ * SPI stop token was not accepted by the card.
+ */
+ if (mrq->stop &&
+ ((mrq->data->flags & MMC_DATA_READ)
+ || mrq->data->error))
status = mmc_spi_command_send(host, mrq, mrq->stop, 0);
else
mmc_cs_off(host);

---
regards

i. A. Wolfgang M?es
--
Auerswald GmbH & Co. KG
Hardware Development
Telefon: +49 (0)5306 9219 0
Telefax: +49 (0)5306 9219 94
E-Mail: [email protected]
Web: http://www.auerswald.de
?
--------------------------------------------------------------
Auerswald GmbH & Co. KG, Vor den Grash?fen 1, 38162 Cremlingen
Registriert beim AG Braunschweig HRA 13289
p.h.G Auerswald Gesch?ftsf?hrungsges. mbH
Registriert beim AG Braunschweig HRB 7463
Gesch?ftsf?hrer: Dipl-Ing. Gerhard Auerswald


2009-04-10 22:10:51

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] mmc_spi: do propper retry managment in the block layer

On Wed, 8 Apr 2009 10:55:53 +0100
Wolfgang Mües <[email protected]> wrote:

> From: Wolfgang Muees <[email protected]>
>
> o This patch adds a propper retry managment for reading
> and writing data blocks for mmc and mmc_spi. Blocks are
> retransmitted 2 times in case of a transmission failure.
> This patch was tested with induced transmission errors
> by ESD pulses (and survived).
>
> Signed-off-by: Wolfgang Muees <[email protected]>
>

NAK. Writes cannot be retried safely as upper layers rely on the fact
that writes fail in a linear manner (a stupid assumption IMO, but
that's the way things are).

> + /* Invalid response. This is most likely a transmission
> + * error from card to host.
> + */
> + case -EINVAL:

EINVAL is actually "host controller driver/hardware does not support
this type of request".

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.


Attachments:
signature.asc (198.00 B)

2009-04-14 14:27:56

by Wolfgang Mües

[permalink] [raw]
Subject: Re: [PATCH] mmc_spi: do propper retry managment in the block layer

Hello Pierre,

Am Samstag, 11. April 2009 schrieb Pierre Ossman:
> NAK. Writes cannot be retried safely as upper layers rely on the fact
> that writes fail in a linear manner (a stupid assumption IMO, but
> that's the way things are).

Hmmm.... so this patch has to be improved. I think the (original) code in
mmc_blk_issue_rq() is somewhat questionable. As far as I understand the code,
in the case of an error, brq.data.bytes_xfered is not examined, so the code
starts over from the very first sector of the request, and rereads all
sectors (one after the another).

So if there is a non-recoverable write error, this code must stop and do not
continue to write, right?

But if there is a read error, it is OK and advisable to try to read the rest
of the sectors?

If there is a write error in the middle of a request, is it OK to do
__blk_end_request(req, 0, brq.data.bytes_xfered);
and try to retry from this point, and stop if there is a non-recoverable write
error?

> > + /* Invalid response. This is most likely a transmission
> > + * error from card to host.
> > + */
> > + case -EINVAL:
>
> EINVAL is actually "host controller driver/hardware does not support
> this type of request".

"Hardware" is including the hardware of the SD card. This should NEVER happen,
because block.c issues only valid requests, which are translated to mandatory
commands for the mmc/sd card.

So if there is an EINVAL result from the lower layers, it comes from
transmission errors (card has not understand the command, or host has not
understand the response).

If EINVAL is coming from the HOST CONTROLER, a retry will not harm anyone.

I have seen EINVAL results due to spikes on the SD lines.

Please correct me if I am wrong.

best regards
 
i. A. Wolfgang Mües
--
Auerswald GmbH & Co. KG
Hardware Development
Telefon: +49 (0)5306 9219 0
Telefax: +49 (0)5306 9219 94
E-Mail: [email protected]
Web: http://www.auerswald.de
 
--------------------------------------------------------------
Auerswald GmbH & Co. KG, Vor den Grashöfen 1, 38162 Cremlingen
Registriert beim AG Braunschweig HRA 13289
p.h.G Auerswald Geschäftsführungsges. mbH
Registriert beim AG Braunschweig HRB 7463
Geschäftsführer: Dipl-Ing. Gerhard Auerswald

2009-04-28 19:57:45

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] mmc_spi: do propper retry managment in the block layer

On Tue, 14 Apr 2009 16:27:26 +0200
Wolfgang Mües <[email protected]> wrote:

> Hello Pierre,
>
> Am Samstag, 11. April 2009 schrieb Pierre Ossman:
> > NAK. Writes cannot be retried safely as upper layers rely on the fact
> > that writes fail in a linear manner (a stupid assumption IMO, but
> > that's the way things are).
>
> Hmmm.... so this patch has to be improved. I think the (original) code in
> mmc_blk_issue_rq() is somewhat questionable. As far as I understand the code,
> in the case of an error, brq.data.bytes_xfered is not examined, so the code
> starts over from the very first sector of the request, and rereads all
> sectors (one after the another).
>

That would be a bug, yes.

> So if there is a non-recoverable write error, this code must stop and do not
> continue to write, right?

Yes.

> But if there is a read error, it is OK and advisable to try to read the rest
> of the sectors?

We're working on the assumption that reads do not modify any state in
the card, so those should be safe to retry any number of times. :)

> If there is a write error in the middle of a request, is it OK to do
> __blk_end_request(req, 0, brq.data.bytes_xfered);
> and try to retry from this point, and stop if there is a non-recoverable write
> error?

Afraid not. The reason is that bytes_xfered is not guaranteed to be
exact. It gives you a lower bound, but not an upper one.

> > > + /* Invalid response. This is most likely a transmission
> > > + * error from card to host.
> > > + */
> > > + case -EINVAL:
> >
> > EINVAL is actually "host controller driver/hardware does not support
> > this type of request".
>
> "Hardware" is including the hardware of the SD card. This should NEVER happen,
> because block.c issues only valid requests, which are translated to mandatory
> commands for the mmc/sd card.
>
> So if there is an EINVAL result from the lower layers, it comes from
> transmission errors (card has not understand the command, or host has not
> understand the response).
>
> If EINVAL is coming from the HOST CONTROLER, a retry will not harm anyone.
>
> I have seen EINVAL results due to spikes on the SD lines.
>
> Please correct me if I am wrong.
>

You're wrong in that EINVAL should not be sent for any kind of runtime
hardware errors like that. But I might have missed some instances during
reviews, so there might be drivers that give that error message for
things they shouldn't.

From include/mmc/core.h:

* EINVAL Request cannot be performed because of restrictions
* in hardware and/or the driver

I can't really see a case where EINVAL should be a transient condition,
so there shouldn't be any point retrying a request that gives that
error.

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.


Attachments:
signature.asc (198.00 B)