2008-06-23 14:11:23

by Andres Salomon

Subject: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout


The CaFe chip has a hardware bug that ends up with us getting a timeout
value that's too small, causing the following sorts of problems:

[ 60.525138] mmcblk0: error -110 transferring data
[ 60.531477] end_request: I/O error, dev mmcblk0, sector 1484353
[ 60.533371] Buffer I/O error on device mmcblk0p2, logical block 181632
[ 60.533371] lost page write due to I/O error on mmcblk0p2

Presumably this is an off-by-one error in the hardware. Incrementing
the timeout count value that we stuff into the TIMEOUT_CONTROL register
gets us a value that works. This bug was originally discovered by
Pierre Ossman, I believe.

[thanks to Robert Millan for proving that this was still a problem]

Signed-off-by: Andres Salomon <[email protected]>
---
drivers/mmc/host/sdhci.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 5b74c8c..2b3f06a 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -57,6 +57,8 @@ static unsigned int debug_quirks = 0;
#define SDHCI_QUIRK_RESET_AFTER_REQUEST (1<<8)
/* Controller needs voltage and power writes to happen separately */
#define SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER (1<<9)
+/* Controller has an off-by-one issue with timeout value */
+#define SDHCI_QUIRK_INCR_TIMEOUT_CONTROL (1<<10)

static const struct pci_device_id pci_ids[] __devinitdata = {
{
@@ -134,7 +136,8 @@ static const struct pci_device_id pci_ids[] __devinitdata = {
.device = PCI_DEVICE_ID_MARVELL_CAFE_SD,
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
- .driver_data = SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER,
+ .driver_data = SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER |
+ SDHCI_QUIRK_INCR_TIMEOUT_CONTROL,
},

{
@@ -479,6 +482,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_data *data)
break;
}

+ /*
+ * Compensate for an off-by-one error in the CaFe hardware; otherwise,
+ * a too-small count gives us interrupt timeouts.
+ */
+ if ((host->chip->quirks & SDHCI_QUIRK_INCR_TIMEOUT_CONTROL))
+ count++;
+
if (count >= 0xF) {
printk(KERN_WARNING "%s: Too large timeout requested!\n",
mmc_hostname(host->mmc));
--
1.5.5.3
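
An illustrative sketch of the logic this patch touches, not part of the posted patch. It assumes the usual SDHCI encoding, where a count of 0 means 2^13 timeout-clock cycles and each increment doubles that, up to 0xE. The function name, its parameters, and the standalone quirk macro are hypothetical stand-ins for the driver's own fields. The timeout clock is assumed to be nonzero and given in kHz.

```c
#include <stdint.h>

/* Mirrors the bit the patch assigns; defined here only for the sketch. */
#define QUIRK_INCR_TIMEOUT_CONTROL (1u << 10)
/* Largest count the sketch will return; 0xF is treated as out of range. */
#define TIMEOUT_COUNT_MAX 0xEu

/*
 * Hypothetical helper: pick the smallest count whose timeout is at least
 * target_us, then apply the off-by-one compensation if the quirk is set.
 * timeout_clk_khz must be nonzero.
 */
static unsigned int timeout_count(uint32_t target_us,
				  uint32_t timeout_clk_khz,
				  unsigned int quirks)
{
	unsigned int count = 0;
	/* Duration of 2^13 timeout-clock cycles, in microseconds. */
	uint64_t current_us = ((uint64_t)1 << 13) * 1000 / timeout_clk_khz;

	/* Double the timeout until it covers the request or we run out. */
	while (current_us < target_us && count < 0xF) {
		count++;
		current_us <<= 1;
	}

	/* The compensation added by the patch: bump the count by one. */
	if (quirks & QUIRK_INCR_TIMEOUT_CONTROL)
		count++;

	/* Clamp anything past the largest supported value. */
	if (count >= 0xF)
		count = TIMEOUT_COUNT_MAX;

	return count;
}
```

Because the increment happens before the range check, a quirky controller that already needs the largest count is still clamped to 0xE rather than overflowing the field.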


2008-06-24 00:05:41

by Andrew Morton

Subject: Re: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout

On Mon, 23 Jun 2008 10:13:52 -0400
Andres Salomon <[email protected]> wrote:

>
> The CaFe chip has a hardware bug that ends up with us getting a timeout
> value that's too small, causing the following sorts of problems:
>
> [ 60.525138] mmcblk0: error -110 transferring data
> [ 60.531477] end_request: I/O error, dev mmcblk0, sector 1484353
> [ 60.533371] Buffer I/O error on device mmcblk0p2, logical block 181632
> [ 60.533371] lost page write due to I/O error on mmcblk0p2
>
> Presumably this is an off-by-one error in the hardware. Incrementing
> the timeout count value that we stuff into the TIMEOUT_CONTROL register
> gets us a value that works. This bug was originally discovered by
> Pierre Ossman, I believe.
>
> [thanks to Robert Millan for proving that this was still a problem]
>
> Signed-off-by: Andres Salomon <[email protected]>
> ---
> drivers/mmc/host/sdhci.c | 12 +++++++++++-
> 1 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 5b74c8c..2b3f06a 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -57,6 +57,8 @@ static unsigned int debug_quirks = 0;
> #define SDHCI_QUIRK_RESET_AFTER_REQUEST (1<<8)
> /* Controller needs voltage and power writes to happen separately */
> #define SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER (1<<9)
> +/* Controller has an off-by-one issue with timeout value */
> +#define SDHCI_QUIRK_INCR_TIMEOUT_CONTROL (1<<10)
>
> static const struct pci_device_id pci_ids[] __devinitdata = {
> {
> @@ -134,7 +136,8 @@ static const struct pci_device_id pci_ids[] __devinitdata = {
> .device = PCI_DEVICE_ID_MARVELL_CAFE_SD,
> .subvendor = PCI_ANY_ID,
> .subdevice = PCI_ANY_ID,
> - .driver_data = SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER,
> + .driver_data = SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER |
> + SDHCI_QUIRK_INCR_TIMEOUT_CONTROL,
> },
>
> {
> @@ -479,6 +482,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_data *data)
> break;
> }
>
> + /*
> + * Compensate for an off-by-one error in the CaFe hardware; otherwise,
> + * a too-small count gives us interrupt timeouts.
> + */
> + if ((host->chip->quirks & SDHCI_QUIRK_INCR_TIMEOUT_CONTROL))
> + count++;
> +
> if (count >= 0xF) {
> printk(KERN_WARNING "%s: Too large timeout requested!\n",
> mmc_hostname(host->mmc));

This is needed in 2.6.26, I assume?

If so, I can merge it unless Pierre has objections?

And it will cause conflicts with overlapping changes in linux-next.

2008-06-24 00:09:09

by Andrew Morton

Subject: Re: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout

On Mon, 23 Jun 2008 17:04:49 -0700
Andrew Morton <[email protected]> wrote:

> And it will cause conflicts with overlapping changes in linux-next.


oops, I lied. The problem was that it secretly depended upon
olpc-sdhci-add-quirk-for-the-marvell-cafes-vdd-powerup-issue.patch

So if we want to fix this issue in 2.6.26 we need to merge both

olpc-sdhci-add-quirk-for-the-marvell-cafes-vdd-powerup-issue.patch

and

olpc-sdhci-add-quirk-for-the-marvell-cafes-interrupt-timeout.patch

yes?

2008-06-24 01:35:58

by Andres Salomon

Subject: Re: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout

On Mon, 23 Jun 2008 17:08:50 -0700
Andrew Morton <[email protected]> wrote:

> On Mon, 23 Jun 2008 17:04:49 -0700
> Andrew Morton <[email protected]> wrote:
>
> > And it will cause conflicts with overlapping changes in linux-next.
>
>
> oops, I lied. The problem was that it secretly depended upon
> olpc-sdhci-add-quirk-for-the-marvell-cafes-vdd-powerup-issue.patch
>
> So if we want to fix this issue in 2.6.26 we need to merge both
>
> olpc-sdhci-add-quirk-for-the-marvell-cafes-vdd-powerup-issue.patch
>
> and
>
> olpc-sdhci-add-quirk-for-the-marvell-cafes-interrupt-timeout.patch
>
> yes?


Correct. I originally wasn't going to send the interrupt-timeout
patch (but was shown that the bug still existed), which is why the two
patches weren't sent as a series. Sorry!

2008-06-24 01:37:56

by Andres Salomon

Subject: Re: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout

On Mon, 23 Jun 2008 17:04:49 -0700
Andrew Morton <[email protected]> wrote:

> On Mon, 23 Jun 2008 10:13:52 -0400
> Andres Salomon <[email protected]> wrote:
>
> >
> > The CaFe chip has a hardware bug that ends up with us getting a
> > timeout value that's too small, causing the following sorts of
> > problems:
> >
> > [ 60.525138] mmcblk0: error -110 transferring data
> > [ 60.531477] end_request: I/O error, dev mmcblk0, sector 1484353
> > [ 60.533371] Buffer I/O error on device mmcblk0p2, logical block 181632
> > [ 60.533371] lost page write due to I/O error on mmcblk0p2
> >
[...]
>
> This is needed in 2.6.26, I assume?
>


Yes, please.


> If so, I can merge it unless Pierre has objections?
>
> And it will cause conflicts with overlapping changes in linux-next.
>

2008-06-27 17:30:23

by Pierre Ossman

Subject: Re: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout

On Mon, 23 Jun 2008 10:13:52 -0400
Andres Salomon <[email protected]> wrote:

>
> The CaFe chip has a hardware bug that ends up with us getting a timeout
> value that's too small, causing the following sorts of problems:
>
> [ 60.525138] mmcblk0: error -110 transferring data
> [ 60.531477] end_request: I/O error, dev mmcblk0, sector 1484353
> [ 60.533371] Buffer I/O error on device mmcblk0p2, logical block 181632
> [ 60.533371] lost page write due to I/O error on mmcblk0p2
>
> Presumably this is an off-by-one error in the hardware. Incrementing
> the timeout count value that we stuff into the TIMEOUT_CONTROL register
> gets us a value that works. This bug was originally discovered by
> Pierre Ossman, I believe.
>
> [thanks to Robert Millan for proving that this was still a problem]
>
> Signed-off-by: Andres Salomon <[email protected]>

Hmm... I'm not entirely sure about the specifics of the workaround
here. It's likely that we'll have an off-by-minus-one in another
controller, and off-by-two in a third.

Perhaps we should just have "broken timeout" and set the timeout to
0xE. It doesn't cause any side-effects except that the user will have
to wait slightly longer for requests to fail if the card has decided to
crap out.

> @@ -479,6 +482,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_data *data)
> break;
> }
>
> + /*
> + * Compensate for an off-by-one error in the CaFe hardware; otherwise,
> + * a too-small count gives us interrupt timeouts.
> + */

Same issue with "CaFE" as the previous patch.

--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.



2008-06-27 17:39:31

by Andres Salomon

Subject: Re: [PATCH] [OLPC] sdhci: add quirk for the Marvell CaFe's interrupt timeout

On Fri, 27 Jun 2008 19:30:01 +0200
Pierre Ossman <[email protected]> wrote:

> On Mon, 23 Jun 2008 10:13:52 -0400
> Andres Salomon <[email protected]> wrote:
>
> >
> > The CaFe chip has a hardware bug that ends up with us getting a
> > timeout value that's too small, causing the following sorts of
> > problems:
> >
> > [ 60.525138] mmcblk0: error -110 transferring data
> > [ 60.531477] end_request: I/O error, dev mmcblk0, sector 1484353
> > [ 60.533371] Buffer I/O error on device mmcblk0p2, logical block 181632
> > [ 60.533371] lost page write due to I/O error on mmcblk0p2
> >
> > Presumably this is an off-by-one error in the hardware.
> > Incrementing the timeout count value that we stuff into the
> > TIMEOUT_CONTROL register gets us a value that works. This bug was
> > originally discovered by Pierre Ossman, I believe.
> >
> > [thanks to Robert Millan for proving that this was still a problem]
> >
> > Signed-off-by: Andres Salomon <[email protected]>
>
> Hmm... I'm not entirely sure about the specifics of the workaround
> here. It's likely that we'll have an off-by-minus-one in another
> controller, and off-by-two in a third.
>
> Perhaps we should just have "broken timeout" and set the timeout to
> 0xE. It doesn't cause any side-effects except that the user will have
> to wait slightly longer for requests to fail if the card has decided
> to crap out.
>

That would be fine. OFW actually just hardcodes the timeout to 0xc,
with Mitch citing the same logic. Just setting it to the upper bound
would certainly make it more applicable to hardware other than the cafe.
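
A minimal sketch of the "broken timeout" alternative discussed above, for illustration only; no such patch appears in this thread. The quirk name and bit, and both helper functions, are hypothetical. It assumes the same encoding as the driver's calculation, where a count of N selects 2^(13+N) timeout-clock cycles, and a nonzero timeout clock given in kHz.

```c
#include <stdint.h>

/* Hypothetical quirk bit for controllers whose timeout value is unreliable. */
#define QUIRK_BROKEN_TIMEOUT (1u << 11)
/* Upper bound of the timeout count, as suggested in the thread. */
#define TIMEOUT_COUNT_MAX 0xEu

/*
 * Hypothetical helper: ignore the computed count on quirky controllers and
 * always use the upper bound; otherwise just clamp the computed value.
 */
static unsigned int pick_timeout_count(unsigned int computed_count,
				       unsigned int quirks)
{
	if (quirks & QUIRK_BROKEN_TIMEOUT)
		return TIMEOUT_COUNT_MAX;

	return computed_count > TIMEOUT_COUNT_MAX ? TIMEOUT_COUNT_MAX
						  : computed_count;
}

/* Timeout, in whole milliseconds, that a given count selects. */
static uint64_t timeout_ms(unsigned int count, uint32_t timeout_clk_khz)
{
	return ((uint64_t)1 << (13 + count)) / timeout_clk_khz;
}
```

Under these assumptions, 0xE selects 2^27 cycles and 0xc selects 2^25, so the upper bound waits four times as long as the OFW value before a stuck request is reported as failed. That is the only cost of the simpler quirk.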