2008-10-02 08:18:13

by Pierre Ossman

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Tue, 23 Sep 2008 23:24:59 +0200
Luca Tettamanti <[email protected]> wrote:

> Hi,
> I'm seeing timeout errors when transfering "big" (over 1MB or so) files to a SD
> card (small files are ok):
>
> [ 251.956666] mmcblk0: error -110 transferring data
> [ 251.979810] end_request: I/O error, dev mmcblk0, sector 1572758

Looks like the card died on you there. Did you build the kernel
yourself? If so, it would be helpful if you could enable
CONFIG_MMC_DEBUG and give me a complete dmesg dump from when this
happens.

> kernel in use is 2.6.27-rc6, and this is the controller:
>
> 09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
> 09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
> 09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
> 09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)
>
> Googling around I found a patch[1] that introduced a quirk to address an
> off-by-one issue for the timeout value in certain controllers.
>
> With this patch:
>

The patch doubles the timeout, so something is clearly broken if that
solves the issue. I'd like to identify if it's the driver, controller
or card first though.

Are you experiencing this with just this card, or with all?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.


2008-10-02 09:12:11

by Luca Tettamanti

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Thu, Oct 2, 2008 at 10:17 AM, Pierre Ossman <[email protected]> wrote:
> On Tue, 23 Sep 2008 23:24:59 +0200
> Luca Tettamanti <[email protected]> wrote:
>
>> Hi,
>> I'm seeing timeout errors when transfering "big" (over 1MB or so) files to a SD
>> card (small files are ok):
>>
>> [ 251.956666] mmcblk0: error -110 transferring data
>> [ 251.979810] end_request: I/O error, dev mmcblk0, sector 1572758
>
> Looks like the card died on you there. Did you build the kernel
> yourself? If so, it would be helpful if you could enable
> CONFIG_MMC_DEBUG and give me a complete dmesg dump from when this
> happens.

Will do.

>> kernel in use is 2.6.27-rc6, and this is the controller:
>>
>> 09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
>> 09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
>> 09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
>> 09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)
>>
>> Googling around I found a patch[1] that introduced a quirk to address an
>> off-by-one issue for the timeout value in certain controllers.
>>
>> With this patch:
>
> The patch doubles the timeout, so something is clearly broken if that
> solves the issue. I'd like to identify if it's the driver, controller
> or card first though.

I was just shooting in the dark ;-) I found the patch floating around
and applied it.

> Are you experiencing this with just this card, or with all?

Actually I have only that card... I'll try to borrow another one for
testing. Btw, that card works fine under windows (on the same machine)
and my N800 is happy with it too.

Luca

2008-10-03 11:28:03

by Luca Tettamanti

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Thu, Oct 2, 2008 at 10:17 AM, Pierre Ossman <[email protected]> wrote:
> On Tue, 23 Sep 2008 23:24:59 +0200
> Luca Tettamanti <[email protected]> wrote:
>
>> Hi,
>> I'm seeing timeout errors when transfering "big" (over 1MB or so) files to a SD
>> card (small files are ok):
>>
>> [ 251.956666] mmcblk0: error -110 transferring data
>> [ 251.979810] end_request: I/O error, dev mmcblk0, sector 1572758
>
> Looks like the card died on you there. Did you build the kernel
> yourself? If so, it would be helpful if you could enable
> CONFIG_MMC_DEBUG and give me a complete dmesg dump from when this
> happens.

Hum, cannot reproduce (but it was consistently failing when I tested
the patch... the only difference is a mkfs in between). I just got a
few retries:

[ 187.158063] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000040
[ 187.656190] mmc0: clock 0Hz busmode 1 powermode 1 cs 0 Vdd 21 width
0 timing 0
[ 187.658366] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 21
width 0 timing 0
[ 187.660500] mmc0: clock 128906Hz busmode 1 powermode 2 cs 1 Vdd 21
width 0 timing 0
[ 187.661634] mmc0: starting CMD0 arg 00000000 flags 000000c0
[ 187.662265] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[ 187.662283] mmc0: req done (CMD0): 0: 00000000 00000000 00000000 00000000
[ 187.663461] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 21
width 0 timing 0
[ 187.664473] mmc0: starting CMD8 arg 000001aa flags 000002f5
[ 187.665423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[ 187.665439] mmc0: req done (CMD8): 0: 000001aa 00000000 00000000 00000000
[ 187.665474] mmc0: starting CMD5 arg 00000000 flags 000002e1
[ 187.666423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[ 187.668435] mmc0: req failed (CMD5): -110, retrying...
[ 187.669423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[ 187.671424] mmc0: req failed (CMD5): -110, retrying...
[ 187.672422] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[ 187.674423] mmc0: req failed (CMD5): -110, retrying...
[ 187.675425] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[ 187.677436] mmc0: req done (CMD5): -110: 00000000 00000000 00000000 00000000
[ 187.677683] mmc0: starting CMD55 arg 00000000 flags 000000f5
[ 187.678754] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[ 187.678770] mmc0: req done (CMD55): 0: 00000120 00000000 00000000 00000000
[ 187.678808] mmc0: starting CMD41 arg 00000000 flags 000000e1
[ 187.679877] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[ 187.679893] mmc0: req done (CMD41): 0: 00ff8000 00000000 00000000 00000000
[ 187.679995] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 20
width 0 timing 0
[ 187.680021] mmc0: clock 128906Hz busmode 1 powermode 2 cs 1 Vdd 20
width 0 timing 0
[cut 80MB of logs]

Is it possible that the error was generated by a faulty flash block?

Luca

2008-10-12 08:52:39

by Pierre Ossman

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Fri, 3 Oct 2008 13:27:52 +0200
"Luca Tettamanti" <[email protected]> wrote:

>
> Hum, cannot reproduce (but it was consistently failing when I tested
> the patch... the only difference is a mkfs in between). I just got a
> few retries:

A silly question, but did you also try disabling the debugging? Since
this is a timing issue, the debugging output could be just enough to
make the problem go away.

>
> Is it possible that the error was generated by a faulty flash block?
>

Maybe. I've never gotten a bad block so I have no experience in how it
behaves.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

2008-10-14 09:34:38

by Luca Tettamanti

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Sun, Oct 12, 2008 at 10:52 AM, Pierre Ossman <[email protected]> wrote:
> On Fri, 3 Oct 2008 13:27:52 +0200
> "Luca Tettamanti" <[email protected]> wrote:
>>
>> Hum, cannot reproduce (but it was consistently failing when I tested
>> the patch... the only difference is a mkfs in between). I just got a
>> few retries:
>
> A silly question, but did you also try disabling the debugging? Since
> this is a timing issue, the debugging output could be just enough to
> make the problem go away.

Ok, I managed to reproduce without the debugging, but this time it
took over a couple of GB before it started failing. Block numbers
differ from the previous failures (so maybe it's not a cluster of
broken blocks).
Will go back and test with debugging enabled.

Luca

2008-10-14 21:13:51

by Luca Tettamanti

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Tue, Oct 14, 2008 at 11:34 AM, Luca Tettamanti <[email protected]> wrote:
> On Sun, Oct 12, 2008 at 10:52 AM, Pierre Ossman <[email protected]> wrote:
>> On Fri, 3 Oct 2008 13:27:52 +0200
>> "Luca Tettamanti" <[email protected]> wrote:
>>>
>>> Hum, cannot reproduce (but it was consistently failing when I tested
>>> the patch... the only difference is a mkfs in between). I just got a
>>> few retries:
>>
>> A silly question, but did you also try disabling the debugging? Since
>> this is a timing issue, the debugging output could be just enough to
>> make the problem go away.
>
> Ok, I managed to reproduce without the debugging, but this time it
> took over a couple of GB before it started failing. Block numbers
> differ from the previous failures (so maybe it's not a cluster of
> broken blocks).
> Will go back and test with debugging enabled.

I finally managed to capture a failure with debug enabled, I'm
attaching the log.
To recap:
- heavy write loads sometimes result in a timeout
- reads seem unaffected (I read the card multiple times without errors)
- doubling the timeout (as per my first patch) makes the timeout disappear
- windows doesn't suffer from the issue

I've tested two other cards that do not show the problem, but they are
not "high speed":

mmc0: new high speed SD card at address 0002 (troubles)
vs:
mmc0: new SD card at address e624 (OK)

Luca

2008-10-14 21:21:55

by Luca Tettamanti

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Tue, Oct 14, 2008 at 11:13 PM, Luca Tettamanti <[email protected]> wrote:
> I finally managed to capture a failure with debug enabled, I'm
> attaching the log.

...for real this time :-)

Luca


Attachments:
(No filename) (198.00 B)
dmesg.txt.bz2 (262.22 kB)
Download all attachments

2008-10-18 20:47:06

by Pierre Ossman

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Tue, 14 Oct 2008 23:13:39 +0200
"Luca Tettamanti" <[email protected]> wrote:

>
> I finally managed to capture a failure with debug enabled, I'm
> attaching the log.

It seems to be a transfer error at least as it is in a completely
different area of the card compared to the first mail you sent.

> To recap:
> - heavy write loads sometimes result in a timeout
> - reads seem unaffected (I read the card multiple times without errors)

Looking at the dump, the controller seems to be doing the correct
thing. The timeout mandated by the SD specification is 250 ms, and that
time has passed if you look at the timestamps. If you look at some
other writes further down, the card takes around 220 ms for the writes.
IOW, it would seem that this card is cutting it a bit close and
sometimes goes over the alloted time.

> - doubling the timeout (as per my first patch) makes the timeout disappear

Since it is the card that's the problem here, not the controller, we'd
need to fix this in the core. Unfortunately the sdhci hardware is a bit
inflexible so any change I make will result in a doubling of the
timeout.

Could you try modifying line 283 of drivers/mmc/core/core.c where it
says "limit_us = 250000;" to 300000 instead?

> - windows doesn't suffer from the issue

Windows doesn't do proper handling of timeouts. It just sets the
maximum timeout and hopes for the best.

>
> I've tested two other cards that do not show the problem, but they are
> not "high speed":
>
> mmc0: new high speed SD card at address 0002 (troubles)
> vs:
> mmc0: new SD card at address e624 (OK)
>

What's the vendor and model of the failing card?

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

2008-10-20 12:20:25

by Luca Tettamanti

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Sat, Oct 18, 2008 at 10:46 PM, Pierre Ossman <[email protected]> wrote:
> On Tue, 14 Oct 2008 23:13:39 +0200
> "Luca Tettamanti" <[email protected]> wrote:
>>
>> I finally managed to capture a failure with debug enabled, I'm
>> attaching the log.
>
> It seems to be a transfer error at least as it is in a completely
> different area of the card compared to the first mail you sent.
>
>> To recap:
>> - heavy write loads sometimes result in a timeout
>> - reads seem unaffected (I read the card multiple times without errors)
>
> Looking at the dump, the controller seems to be doing the correct
> thing. The timeout mandated by the SD specification is 250 ms, and that
> time has passed if you look at the timestamps. If you look at some
> other writes further down, the card takes around 220 ms for the writes.
> IOW, it would seem that this card is cutting it a bit close and
> sometimes goes over the alloted time.
>
>> - doubling the timeout (as per my first patch) makes the timeout disappear
>
> Since it is the card that's the problem here, not the controller, we'd
> need to fix this in the core. Unfortunately the sdhci hardware is a bit
> inflexible so any change I make will result in a doubling of the
> timeout.
>
> Could you try modifying line 283 of drivers/mmc/core/core.c where it
> says "limit_us = 250000;" to 300000 instead?

300ms seems to be enough.

>> I've tested two other cards that do not show the problem, but they are
>> not "high speed":
>>
>> mmc0: new high speed SD card at address 0002 (troubles)
>> vs:
>> mmc0: new SD card at address e624 (OK)
>
> What's the vendor and model of the failing card?

The card is from S3+, and it's branded "SD Excel" (el-cheapo 4GB card)

Luca

2008-10-26 11:35:52

by Pierre Ossman

[permalink] [raw]
Subject: Re: SDHCI: timeout during data transfer

On Mon, 20 Oct 2008 14:20:13 +0200
"Luca Tettamanti" <[email protected]> wrote:

> On Sat, Oct 18, 2008 at 10:46 PM, Pierre Ossman <[email protected]> wrote:
> > Could you try modifying line 283 of drivers/mmc/core/core.c where it
> > says "limit_us = 250000;" to 300000 instead?
>
> 300ms seems to be enough.
>

It might be. Your timeout will be in the neigbourhood of 500ms because
of the crappy resolution of the sdhci hardware.

Still, I'll queue up a patch for this...

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.