2011-05-28 23:09:34

by Charles Hannum

Subject: [RFC] sdhci: timeouts

Between all the devices with special "quirks" to use
SDHCI_QUIRK_BROKEN_TIMEOUT_VAL and
SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK, and the 19200 Google hits for
"linux sdhci timeout", it sure seems like there's a problem there
somewhere. Having been bitten by it on my own Dell laptop, I went
poking, and found:

1) The second term of the timeout calculation (based on tacc_clks) is
totally bogus. It's dividing a whole number of SDCLK cycles by the
host clock frequency, but is expecting to get microseconds. See
attached patch sdhci-timeout-clks.diff.

2) The SDHCI spec is very specific that when both a Transfer Complete
and a Data Timeout Error are present, the Transfer Complete takes
precedence. This is documented under the definition of the
Transfer Complete bit (page 53 of the SDHCI 2.0 spec). See attached
patch sdhci-timeout-int.diff.

3) There's a lot of folklore about buggy clocks on various chips, but
no hard data. I always hate this kind of folklore. I found it
helpful to actually measure the timeout and see if it was what we
expected. Ultimately this proved that the controllers in my machines
were in fact delivering timeouts pretty much exactly as expected. See
attached patch sdhci-timeout-log.diff (depends on the previous two
diffs); it outputs messages of the form:

May 28 18:53:01 lop-nor kernel: [24056.231401] sdhci: timeout,
requested 508400484ns actual 510006113ns, TMCLK configured 33000
estimated 32896
May 28 18:53:33 lop-nor kernel: [24088.002670] sdhci: timeout,
requested 508400484ns actual 510005815ns, TMCLK configured 33000
estimated 32896
May 28 18:55:29 lop-nor kernel: [24204.139937] sdhci: timeout,
requested 508400484ns actual 500006027ns, TMCLK configured 33000
estimated 33554
May 28 18:55:29 lop-nor kernel: [24204.887654] sdhci: timeout,
requested 508400484ns actual 500006025ns, TMCLK configured 33000
estimated 33554

4) Ultimately I found that some SDHC cards just seem to take a good
long time to respond. I ended up increasing the write timeout to 1s
and I've encountered 0 problems since then. Since this is only an
error condition, and therefore hardly of any performance concerns, I
suggest it may be a good idea to do this in general. See attached
patch sdhci-timeout-limit.diff.
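
To illustrate the unit problem in point 1, here is a minimal sketch; it
is not the attached patch. It assumes the clock is given in Hz, and the
helper names are hypothetical. Dividing a cycle count by a frequency in
Hz yields whole seconds, so the count has to be scaled by 1000000 first
to get microseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; neither name comes from
 * the driver or from the attached patch. clock_hz is assumed to be Hz.
 */

/* Convert SDCLK cycles to microseconds, rounding up so that the
 * resulting timeout is never shorter than the card asked for. */
static uint64_t clks_to_us(uint64_t clks, uint64_t clock_hz)
{
	if (clock_hz == 0)
		return 0;
	return (clks * 1000000ULL + clock_hz - 1) / clock_hz;
}

/* The unscaled division: the result is in whole seconds, so it is
 * almost always 0 for realistic cycle counts. */
static uint64_t clks_to_seconds(uint64_t clks, uint64_t clock_hz)
{
	return clock_hz ? clks / clock_hz : 0;
}
```

For example, 100 cycles at 25 MHz is 4 us with the scaled form, while
the unscaled division truncates to 0.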

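The precedence rule in point 2 can be sketched as a pure function over
the interrupt status word. The bit positions follow the SDHCI register
layout as I understand it (Transfer Complete is bit 1 of the normal
status, Data Timeout Error is bit 4 of the error status, which is bit
20 of the combined 32-bit word). The macro, enum, and function names
are made up for illustration, and this is not the attached patch.

```c
#include <stdint.h>

/* Illustrative names; values assume the combined 32-bit status word. */
#define INT_DATA_END     0x00000002u /* Transfer Complete */
#define INT_DATA_TIMEOUT 0x00100000u /* Data Timeout Error */

enum data_result { DATA_PENDING, DATA_OK, DATA_TIMED_OUT };

/* Classify a data interrupt. Transfer Complete is checked first, so a
 * timeout bit that is set alongside it is ignored. */
static enum data_result classify_data_irq(uint32_t intmask)
{
	if (intmask & INT_DATA_END)
		return DATA_OK;
	if (intmask & INT_DATA_TIMEOUT)
		return DATA_TIMED_OUT;
	return DATA_PENDING;
}
```

The only design choice here is the order of the two checks: testing the
timeout bit first would report a failure for a transfer that actually
finished.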

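The "requested" and "estimated" figures in the log lines from point 3
can be reproduced by hand. This sketch assumes the controller counts
2^(13 + counter) TMCLK cycles before timing out, that TMCLK is in kHz,
and that the counter value was 11 for these transfers. None of that is
stated in the log itself, and the function names are hypothetical.

```c
#include <stdint.h>

/* Cycles the controller waits for a given counter value (assumed
 * to be TMCLK * 2^(13 + count)). */
static uint64_t timeout_cycles(unsigned count)
{
	return 1ULL << (13 + count);
}

/* Timeout in ns that a counter value requests at a nominal TMCLK. */
static uint64_t requested_ns(unsigned count, uint64_t tmclk_khz)
{
	if (tmclk_khz == 0)
		return 0;
	return timeout_cycles(count) * 1000000ULL / tmclk_khz;
}

/* TMCLK in kHz implied by a measured timeout in ns. */
static uint64_t estimated_tmclk_khz(unsigned count, uint64_t actual_ns)
{
	if (actual_ns == 0)
		return 0;
	return timeout_cycles(count) * 1000000ULL / actual_ns;
}
```

With a counter of 11 and a nominal TMCLK of 33000, requested_ns() gives
508400484, and estimated_tmclk_khz() gives 32896 for a measured
510006113 ns and 33554 for 500006027 ns, matching the log.
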
Attachments:
sdhci-timeout-clks.diff (440.00 B)
sdhci-timeout-int.diff (430.00 B)
sdhci-timeout-log.diff (2.48 kB)
sdhci-timeout-limit.diff (352.00 B)

2011-05-28 23:17:53

by Charles Hannum

Subject: Re: [RFC] sdhci: timeouts

On Sat, May 28, 2011 at 19:09, Charles Hannum <[email protected]> wrote:
> 4) Ultimately I found that some SDHC cards just seem to take a good
> long time to respond. I ended up increasing the write timeout to 1s
> and I've encountered 0 problems since then. Since this is only an
> error condition, and therefore hardly of any performance concerns, I
> suggest it may be a good idea to do this in general. See attached
> patch sdhci-timeout-limit.diff

BTW, I should mention that, with a TMCLK of 33000 kHz (very common,
because it's just the PCI bus clock), increasing the limit from 300000
to 1000000 us actually increases the "counter" value by exactly 1
(from 11 to 12). This is probably why the older controller-specific
hack of simply incrementing the counter worked.
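
The arithmetic can be checked with a small sketch of the counter
search. It assumes the controller's timeout is 2^(13 + counter) TMCLK
cycles, that TMCLK is given in kHz, and that the base period is
truncated to whole microseconds. The function name is hypothetical and
the loop is only modeled on what a driver might do, not copied from it.

```c
#include <stdint.h>

/* Find the smallest counter whose timeout covers target_us.
 * Hypothetical helper; tmclk_khz is assumed to be in kHz. */
static unsigned timeout_counter(uint32_t target_us, uint32_t tmclk_khz)
{
	unsigned count = 0;
	uint64_t current_us;

	if (tmclk_khz == 0)
		return 0xE;

	/* Timeout for counter 0: 2^13 cycles, in microseconds. */
	current_us = (1ULL << 13) * 1000 / tmclk_khz;

	/* Each increment doubles the timeout. Stop at 0xE, since 0xF
	 * is reserved in the SDHCI spec. */
	while (current_us < target_us && count < 0xE) {
		count++;
		current_us <<= 1;
	}
	return count;
}
```

With tmclk_khz = 33000 this returns 11 for a 300000 us limit and 12 for
a 1000000 us limit.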