2015-06-30 13:43:23

by Holger Schurig

[permalink] [raw]
Subject: SDHCI: mdelay() in hot path in esdhc_pltfm_set_clock looses CAN (!) frames

Hi,

I noticed in a kernel 4.0.7 that I loose CAN packets when an incoming
rsync transfer changes my eMMC or SD-Card image. I used CONFIG_FRACE
to find why this is the case, and came to this trace:

# tracer: preemptoff
#
# preemptoff latency trace v1.1.5 on 4.0.7
# --------------------------------------------------------------------
# latency: 1046 us, #756/756, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: mmcqd/0-76 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: sdhci_do_set_ios
# => ended at: sdhci_do_set_ios
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
mmcqd/0-76 2d..1 1us : _raw_spin_lock_irq <-sdhci_do_set_ios
mmcqd/0-76 2d..2 2us : esdhc_writeb_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 4us : esdhc_pltfm_set_bus_width <-sdhci_do_set_ios
mmcqd/0-76 2d..2 5us : l2c210_sync <-esdhc_pltfm_set_bus_width
mmcqd/0-76 2d..2 7us : esdhc_writeb_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 9us : l2c210_sync <-esdhc_writeb_le
mmcqd/0-76 2d..2 10us : esdhc_readw_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 12us : esdhc_writew_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 14us : l2c210_sync <-esdhc_writew_le
mmcqd/0-76 2d..2 16us : l2c210_sync <-esdhc_writew_le
mmcqd/0-76 2d..2 18us : esdhc_readw_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 19us : esdhc_writew_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 21us : l2c210_sync <-esdhc_writew_le
mmcqd/0-76 2d..2 22us : esdhc_set_uhs_signaling <-sdhci_do_set_ios
mmcqd/0-76 2d..2 24us : pinctrl_select_state <-esdhc_set_uhs_signaling
mmcqd/0-76 2d..2 26us : esdhc_readl_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 28us : esdhc_writel_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 29us : l2c210_sync <-esdhc_writel_le
mmcqd/0-76 2d..2 31us : esdhc_readl_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 33us : esdhc_writel_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 34us : l2c210_sync <-esdhc_writel_le
mmcqd/0-76 2d..2 36us : l2c210_sync <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 37us : __timer_const_udelay <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 39us : __timer_delay <-__timer_const_udelay
mmcqd/0-76 2d..2 40us : read_current_timer <-__timer_delay
mmcqd/0-76 2d..2 41us : imx_read_current_timer <-read_current_timer
mmcqd/0-76 2d..2 43us : read_current_timer <-__timer_delay
mmcqd/0-76 2d..2 44us : imx_read_current_timer <-read_current_timer
mmcqd/0-76 2d..2 45us : read_current_timer <-__timer_delay
....

So it seems that esdhc_pltfm_set_clock() somehow waits. And really
there is an mdelay(1) there. So it waits a whopping millisecond there!

What's worse: I put a printk() into this function, just before the
mdelay(). And it get's called hundreds of times when I boot, or when I
update via rsync. And I believe that for each call the mdelay() kills
preemption.

This is bad because in the FlexCAN driver the ISR calls
napi_schedule(), which in turn clears the CAN RxFIFO. And if this
scheduling is taking too long, CAN packets are dropped. With 50 bits
per CAN packet anf 500000 bit/s this can happen quite easily (the
RxFIFO is a hardware component that can hold only 6 CAN frames before
data is lost).

The mdelay() is also bad because it makes eMMC/SD-Card access WAY
slower than they could be.

I'm not firm with SDHCI, I only see that esdhc_pltfm_set_clock() is
all the time called with 0 Hz and then with 20000000 Hz. Any tips on
how to get rid of this clock-off-clock-on dance?

Holger


2015-07-07 18:17:14

by Fabio Estevam

[permalink] [raw]
Subject: Re: SDHCI: mdelay() in hot path in esdhc_pltfm_set_clock looses CAN (!) frames

Hi Holger,

Adding some more folks on Cc.

On Tue, Jun 30, 2015 at 10:43 AM, Holger Schurig
<[email protected]> wrote:
> Hi,
>
> I noticed in a kernel 4.0.7 that I loose CAN packets when an incoming
> rsync transfer changes my eMMC or SD-Card image. I used CONFIG_FRACE
> to find why this is the case, and came to this trace:
>
> # tracer: preemptoff
> #
> # preemptoff latency trace v1.1.5 on 4.0.7
> # --------------------------------------------------------------------
> # latency: 1046 us, #756/756, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
> # -----------------
> # | task: mmcqd/0-76 (uid:0 nice:0 policy:0 rt_prio:0)
> # -----------------
> # => started at: sdhci_do_set_ios
> # => ended at: sdhci_do_set_ios
> #
> #
> # _------=> CPU#
> # / _-----=> irqs-off
> # | / _----=> need-resched
> # || / _---=> hardirq/softirq
> # ||| / _--=> preempt-depth
> # |||| / delay
> # cmd pid ||||| time | caller
> # \ / ||||| \ | /
> mmcqd/0-76 2d..1 1us : _raw_spin_lock_irq <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 2us : esdhc_writeb_le <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 4us : esdhc_pltfm_set_bus_width <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 5us : l2c210_sync <-esdhc_pltfm_set_bus_width
> mmcqd/0-76 2d..2 7us : esdhc_writeb_le <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 9us : l2c210_sync <-esdhc_writeb_le
> mmcqd/0-76 2d..2 10us : esdhc_readw_le <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 12us : esdhc_writew_le <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 14us : l2c210_sync <-esdhc_writew_le
> mmcqd/0-76 2d..2 16us : l2c210_sync <-esdhc_writew_le
> mmcqd/0-76 2d..2 18us : esdhc_readw_le <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 19us : esdhc_writew_le <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 21us : l2c210_sync <-esdhc_writew_le
> mmcqd/0-76 2d..2 22us : esdhc_set_uhs_signaling <-sdhci_do_set_ios
> mmcqd/0-76 2d..2 24us : pinctrl_select_state <-esdhc_set_uhs_signaling
> mmcqd/0-76 2d..2 26us : esdhc_readl_le <-esdhc_pltfm_set_clock
> mmcqd/0-76 2d..2 28us : esdhc_writel_le <-esdhc_pltfm_set_clock
> mmcqd/0-76 2d..2 29us : l2c210_sync <-esdhc_writel_le
> mmcqd/0-76 2d..2 31us : esdhc_readl_le <-esdhc_pltfm_set_clock
> mmcqd/0-76 2d..2 33us : esdhc_writel_le <-esdhc_pltfm_set_clock
> mmcqd/0-76 2d..2 34us : l2c210_sync <-esdhc_writel_le
> mmcqd/0-76 2d..2 36us : l2c210_sync <-esdhc_pltfm_set_clock
> mmcqd/0-76 2d..2 37us : __timer_const_udelay <-esdhc_pltfm_set_clock
> mmcqd/0-76 2d..2 39us : __timer_delay <-__timer_const_udelay
> mmcqd/0-76 2d..2 40us : read_current_timer <-__timer_delay
> mmcqd/0-76 2d..2 41us : imx_read_current_timer <-read_current_timer
> mmcqd/0-76 2d..2 43us : read_current_timer <-__timer_delay
> mmcqd/0-76 2d..2 44us : imx_read_current_timer <-read_current_timer
> mmcqd/0-76 2d..2 45us : read_current_timer <-__timer_delay
> ....
>
> So it seems that esdhc_pltfm_set_clock() somehow waits. And really
> there is an mdelay(1) there. So it waits a whopping millisecond there!
>
> What's worse: I put a printk() into this function, just before the
> mdelay(). And it get's called hundreds of times when I boot, or when I
> update via rsync. And I believe that for each call the mdelay() kills
> preemption.
>
> This is bad because in the FlexCAN driver the ISR calls
> napi_schedule(), which in turn clears the CAN RxFIFO. And if this
> scheduling is taking too long, CAN packets are dropped. With 50 bits
> per CAN packet anf 500000 bit/s this can happen quite easily (the
> RxFIFO is a hardware component that can hold only 6 CAN frames before
> data is lost).
>
> The mdelay() is also bad because it makes eMMC/SD-Card access WAY
> slower than they could be.
>
> I'm not firm with SDHCI, I only see that esdhc_pltfm_set_clock() is
> all the time called with 0 Hz and then with 20000000 Hz. Any tips on
> how to get rid of this clock-off-clock-on dance?
>
> Holger

2015-07-07 20:53:11

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: SDHCI: mdelay() in hot path in esdhc_pltfm_set_clock looses CAN (!) frames

On Tue, Jul 07, 2015 at 03:17:08PM -0300, Fabio Estevam wrote:
> On Tue, Jun 30, 2015 at 10:43 AM, Holger Schurig
> <[email protected]> wrote:
> > So it seems that esdhc_pltfm_set_clock() somehow waits. And really
> > there is an mdelay(1) there. So it waits a whopping millisecond there!
> >
> > What's worse: I put a printk() into this function, just before the
> > mdelay(). And it get's called hundreds of times when I boot, or when I
> > update via rsync. And I believe that for each call the mdelay() kills
> > preemption.

The whole SDHCI thing remains a total trainwreck. Sorry guys, that's
precisely what it is. That damned driver needs rewriting to be a
library, the quirks need to be killed, and handled in specific driver
code for the variants with a library of core functions.

Then, the whole set_ios() thing in MMC needs killing and replaced with
a functional interface instead - "turn power on" "initialise/turn clock
on" "set clock speed" etc.

I suspect if that were to happen, then we'd get rid of a lot of those
mdelay()s you're seeing, because the problem sdhci has today is that
it's virtually impossible to work out what each set_ios() call into
the code is trying to achieve (well, we can work that out, what can't
be worked out is whether splitting the code is going to break some
random sdhci host, so the risk of causing breakage is high - hence
sdhci in its current form is basically unmaintainable.)

I've been saying this for quite some time - I put some work into it a
couple of years ago to try and clean up some of the crap there, but
since then people have continued with their same old ideas, adding
yet more quirk bits to those damned quirk variables, adding intercepts
for register accesses which all makes the issue worse, rather than
addressing the fundamental problem.

It needs a huge amount of effort put into it...

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-07-08 06:23:04

by Holger Schurig

[permalink] [raw]
Subject: Re: SDHCI: mdelay() in hot path in esdhc_pltfm_set_clock looses CAN (!) frames

Wow, at last some reaction. And I thought nobody cares ...

BTW, removing CONFIG_MMC_CLKGATE helped a bit, because the pointless
clock-off-clock-on while the device is booting (or accessing multiple
sectors within a short time) isn't going to happen anymore.

But really, a mdelay(1) in a driver ...

2015-07-08 06:43:22

by Sascha Hauer

[permalink] [raw]
Subject: Re: SDHCI: mdelay() in hot path in esdhc_pltfm_set_clock looses CAN (!) frames

On Wed, Jul 08, 2015 at 08:22:57AM +0200, Holger Schurig wrote:
> Wow, at last some reaction. And I thought nobody cares ...
>
> BTW, removing CONFIG_MMC_CLKGATE helped a bit, because the pointless
> clock-off-clock-on while the device is booting (or accessing multiple
> sectors within a short time) isn't going to happen anymore.
>
> But really, a mdelay(1) in a driver ...

Look for the other mdelays in sdhci.c. There are up to a hundred of
them in sdhci_reset() which is called under a spin_lock_irqsave. I
remember that with the wrong type of card detection the MMC core
frequently polls for a card and if none is found the sdhci core was
resetted which falls into this mdelay polling loop in sdhci_reset. So if
you care about realtime you better have a card inserted...

Sascha

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |