2018-05-14 11:31:00

by Faiz Abbas

Subject: MMC performance degradation due to too many requests

Hi Everyone,

I am debugging a performance degradation issue with some SD cards which
appeared around the v4.13 kernel on TI's AM335x-evm.

I compared the time a simple dd command takes (after dropping caches)
on kernel versions 4.13 and 4.14, and I see a consistent 15-20%
reduction in speed. The size of the decrease depends on the card used.

command:
# echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=/dev/mmcblk1p2
bs=1M count=100;


v4.13
~22s for 100M transfer

v4.14
~28s for 100M transfer

Also, I see several times as many interrupts for mmc1 in 4.14.

v4.14
34: 2122 INTC 64 Level mmc1

v4.13
162: 457 INTC 64 Level mmc1

This increase in interrupts is also present in the latest kernel.
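The counts above come straight from /proc/interrupts. A quick way to pull
the mmc1 count out (shown here on the v4.14 sample line so it runs
anywhere; on the board, point awk at /proc/interrupts before and after
the dd run and subtract the two values):

```shell
# Parse the interrupt count (second field) from an mmc1 line.
# The sample line is the v4.14 one quoted above; on the target, use
# /proc/interrupts as awk's input instead of the here-string.
line='34:       2122      INTC  64 Level     mmc1'
count=$(printf '%s\n' "$line" | awk '/mmc1/ {print $2}')
echo "$count"   # 2122
```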

I also printed out the size of the DMA transfers in the omap_hsmmc
driver during the dd command. I see that the maximum size of the dma
transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.

Full log:

v4.13
https://pastebin.ubuntu.com/p/gK2k9GZcNd/

v4.14
https://pastebin.ubuntu.com/p/3YzCbQ79rk/

It looks like something has started splitting each request into four
times as many smaller buffers, and this is what is causing the
performance decrease.

I also ran the ftrace function profiler on the mmc* and blk* functions
and see that mmc_blk_issue_rq() is where the program spends most of its
time; calls to it and related functions have increased around fourfold,
with the average time per call correspondingly reduced.

Function profile Log:

v4.13
https://pastebin.ubuntu.com/p/R5JG6FpTvy/

v4.14
https://pastebin.ubuntu.com/p/Kc6QYYwYcN/
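For reference, a profile like the one above can be collected with
ftrace's function profiler, roughly as follows (my sketch, not the exact
commands used; it assumes debugfs is mounted at /sys/kernel/debug and
root on the target, and just prints "skipped" anywhere else):

```shell
# Profile mmc*/blk* functions across the dd run with ftrace's
# function profiler, then dump the per-CPU call counts and times.
T=/sys/kernel/debug/tracing
if [ -w "$T/function_profile_enabled" ]; then
  echo 'mmc*' >  "$T/set_ftrace_filter"
  echo 'blk*' >> "$T/set_ftrace_filter"   # '>>' appends to the filter
  echo 1 > "$T/function_profile_enabled"
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/zero of=/dev/mmcblk1p2 bs=1M count=100
  echo 0 > "$T/function_profile_enabled"
  cat "$T"/trace_stat/function*           # per-CPU profile tables
  status=profiled
else
  status=skipped                          # tracefs not writable here
fi
echo "$status"
```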

I am not sure why calls to mmc_blk_issue_rq() have increased so much.
Any ideas why this could be happening?


Thanks,
Faiz


2018-05-14 17:29:23

by Tony Lindgren

Subject: Re: MMC performance degradation due to too many requests

* Faiz Abbas <[email protected]> [180514 11:31]:
> Hi Everyone,
>
> I am debugging a performance degradation issue in some SD cards which
> happened around 4.13 kernel on TI's AM335x-evm.
>
> I compare the time taken for a simple dd command (after dropping
> caches) to run between kernel versions 4.13 and 4.14. I see that there
> is a consistent 15-20% reduction in speed. The amount of decrease
> depends on the card used.

Care to run git bisect between v4.13 and v4.14? That range should work
just fine for boot and performance testing.
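The workflow looks roughly like this, sketched on a throwaway repository
so it is runnable anywhere (on the kernel tree the endpoints would be
v4.13 as good and v4.14 as bad, with the timed dd run as the test step):

```shell
# Toy bisect: five commits, the "regression" (perf.txt saying "slow")
# lands in commit 3. On the kernel tree this would instead be
#   git bisect start v4.14 v4.13
# with a build-boot-and-time-dd script driving 'git bisect run'.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5; do
  if [ "$i" -lt 3 ]; then echo "fast $i" > perf.txt
  else echo "slow $i" > perf.txt; fi
  git add perf.txt && git commit -qm "commit $i"
done
root=$(git rev-list HEAD | tail -n 1)
git bisect start HEAD "$root"   # bad = HEAD, good = first commit
# The run script's exit status drives the bisection:
# exit 0 marks a revision good, non-zero marks it bad.
out=$(git bisect run sh -c 'grep -q fast perf.txt')
echo "$out"
```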

Regards,

Tony

2018-05-15 12:47:20

by Linus Walleij

Subject: Re: MMC performance degradation due to too many requests

On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:

> Also, I see many times the number of interrupts for mmc1 in 4.14.
(...)
> I also printed out the size of the DMA transfers in the omap_hsmmc
> driver during the dd command. I see that the maximum size of the dma
> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.

This seems like my fault for removing the bounce buffer.
(The name bounce buffer is misleading, I suspect the right
term for what it was doing was "merge buffer" or something,
coalescing physically consecutive requests.)

commit de3ee99b097dd51938276e3af388cd4ad0f2750a
"mmc: Delete bounce buffer handling"
should be the culprit.

The bounce buffer only has a performance impact when the
host DMA controller is unable to handle scatter-gather lists,
so the number of segments is set down to 1.

Can you verify that:
- mmc->max_segs == 1
- You were previously enabling CONFIG_MMC_BLOCK_BOUNCE

I am a bit confused, since omap_hsmmc sets max_segs
to 64, indicating that it can do scatter-gather, and the bounce buffer
code will only ever kick in if max_segs == 1.

The solution we created for SDHCI is:
commit bd9b902798ab14d19ca116b10bde581ddff8f905
"mmc: sdhci: Implement an SDHCI-specific bounce buffer"
i.e. we push this bouncing down to the driver and do not try
to do it in the MMC core.

I'd be happy to come up with something OMAP-specific as well
if you can help me figure out the code path leading to
max_segs == 1.

Yours,
Linus Walleij

2018-05-15 15:50:56

by Faiz Abbas

Subject: Re: MMC performance degradation due to too many requests

Hi Linus,

On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
>
>> Also, I see many times the number of interrupts for mmc1 in 4.14.
> (...)
>> I also printed out the size of the DMA transfers in the omap_hsmmc
>> driver during the dd command. I see that the maximum size of the dma
>> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.
>
> This seems like my fault for removing the bounce buffer.
> (The name bounce buffer is misleading, I suspect the right
> term for what it was doing was "merge buffer" or something,
> coalescing physically consecutive requests.)
>
> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
> "mmc: Delete bounce buffer handling"
> should be the culprit.

I don't think so. I checked out the commit before that one and the
interrupts are still just as high.

>
> The bounce buffer only makes any performance impact if the
> host DMA controller is unable to handle scatter-gather lists
> so the number of segments is set down to 1.
>
> Can you verify that:
> - mmc->max_segs == 1

No. It is 64 in v4.14. It was 1024 in v4.13.

v4.14
https://pastebin.ubuntu.com/p/4wXB2jb6zw/

v4.13
https://pastebin.ubuntu.com/p/dpd6kb4YJY/
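As a rough cross-check on those numbers (my assumption, not something
from the logs: one 4 KiB page per scatter-gather segment, so the largest
request is about max_segs * page_size):

```shell
# Largest DMA request implied by max_segs, assuming one 4 KiB page per
# scatter-gather segment (a simplification; segments can be larger when
# pages happen to be physically contiguous).
page_size=4096
echo $(( 64   * page_size ))   # 262144 bytes = 256 KiB, matching v4.14
echo $(( 1024 * page_size ))   # 4194304 bytes = 4 MiB; the v4.13 logs
                               # top out at 1 MB, so some other limit
                               # must cap the request size first there
```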


> - You were previously enabling CONFIG_MMC_BLOCK_BOUNCE

Yes.

Regards,
Faiz

2018-05-16 14:29:42

by Linus Walleij

Subject: Re: MMC performance degradation due to too many requests

On Tue, May 15, 2018 at 5:11 PM, Faiz Abbas <[email protected]> wrote:
> On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
>> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
>>
>>> Also, I see many times the number of interrupts for mmc1 in 4.14.
>> (...)
>>> I also printed out the size of the DMA transfers in the omap_hsmmc
>>> driver during the dd command. I see that the maximum size of the dma
>>> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.
>>
>> This seems like my fault for removing the bounce buffer.
>> (The name bounce buffer is misleading, I suspect the right
>> term for what it was doing was "merge buffer" or something,
>> coalescing physically consecutive requests.)
>>
>> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
>> "mmc: Delete bounce buffer handling"
>> should be the culprit.
>
> I don't think so. I checked out to the commit before that and see that
> the interrupts are still too high.

OK phew then I'm safe :)

>> The bounce buffer only makes any performance impact if the
>> host DMA controller is unable to handle scatter-gather lists
>> so the number of segments is set down to 1.
>>
>> Can you verify that:
>> - mmc->max_segs == 1
>
> No. It is 64 in v4.14. It was 1024 in v4.13.

This smells like:
commit 9442400486a1ec28915e9ab31517a5665ed35705
"mmc: omap_hsmmc: Reduce max_segs for reliability"

Can you test reverting that, and notify Will Newton if this
commit is causing performance regressions?

Yours,
Linus Walleij

2018-05-17 12:53:31

by Faiz Abbas

Subject: Re: MMC performance degradation due to too many requests

Hi,

On Wednesday 16 May 2018 07:59 PM, Linus Walleij wrote:
> On Tue, May 15, 2018 at 5:11 PM, Faiz Abbas <[email protected]> wrote:
>> On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
>>> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
>>>
>>>> Also, I see many times the number of interrupts for mmc1 in 4.14.
>>> (...)
>>>> I also printed out the size of the DMA transfers in the omap_hsmmc
>>>> driver during the dd command. I see that the maximum size of the dma
>>>> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.
>>>
>>> This seems like my fault for removing the bounce buffer.
>>> (The name bounce buffer is misleading, I suspect the right
>>> term for what it was doing was "merge buffer" or something,
>>> coalescing physically consecutive requests.)
>>>
>>> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
>>> "mmc: Delete bounce buffer handling"
>>> should be the culprit.
>>
>> I don't think so. I checked out to the commit before that and see that
>> the interrupts are still too high.
>
> OK phew then I'm safe :)
>
>>> The bounce buffer only makes any performance impact if the
>>> host DMA controller is unable to handle scatter-gather lists
>>> so the number of segments is set down to 1.
>>>
>>> Can you verify that:
>>> - mmc->max_segs == 1
>>
>> No. It is 64 in v4.14. It was 1024 in v4.13.
>
> This smells like:
> commit 9442400486a1ec28915e9ab31517a5665ed35705
> "mmc: omap_hsmmc: Reduce max_segs for reliability"
>

Yes. I was able to bisect the increase in interrupts to the same commit
as well.

On top of v4.14,

*Without revert*:
https://pastebin.ubuntu.com/p/VJgvzrCP9g/

avg speed for 100 MB transfer: 3.7 MBps
Number of interrupts per 100 MB transfer: ~2000

*After revert*:
https://pastebin.ubuntu.com/p/ktgTmKVKkN/

avg speed for 100 MB transfer: 4.6 MBps
Number of interrupts per 100 MB transfer: ~500

Also, I'm getting 1M buffers in the omap_hsmmc DMA now.

@Will, will it be OK to change max_segs back to 1024?

Thanks,
Faiz