Hi Everyone,
I am debugging a performance degradation with some SD cards that
appeared around the v4.13 kernel on TI's AM335x-evm.
I am comparing the time taken by a simple dd command (after dropping
caches) on kernel versions v4.13 and v4.14. I see a consistent 15-20%
reduction in speed; the exact amount of the decrease depends on the
card used.
Command:
# echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=/dev/mmcblk1p2 \
    bs=1M count=100
v4.13: ~22 s for the 100 MB transfer
v4.14: ~28 s for the 100 MB transfer
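For reference, this is roughly the sequence I time (a sketch; the
exact invocation only matters in that it is identical on both
kernels):

# sync
# echo 3 > /proc/sys/vm/drop_caches
# time dd if=/dev/zero of=/dev/mmcblk1p2 bs=1M count=100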
Also, I see several times as many interrupts for mmc1 in v4.14.
v4.14:
 34:  2122  INTC  64 Level  mmc1
v4.13:
162:   457  INTC  64 Level  mmc1
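The counts above are from /proc/interrupts; roughly, I sample them
around a run like this:

# grep mmc1 /proc/interrupts    # before
# dd if=/dev/zero of=/dev/mmcblk1p2 bs=1M count=100
# grep mmc1 /proc/interrupts    # after

The difference between the two readings is the interrupt count for
the transfer.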
This increase in interrupts is also present in the latest kernel.
I also printed out the sizes of the DMA transfers in the omap_hsmmc
driver during the dd command. The maximum size of a DMA transfer has
decreased from 1 MB to 256 kB between v4.13 and v4.14.
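(The instrumentation was essentially a debug print in the driver's
DMA setup path; a minimal sketch of that kind of print, not the exact
patch I used, and the helper name is made up:)

#include <linux/printk.h>
#include <linux/scatterlist.h>
#include <linux/mmc/core.h>

/* Dump the length of every DMA-mapped segment of a request; meant to
 * be called from the host driver's DMA setup path, after the
 * scatterlist has been mapped with dma_map_sg(). */
static void dump_dma_segments(struct mmc_data *data, int sg_count)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(data->sg, sg, sg_count, i)
		pr_info("omap_hsmmc: DMA segment %d: %u bytes\n",
			i, sg_dma_len(sg));
}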
Full log:
v4.13
https://pastebin.ubuntu.com/p/gK2k9GZcNd/
v4.14
https://pastebin.ubuntu.com/p/3YzCbQ79rk/
It looks like something has started splitting each request into four
times as many smaller buffers, and this is what is causing the
performance decrease.
I also ran the ftrace function profiler on the mmc* and blk*
functions. The program spends most of its time in mmc_blk_issue_rq(),
and the number of calls to it and related functions has increased
around fourfold, with the average time per call correspondingly
reduced.
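For reproducibility, the profiling setup was along these lines (a
sketch; assumes debugfs is mounted at /sys/kernel/debug and
CONFIG_FUNCTION_PROFILER=y):

# cd /sys/kernel/debug/tracing
# echo 'mmc_*' > set_ftrace_filter
# echo 'blk_*' >> set_ftrace_filter
# echo 1 > function_profile_enabled
# (run the dd command)
# echo 0 > function_profile_enabled
# cat trace_stat/function0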
Function profile Log:
v4.13
https://pastebin.ubuntu.com/p/R5JG6FpTvy/
v4.14
https://pastebin.ubuntu.com/p/Kc6QYYwYcN/
I am not sure why calls to mmc_blk_issue_rq() have increased by so much.
Any ideas why this could be happening?
Thanks,
Faiz
* Faiz Abbas <[email protected]> [180514 11:31]:
> Hi Everyone,
>
> I am debugging a performance degradation with some SD cards that
> appeared around the v4.13 kernel on TI's AM335x-evm.
>
> I am comparing the time taken by a simple dd command (after dropping
> caches) on kernel versions v4.13 and v4.14. I see a consistent 15-20%
> reduction in speed; the exact amount of the decrease depends on the
> card used.
Care to run git bisect between v4.13 and v4.14? That range should
work just fine for boot and performance testing.
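That is, something along these lines, marking each kernel good or
bad based on the dd numbers:

$ git bisect start
$ git bisect bad v4.14
$ git bisect good v4.13
  (build, boot, run the dd test, then "git bisect good" or
  "git bisect bad"; repeat until git names the first bad commit)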
Regards,
Tony
On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
> Also, I see several times as many interrupts for mmc1 in v4.14.
(...)
> I also printed out the sizes of the DMA transfers in the omap_hsmmc
> driver during the dd command. The maximum size of a DMA transfer has
> decreased from 1 MB to 256 kB between v4.13 and v4.14.
This seems like my fault for removing the bounce buffer.
(The name bounce buffer is misleading; I suspect the right
term for what it was doing was "merge buffer" or something,
coalescing physically consecutive requests.)
commit de3ee99b097dd51938276e3af388cd4ad0f2750a
"mmc: Delete bounce buffer handling"
should be the culprit.
The bounce buffer only makes a performance impact if the
host DMA controller is unable to handle scatter-gather lists,
so the number of segments is set down to 1.
Can you verify that:
- mmc->max_segs == 1
- You were previously enabling CONFIG_MMC_BLOCK_BOUNCE
I am a bit confused, since omap_hsmmc sets max_segs
to 64, indicating that it can do scatter-gather, and the bounce buffer
code will only ever kick in if max_segs == 1.
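(A quick way to check the first point is a temporary debug print at
the end of the host driver's probe, something like this sketch:)

	/* Temporary debug: report what the MMC core actually sees. */
	dev_info(mmc_dev(mmc), "max_segs = %u, max_seg_size = %u\n",
		 mmc->max_segs, mmc->max_seg_size);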
The solution we created for SDHCI is:
commit bd9b902798ab14d19ca116b10bde581ddff8f905
"mmc: sdhci: Implement an SDHCI-specific bounce buffer"
i.e. we push this bouncing down to the driver and do not try
to do it in the MMC core.
I'd be happy to come up with something OMAP-specific as well
if you can just help me figure out the code path leading to
max_segs == 1.
Yours,
Linus Walleij
Hi Linus,
On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
>
>> Also, I see several times as many interrupts for mmc1 in v4.14.
> (...)
>> I also printed out the sizes of the DMA transfers in the omap_hsmmc
>> driver during the dd command. The maximum size of a DMA transfer has
>> decreased from 1 MB to 256 kB between v4.13 and v4.14.
>
> This seems like my fault for removing the bounce buffer.
> (The name bounce buffer is misleading; I suspect the right
> term for what it was doing was "merge buffer" or something,
> coalescing physically consecutive requests.)
>
> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
> "mmc: Delete bounce buffer handling"
> should be the culprit.
I don't think so. I checked out the commit before that and see that
the interrupts are still too high.
>
> The bounce buffer only makes a performance impact if the
> host DMA controller is unable to handle scatter-gather lists,
> so the number of segments is set down to 1.
>
> Can you verify that:
> - mmc->max_segs == 1
No. It is 64 in v4.14. It was 1024 in v4.13.
v4.14
https://pastebin.ubuntu.com/p/4wXB2jb6zw/
v4.13
https://pastebin.ubuntu.com/p/dpd6kb4YJY/
> - You were previously enabling CONFIG_MMC_BLOCK_BOUNCE
Yes.
Regards,
Faiz
On Tue, May 15, 2018 at 5:11 PM, Faiz Abbas <[email protected]> wrote:
> On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
>> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
>>
>>> Also, I see several times as many interrupts for mmc1 in v4.14.
>> (...)
>>> I also printed out the sizes of the DMA transfers in the omap_hsmmc
>>> driver during the dd command. The maximum size of a DMA transfer has
>>> decreased from 1 MB to 256 kB between v4.13 and v4.14.
>>
>> This seems like my fault for removing the bounce buffer.
>> (The name bounce buffer is misleading; I suspect the right
>> term for what it was doing was "merge buffer" or something,
>> coalescing physically consecutive requests.)
>>
>> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
>> "mmc: Delete bounce buffer handling"
>> should be the culprit.
>
> I don't think so. I checked out the commit before that and see that
> the interrupts are still too high.
OK phew then I'm safe :)
>> The bounce buffer only makes a performance impact if the
>> host DMA controller is unable to handle scatter-gather lists,
>> so the number of segments is set down to 1.
>>
>> Can you verify that:
>> - mmc->max_segs == 1
>
> No. It is 64 in v4.14. It was 1024 in v4.13.
This smells like:
commit 9442400486a1ec28915e9ab31517a5665ed35705
"mmc: omap_hsmmc: Reduce max_segs for reliability"
Can you test reverting that, and notify Will Newton if this
commit is causing performance regressions?
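(I.e. something like this on top of your tree, then rerun the dd
test and recheck /proc/interrupts:)

$ git revert 9442400486a1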
Yours,
Linus Walleij
Hi,
On Wednesday 16 May 2018 07:59 PM, Linus Walleij wrote:
> On Tue, May 15, 2018 at 5:11 PM, Faiz Abbas <[email protected]> wrote:
>> On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
>>> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <[email protected]> wrote:
>>>
>>>> Also, I see several times as many interrupts for mmc1 in v4.14.
>>> (...)
>>>> I also printed out the sizes of the DMA transfers in the omap_hsmmc
>>>> driver during the dd command. The maximum size of a DMA transfer has
>>>> decreased from 1 MB to 256 kB between v4.13 and v4.14.
>>>
>>> This seems like my fault for removing the bounce buffer.
>>> (The name bounce buffer is misleading; I suspect the right
>>> term for what it was doing was "merge buffer" or something,
>>> coalescing physically consecutive requests.)
>>>
>>> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
>>> "mmc: Delete bounce buffer handling"
>>> should be the culprit.
>>
>> I don't think so. I checked out the commit before that and see that
>> the interrupts are still too high.
>
> OK phew then I'm safe :)
>
>>> The bounce buffer only makes a performance impact if the
>>> host DMA controller is unable to handle scatter-gather lists,
>>> so the number of segments is set down to 1.
>>>
>>> Can you verify that:
>>> - mmc->max_segs == 1
>>
>> No. It is 64 in v4.14. It was 1024 in v4.13.
>
> This smells like:
> commit 9442400486a1ec28915e9ab31517a5665ed35705
> "mmc: omap_hsmmc: Reduce max_segs for reliability"
>
Yes. I was able to bisect the increase in interrupts to the same
commit as well.
On top of v4.14:
*Without revert*:
https://pastebin.ubuntu.com/p/VJgvzrCP9g/
avg speed for 100 MB transfer: 3.7 MBps
Number of interrupts per 100 MB transfer: ~2000
*After revert*:
https://pastebin.ubuntu.com/p/ktgTmKVKkN/
avg speed for 100 MB transfer: 4.6 MBps
Number of interrupts per 100 MB transfer: ~500
Also, I'm getting 1 MB buffers in the omap_hsmmc DMA now.
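For reference, the revert essentially amounts to restoring the old
value in the driver's probe (a sketch, modulo exact context):

--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ omap_hsmmc_probe @@
-	mmc->max_segs = 64;
+	mmc->max_segs = 1024;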
@Will, will it be OK to change max_segs back to 1024?
Thanks,
Faiz