2022-03-17 05:22:37

by Ahmad Fatoum

Subject: Possible performance regression with CONFIG_SQUASHFS_DECOMP_SINGLE

Hello,

This is an issue we had with v5.15 that we have since successfully worked around.
I am reporting it here as a pointer in case someone else runs into this and as
a heads up that there seems to be an underlying performance regression, so
here it goes:

We have an i.MX8MM (4x Cortex-A53) system with squashfs on eMMC as a root file
system. The system originally ran NXP's "imx_5.4.24_2.1.0" which has about
5000 patches on top of upstream v5.4.24 including PREEMPT_RT.

The system was updated to mainline Linux + PREEMPT_RT and boot time suffered
considerably, growing from 40s with the vendor kernel to 1m20s with the
mainline-based kernel.

The slowdown on mainline was reproducible for all scheduling models (with
or without PREEMPT_RT) except for PREEMPT_NONE, which was back at 40s.

The services most impacted by the slowdown were C++ applications with many
shared libraries dynamically loaded from the rootfs.

Looking through the original kernel configuration we found that it has
CONFIG_SQUASHFS_DECOMP_SINGLE=y and CONFIG_SQUASHFS_FILE_CACHE=y.

Once changed to CONFIG_SQUASHFS_FILE_DIRECT=y and
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y, we were below 40s as we want.

That's clearly the preferred configuration and it resolves our problem.
It doesn't solve the underlying issue, though:

- CONFIG_PREEMPT_VOLUNTARY performs much worse than CONFIG_PREEMPT_NONE
for some workloads when CONFIG_SQUASHFS_DECOMP_SINGLE=y

- And this might not have been the case with v5.4. Unfortunately we can't
bisect, because there wasn't enough i.MX8MM support in mainline back then
to boot the system. The earliest mainline-based kernel we reproduced this on
was v5.11.

TL;DR: Check that CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y is set in your configuration.
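
One way to run that check is with a small script. The following is a minimal
Python sketch and not part of the original report: the function names
(enabled_options, squashfs_report) are made up for illustration, and it
assumes the kernel configuration is available as plain text, e.g. the
build's .config or the decompressed contents of /proc/config.gz on kernels
built with CONFIG_IKCONFIG_PROC=y.

```python
import re

# Options named in this thread. "Preferred" and "suspect" reflect this
# report only; they are not an official classification.
PREFERRED = ("CONFIG_SQUASHFS_FILE_DIRECT", "CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU")
SUSPECT = ("CONFIG_SQUASHFS_FILE_CACHE", "CONFIG_SQUASHFS_DECOMP_SINGLE")


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols that are set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        match = re.match(r"^(CONFIG_\w+)=y$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def squashfs_report(config_text):
    """Summarize how the Squashfs options from this thread are configured."""
    enabled = enabled_options(config_text)
    return {
        "missing_preferred": [opt for opt in PREFERRED if opt not in enabled],
        "suspect_enabled": [opt for opt in SUSPECT if opt in enabled],
    }
```

Passing the text of a .config to squashfs_report() yields two lists; if both
are empty, the configuration matches the one that worked for us.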

Cheers,
Ahmad

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |


2022-03-17 06:25:40

by Phillip Lougher

Subject: Re: Possible performance regression with CONFIG_SQUASHFS_DECOMP_SINGLE

Ahmad Fatoum <[email protected]> wrote:

> Hello,
>
> This is an issue we had with v5.15 that we have since successfully worked around.
> I am reporting it here as a pointer in case someone else runs into this and as
> a heads up that there seems to be an underlying performance regression, so
> here it goes:
>

[snip]

>
> Looking through the original kernel configuration we found that it has
> CONFIG_SQUASHFS_DECOMP_SINGLE=y and CONFIG_SQUASHFS_FILE_CACHE=y.
>
> Once changed to CONFIG_SQUASHFS_FILE_DIRECT=y and
> CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y, we were below 40s as we want.

This sounds like the slow-down that was introduced by commit c1f6925e1091
"mm: put readahead pages in cache earlier" (Linux V5.8)

This commit prevents Squashfs from doing its own readahead, which
causes a slow-down in performance. The slow-down is noticeable when
using a single decompressor (CONFIG_SQUASHFS_DECOMP_SINGLE=y), and
can be solved by moving to a multi-decompressor configuration, because
that removes the contention on the one buffer used in the single
decompressor case.

This has already been fixed by commit 9eec1d897139
"squashfs: provide backing_dev_info in order to disable read-ahead"
which is in Linux 5.17-rc1.
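
To make the affected range concrete, here is a rough Python sketch (an
illustration only, not something from the kernel tree). It uses just the
two data points above: the slow-down arrived in v5.8 and the fix is in
5.17-rc1. It deliberately ignores stable backports and vendor patches, so
a True result means "worth checking", not proof. The names parse_release
and may_be_affected are hypothetical.

```python
import re

REGRESSION_INTRODUCED = (5, 8)  # commit c1f6925e1091, per the text above
FIX_MERGED = (5, 17)            # commit 9eec1d897139, per the text above


def parse_release(release):
    """Extract (major, minor) from strings like 'v5.15' or '5.17.1-rt17'."""
    match = re.match(r"v?(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_be_affected(release):
    """True if the release lies between the regression and the fix.

    Assumption: no backport of the fix is present in the running kernel.
    """
    return REGRESSION_INTRODUCED <= parse_release(release) < FIX_MERGED
```

For example, may_be_affected("v5.15") is True, while "v5.4.24" and "5.17.1"
both give False.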

Phillip

2022-04-05 03:18:43

by Ahmad Fatoum

Subject: Re: Possible performance regression with CONFIG_SQUASHFS_DECOMP_SINGLE

Hello Phillip,

On 16.03.22 06:34, Phillip Lougher wrote:
> Ahmad Fatoum <[email protected]> wrote:
>
>> Hello,
>>
>> This is an issue we had with v5.15 that we have since successfully worked around.
>> I am reporting it here as a pointer in case someone else runs into this and as
>> a heads up that there seems to be an underlying performance regression, so
>> here it goes:
>>
>
> [snip]
>
>>
>> Looking through the original kernel configuration we found that it has
>> CONFIG_SQUASHFS_DECOMP_SINGLE=y and CONFIG_SQUASHFS_FILE_CACHE=y.
>>
>> Once changed to CONFIG_SQUASHFS_FILE_DIRECT=y and
>> CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y, we were below 40s as we want.
>
> This sounds like the slow-down that was introduced by commit c1f6925e1091
> "mm: put readahead pages in cache earlier" (Linux V5.8)
>
> This commit prevents Squashfs from doing its own readahead, which
> causes a slow-down in performance. The slow-down is noticeable when
> using a single decompressor (CONFIG_SQUASHFS_DECOMP_SINGLE=y), and
> can be solved by moving to a multi-decompressor configuration, because
> it removes contention on a single buffer in the single decompressor case.
>
> This has already been fixed by commit 9eec1d897139
> "squashfs: provide backing_dev_info in order to disable read-ahead"
> which is in Linux 5.17-rc1.

I just updated to v5.17.1 and I can confirm that this commit fixes the
performance regression. The single-decompressor case is now nearly as fast
as the multi-decompressor one. Reverting the fix increased boot time from
~30s to 2min30s.

Thanks for clearing this up!

Cheers,
Ahmad

>
> Phillip
>

