2021-11-17 20:08:30

by Nick Terrell

[permalink] [raw]
Subject: [PATCH v2 0/3] Fix stack usage on parisc & improve code size bloat

From: Nick Terrell <[email protected]>

I'll be sending these patches to Linus through my tree. Just submitting them
for comment before I do so. I'm somewhat unsure of the correct workflow for
patches I submit myself, so let me know if there is something different I
should do.

This patch set contains 3 commits:
1. Fixes a minor unused variable warning reported by Kernel test robot [0].
2. Improves the reported code bloat (-88KB / 374KB) [1] by outlining
some functions that are unlikely to be used in performance sensitive
workloads.
3. Fixes the reported excess stack usage on parisc [2] by removing -O3
from zstd's compilation flags. -O3 triggered bugs in the hppa-linux-gnu
gcc-8 compiler. -O2 performance is acceptable: neutral compression,
about -1% decompression speed. We also reduce code bloat
(-105KB / 374KB).

After this commit our code bloat is cut from 374KB to 105KB with gcc-11.
If we wanted to cut the remaining 105KB we'd likely have to trade
signicant performance, so I want to say that this is enough for now.

We should be able to get further gains without sacrificing speed, but
that will take some significant optimization effort, and isn't suitable
for a quick fix. I've opened an upstream issue [3] to track the code size,
and try to avoid future regressions, and improve it in the long term.

[0] https://lore.kernel.org/linux-mm/[email protected]/T/
[1] https://lkml.org/lkml/2021/11/15/710
[2] https://lkml.org/lkml/2021/11/14/189
[3] https://github.com/facebook/zstd/issues/2867

Signed-off-by: Nick Terrell <[email protected]>

v1 -> v2:
* (1/3) Update commit message & add a comment.

Nick Terrell (3):
lib: zstd: Fix unused variable warning
lib: zstd: Don't inline functions in zstd_opt.c
lib: zstd: Don't add -O3 to cflags

lib/zstd/Makefile | 2 --
lib/zstd/common/compiler.h | 7 +++++++
lib/zstd/compress/zstd_compress_superblock.c | 2 ++
lib/zstd/compress/zstd_opt.c | 14 +++++++++++++-
4 files changed, 22 insertions(+), 3 deletions(-)

--
2.33.1



2021-11-17 20:08:34

by Nick Terrell

[permalink] [raw]
Subject: [PATCH v2 1/3] lib: zstd: Fix unused variable warning

From: Nick Terrell <[email protected]>

The variable `litLengthSum` is only used by an `assert()`, so when
asserts are disabled the compiler doesn't see any usage and warns.

This issue is already fixed upstream by PR #2838 [0]. It was reported
by the Kernel test robot in [1].

Another approach would be to change zstd's disabled `assert()`
definition to use the argument in a disabled branch, instead of
ignoring the argument. I've avoided this approach because there are
some small changes necessary to get zstd to build, and I would
want to thoroughly re-test for performance, since that is slightly
changing the code in every function in zstd. It seems like a
trivial change, but some functions are pretty sensitive to small
changes. However, I think it is a valid approach that I would
like to see upstream take, so I've opened Issue #2868 to attempt
this upstream.

[0] https://github.com/facebook/zstd/pull/2838
[1] https://lore.kernel.org/linux-mm/[email protected]/T/
[2] https://github.com/facebook/zstd/issues/2868

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Nick Terrell <[email protected]>
---
lib/zstd/compress/zstd_compress_superblock.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compress/zstd_compress_superblock.c
index ee03e0aedb03..b0610b255653 100644
--- a/lib/zstd/compress/zstd_compress_superblock.c
+++ b/lib/zstd/compress/zstd_compress_superblock.c
@@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
const seqDef* sp = sstart;
size_t matchLengthSum = 0;
size_t litLengthSum = 0;
+ /* Only used by assert(), suppress unused variable warnings in production. */
+ (void)litLengthSum;
while (send-sp > 0) {
ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
litLengthSum += seqLen.litLength;
--
2.33.1


2021-11-17 20:08:36

by Nick Terrell

[permalink] [raw]
Subject: [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c

From: Nick Terrell <[email protected]>

`zstd_opt.c` contains the match finder for the highest compression
levels. These levels are already very slow, and are unlikely to be used
in the kernel. If they are used, they shouldn't be used in latency
sensitive workloads, so slowing them down shouldn't be a big deal.

This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
I've also opened an issue upstream [1] so that we can properly tackle
the code size issue in `zstd_opt.c` for all users, and can hopefully
remove this hack in the next zstd version we import.

Bloat-o-meter output on x86-64:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
Function old new delta
ZSTD_compressBlock_opt_generic.constprop - 7559 +7559
ZSTD_insertBtAndGetAllMatches - 6304 +6304
ZSTD_insertBt1 - 1731 +1731
ZSTD_storeSeq - 693 +693
ZSTD_BtGetAllMatches - 255 +255
ZSTD_updateRep - 128 +128
ZSTD_updateTree 96 99 +3
ZSTD_insertAndFindFirstIndexHash3 81 - -81
ZSTD_setBasePrices.constprop 98 - -98
ZSTD_litLengthPrice.constprop 138 - -138
ZSTD_count 362 181 -181
ZSTD_count_2segments 1407 938 -469
ZSTD_insertBt1.constprop 2689 - -2689
ZSTD_compressBlock_btultra2 19990 423 -19567
ZSTD_compressBlock_btultra 19633 15 -19618
ZSTD_initStats_ultra 19825 - -19825
ZSTD_compressBlock_btopt 20374 12 -20362
ZSTD_compressBlock_btopt_extDict 29984 12 -29972
ZSTD_compressBlock_btultra_extDict 30718 15 -30703
ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677
ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559
Total: Before=6611828, After=6418562, chg -2.92%
```

[0] https://lkml.org/lkml/2021/11/14/189
[1] https://github.com/facebook/zstd/issues/2862

Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Nick Terrell <[email protected]>
---
lib/zstd/common/compiler.h | 7 +++++++
lib/zstd/compress/zstd_opt.c | 14 +++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index a1a051e4bce6..f5a9c70a228a 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -16,6 +16,7 @@
*********************************************************/
/* force inlining */

+#if !defined(ZSTD_NO_INLINE)
#if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* C99 */
# define INLINE_KEYWORD inline
#else
@@ -24,6 +25,12 @@

#define FORCE_INLINE_ATTR __attribute__((always_inline))

+#else
+
+#define INLINE_KEYWORD
+#define FORCE_INLINE_ATTR
+
+#endif

/*
On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC).
diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c
index 04337050fe9a..09483f518dc3 100644
--- a/lib/zstd/compress/zstd_opt.c
+++ b/lib/zstd/compress/zstd_opt.c
@@ -8,6 +8,18 @@
* You may select, at your option, one of the above-listed licenses.
*/

+/*
+ * Disable inlining for the optimal parser for the kernel build.
+ * It is unlikely to be used in the kernel, and where it is used
+ * latency shouldn't matter because it is very slow to begin with.
+ * We prefer a ~180KB binary size win over faster optimal parsing.
+ *
+ * TODO(https://github.com/facebook/zstd/issues/2862):
+ * Improve the code size of the optimal parser in general, so we
+ * don't need this hack for the kernel build.
+ */
+#define ZSTD_NO_INLINE 1
+
#include "zstd_compress_internal.h"
#include "hist.h"
#include "zstd_opt.h"
@@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
*/
U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
- }
+ }
ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
}
ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
--
2.33.1


2021-11-17 20:08:38

by Nick Terrell

[permalink] [raw]
Subject: [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags

From: Nick Terrell <[email protected]>

After the update to zstd-1.4.10 passing -O3 is no longer necessary to
get good performance from zstd. Using the default optimization level -O2
is sufficient to get good performance.

I've measured no significant change to compression speed, and a ~1%
decompression speed loss, which is acceptable.

This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks in
the < 100B, completely fixing the problem. Function size deltas are
listed below:

ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
ZSTD_compressBlock_fast: 2216 -> 40
ZSTD_compressBlock_fast_dictMatchState: 1848 -> 64
ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
ZSTD_fillDoubleHashTable: 3252 -> 0
ZSTD_compressBlock_doubleFast: 5856 -> 36
ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
ZSTD_copmressBlock_lazy2: 2420 -> 72

Additionally, this improves the reported code bloat [1]. With gcc-11
bloat-o-meter shows an 80KB code size improvement:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
Total: Before=6418562, After=6336372, chg -1.28%
```

Compared to before the zstd-1.4.10 update we see a total code size
regression of 105KB, down from 374KB at v5.16-rc1:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
Total: Before=6228850, After=6336372, chg +1.73%
```

[0] https://lkml.org/lkml/2021/11/15/710
[1] https://lkml.org/lkml/2021/11/14/189

Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Nick Terrell <[email protected]>
---
lib/zstd/Makefile | 2 --
1 file changed, 2 deletions(-)

diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile
index 65218ec5b8f2..fc45339fc3a3 100644
--- a/lib/zstd/Makefile
+++ b/lib/zstd/Makefile
@@ -11,8 +11,6 @@
obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o
obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o

-ccflags-y += -O3
-
zstd_compress-y := \
zstd_compress_module.o \
common/debug.o \
--
2.33.1


2021-11-18 07:51:10

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] lib: zstd: Fix unused variable warning

Hi Nick,

On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <[email protected]> wrote:
> The variable `litLengthSum` is only used by an `assert()`, so when
> asserts are disabled the compiler doesn't see any usage and warns.
>
> This issue is already fixed upstream by PR #2838 [0]. It was reported
> by the Kernel test robot in [1].
>
> Another approach would be to change zstd's disabled `assert()`
> definition to use the argument in a disabled branch, instead of
> ignoring the argument. I've avoided this approach because there are
> some small changes necessary to get zstd to build, and I would
> want to thoroughly re-test for performance, since that is slightly
> changing the code in every function in zstd. It seems like a
> trivial change, but some functions are pretty sensitive to small
> changes. However, I think it is a valid approach that I would
> like to see upstream take, so I've opened Issue #2868 to attempt
> this upstream.
>
> [0] https://github.com/facebook/zstd/pull/2838
> [1] https://lore.kernel.org/linux-mm/[email protected]/T/
> [2] https://github.com/facebook/zstd/issues/2868
>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Nick Terrell <[email protected]>

Thanks for your patch!

> --- a/lib/zstd/compress/zstd_compress_superblock.c
> +++ b/lib/zstd/compress/zstd_compress_superblock.c
> @@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
> const seqDef* sp = sstart;
> size_t matchLengthSum = 0;
> size_t litLengthSum = 0;
> + /* Only used by assert(), suppress unused variable warnings in production. */
> + (void)litLengthSum;

The Linux way-to do this is to add __maybe_unused.
But perhaps you don't want to introduce that in the upstream codebase.

> while (send-sp > 0) {
> ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
> litLengthSum += seqLen.litLength;

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-11-18 07:53:19

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c

Hi Nick,

On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <[email protected]> wrote:
> `zstd_opt.c` contains the match finder for the highest compression
> levels. These levels are already very slow, and are unlikely to be used
> in the kernel. If they are used, they shouldn't be used in latency
> sensitive workloads, so slowing them down shouldn't be a big deal.
>
> This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
> I've also opened an issue upstream [1] so that we can properly tackle
> the code size issue in `zstd_opt.c` for all users, and can hopefully
> remove this hack in the next zstd version we import.
>
> Bloat-o-meter output on x86-64:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
> Function old new delta
> ZSTD_compressBlock_opt_generic.constprop - 7559 +7559
> ZSTD_insertBtAndGetAllMatches - 6304 +6304
> ZSTD_insertBt1 - 1731 +1731
> ZSTD_storeSeq - 693 +693
> ZSTD_BtGetAllMatches - 255 +255
> ZSTD_updateRep - 128 +128
> ZSTD_updateTree 96 99 +3
> ZSTD_insertAndFindFirstIndexHash3 81 - -81
> ZSTD_setBasePrices.constprop 98 - -98
> ZSTD_litLengthPrice.constprop 138 - -138
> ZSTD_count 362 181 -181
> ZSTD_count_2segments 1407 938 -469
> ZSTD_insertBt1.constprop 2689 - -2689
> ZSTD_compressBlock_btultra2 19990 423 -19567
> ZSTD_compressBlock_btultra 19633 15 -19618
> ZSTD_initStats_ultra 19825 - -19825
> ZSTD_compressBlock_btopt 20374 12 -20362
> ZSTD_compressBlock_btopt_extDict 29984 12 -29972
> ZSTD_compressBlock_btultra_extDict 30718 15 -30703
> ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677
> ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559
> Total: Before=6611828, After=6418562, chg -2.92%
> ```
>
> [0] https://lkml.org/lkml/2021/11/14/189
> [1] https://github.com/facebook/zstd/issues/2862
>
> Reported-by: Geert Uytterhoeven <[email protected]>
> Signed-off-by: Nick Terrell <[email protected]>

Thanks for your patch!

Impact on lib/zstd/zstd_compress.ko for atari_defconfig:

add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822)

Nice!

Tested-by: Geert Uytterhoeven <[email protected]>

> --- a/lib/zstd/compress/zstd_opt.c
> +++ b/lib/zstd/compress/zstd_opt.c
> @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
> */
> U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
> ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
> - }
> + }
> ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
> }
> ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);

This change is unrelated. With that removed:
Reviewed-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-11-18 07:56:30

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] lib: zstd: Don't add -O3 to cflags

Hi Nick,

On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <[email protected]> wrote:
> From: Nick Terrell <[email protected]>
>
> After the update to zstd-1.4.10 passing -O3 is no longer necessary to
> get good performance from zstd. Using the default optimization level -O2
> is sufficient to get good performance.
>
> I've measured no significant change to compression speed, and a ~1%
> decompression speed loss, which is acceptable.
>
> This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
> gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
> stacks that are ~3KB. With -O2 these same functions generate stacks in
> the < 100B, completely fixing the problem. Function size deltas are
> listed below:
>
> ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
> ZSTD_compressBlock_fast: 2216 -> 40
> ZSTD_compressBlock_fast_dictMatchState: 1848 -> 64
> ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
> ZSTD_fillDoubleHashTable: 3252 -> 0
> ZSTD_compressBlock_doubleFast: 5856 -> 36
> ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
> ZSTD_copmressBlock_lazy2: 2420 -> 72
>
> Additionally, this improves the reported code bloat [1]. With gcc-11
> bloat-o-meter shows an 80KB code size improvement:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
> Total: Before=6418562, After=6336372, chg -1.28%
> ```
>
> Compared to before the zstd-1.4.10 update we see a total code size
> regression of 105KB, down from 374KB at v5.16-rc1:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
> Total: Before=6228850, After=6336372, chg +1.73%
> ```
>
> [0] https://lkml.org/lkml/2021/11/15/710
> [1] https://lkml.org/lkml/2021/11/14/189
>
> Reported-by: Geert Uytterhoeven <[email protected]>
> Signed-off-by: Nick Terrell <[email protected]>

Impact on vmlinux for atari_defconfig:

add/remove: 22/3 grow/shrink: 7/91 up/down: 3246/-35548 (-32302)

Impact on lib/zstd/zstd_compress.ko for atari_defconfig:

add/remove: 63/5 grow/shrink: 23/197 up/down: 13410/-168604 (-155194)

Tested-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-11-18 08:21:22

by Nick Terrell

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] lib: zstd: Fix unused variable warning



> On Nov 17, 2021, at 11:50 PM, Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Nick,
>
> On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <[email protected]> wrote:
>> The variable `litLengthSum` is only used by an `assert()`, so when
>> asserts are disabled the compiler doesn't see any usage and warns.
>>
>> This issue is already fixed upstream by PR #2838 [0]. It was reported
>> by the Kernel test robot in [1].
>>
>> Another approach would be to change zstd's disabled `assert()`
>> definition to use the argument in a disabled branch, instead of
>> ignoring the argument. I've avoided this approach because there are
>> some small changes necessary to get zstd to build, and I would
>> want to thoroughly re-test for performance, since that is slightly
>> changing the code in every function in zstd. It seems like a
>> trivial change, but some functions are pretty sensitive to small
>> changes. However, I think it is a valid approach that I would
>> like to see upstream take, so I've opened Issue #2868 to attempt
>> this upstream.
>>
>> [0] https://github.com/facebook/zstd/pull/2838
>> [1] https://lore.kernel.org/linux-mm/[email protected]/T/
>> [2] https://github.com/facebook/zstd/issues/2868
>>
>> Reported-by: kernel test robot <[email protected]>
>> Signed-off-by: Nick Terrell <[email protected]>
>
> Thanks for your patch!
>
>> --- a/lib/zstd/compress/zstd_compress_superblock.c
>> +++ b/lib/zstd/compress/zstd_compress_superblock.c
>> @@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
>> const seqDef* sp = sstart;
>> size_t matchLengthSum = 0;
>> size_t litLengthSum = 0;
>> + /* Only used by assert(), suppress unused variable warnings in production. */
>> + (void)litLengthSum;
>
> The Linux way-to do this is to add __maybe_unused.
> But perhaps you don't want to introduce that in the upstream codebase.

Upstream zstd can't use __maybe_unused (or its own version) because
not every compiler supports it. So compilers that don't support it may emit
unused variable warnings. We like using attributes (like fallthrough) when
applicable, and when there is a portable fallback. In this case, there isn’t
really a way to write __maybe_unused that works portably.

One of the design goals of lib/zstd/ in the kernel is to not maintain any
long-term patches on top of upstream. That is so we can automatically
import upstream into the kernel, so it is easy to keep up to date. We
can accept short-term fixes, but all patches in lib/zstd/ in the kernel
need to be ported upstream before the next import.

That does incur non-linux style in lib/zstd/. However, we mitigate that
by providing a linux-style wrapper API in <linux/zstd.h>, so that the
callers of zstd don’t get “infected” with zstd’s upstream style.

Best,
Nick Terrell

>> while (send-sp > 0) {
>> ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
>> litLengthSum += seqLen.litLength;
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds

2021-11-18 08:23:14

by Nick Terrell

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c



> On Nov 17, 2021, at 11:52 PM, Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Nick,
>
> On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <[email protected]> wrote:
>> `zstd_opt.c` contains the match finder for the highest compression
>> levels. These levels are already very slow, and are unlikely to be used
>> in the kernel. If they are used, they shouldn't be used in latency
>> sensitive workloads, so slowing them down shouldn't be a big deal.
>>
>> This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
>> I've also opened an issue upstream [1] so that we can properly tackle
>> the code size issue in `zstd_opt.c` for all users, and can hopefully
>> remove this hack in the next zstd version we import.
>>
>> Bloat-o-meter output on x86-64:
>>
>> ```
>>> ../scripts/bloat-o-meter vmlinux.old vmlinux
>> add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
>> Function old new delta
>> ZSTD_compressBlock_opt_generic.constprop - 7559 +7559
>> ZSTD_insertBtAndGetAllMatches - 6304 +6304
>> ZSTD_insertBt1 - 1731 +1731
>> ZSTD_storeSeq - 693 +693
>> ZSTD_BtGetAllMatches - 255 +255
>> ZSTD_updateRep - 128 +128
>> ZSTD_updateTree 96 99 +3
>> ZSTD_insertAndFindFirstIndexHash3 81 - -81
>> ZSTD_setBasePrices.constprop 98 - -98
>> ZSTD_litLengthPrice.constprop 138 - -138
>> ZSTD_count 362 181 -181
>> ZSTD_count_2segments 1407 938 -469
>> ZSTD_insertBt1.constprop 2689 - -2689
>> ZSTD_compressBlock_btultra2 19990 423 -19567
>> ZSTD_compressBlock_btultra 19633 15 -19618
>> ZSTD_initStats_ultra 19825 - -19825
>> ZSTD_compressBlock_btopt 20374 12 -20362
>> ZSTD_compressBlock_btopt_extDict 29984 12 -29972
>> ZSTD_compressBlock_btultra_extDict 30718 15 -30703
>> ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677
>> ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559
>> Total: Before=6611828, After=6418562, chg -2.92%
>> ```
>>
>> [0] https://lkml.org/lkml/2021/11/14/189
>> [1] https://github.com/facebook/zstd/issues/2862
>>
>> Reported-by: Geert Uytterhoeven <[email protected]>
>> Signed-off-by: Nick Terrell <[email protected]>
>
> Thanks for your patch!
>
> Impact on lib/zstd/zstd_compress.ko for atari_defconfig:
>
> add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822)
>
> Nice!
>
> Tested-by: Geert Uytterhoeven <[email protected]>
>
>> --- a/lib/zstd/compress/zstd_opt.c
>> +++ b/lib/zstd/compress/zstd_opt.c
>> @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
>> */
>> U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
>> ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
>> - }
>> + }
>> ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
>> }
>> ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
>
> This change is unrelated. With that removed:
> Reviewed-by: Geert Uytterhoeven <[email protected]>

Thanks for the review! I will remove this change before submitting
the PR. Unless you have a strong preference, I won’t put up another
version of the patch set with just this change.

Best,
Nick Terrell

> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds