From: Nick Terrell <[email protected]>
I'll be sending these patches to Linus through my tree. Just submitting them
for comment before I do so. I'm somewhat unsure of the correct workflow for
patches I submit myself, so let me know if there is something different I
should do.
This patch set contains 3 commits:
1. Fixes a minor unused variable warning reported by Kernel test robot [0].
2. Improves the reported code bloat (-88KB / 374KB) [1] by outlining
some functions that are unlikely to be used in performance sensitive
workloads.
3. Fixes the reported excess stack usage on parisc [2] by removing -O3
from zstd's compilation flags. -O3 triggered bugs in the hppa-linux-gnu
gcc-8 compiler. -O2 performance is acceptable: neutral compression,
about -1% decompression speed. We also reduce code bloat
(-105KB / 374KB).
After this commit our code bloat is cut from 374KB to 105KB with gcc-11.
If we wanted to cut the remaining 105KB we'd likely have to trade
signicant performance, so I want to say that this is enough for now.
We should be able to get further gains without sacrificing speed, but
that will take some significant optimization effort, and isn't suitable
for a quick fix. I've opened an upstream issue [3] to track the code size,
and try to avoid future regressions, and improve it in the long term.
[0] https://lore.kernel.org/linux-mm/[email protected]/T/
[1] https://lkml.org/lkml/2021/11/15/710
[2] https://lkml.org/lkml/2021/11/14/189
[3] https://github.com/facebook/zstd/issues/2867
Signed-off-by: Nick Terrell <[email protected]>
Nick Terrell (3):
lib: zstd: Fix unused variable warning
lib: zstd: Don't inline functions in zstd_opt.c
lib: zstd: Don't add -O3 to cflags
lib/zstd/Makefile | 2 --
lib/zstd/common/compiler.h | 7 +++++++
lib/zstd/compress/zstd_compress_superblock.c | 1 +
lib/zstd/compress/zstd_opt.c | 14 +++++++++++++-
4 files changed, 21 insertions(+), 3 deletions(-)
--
2.33.1
From: Nick Terrell <[email protected]>
Backport the fix from upstream PR #2838 [0]. Found by the Kernel test
robot in [1].
[0] https://github.com/facebook/zstd/pull/2838
[1] https://lore.kernel.org/linux-mm/[email protected]/T/
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Nick Terrell <[email protected]>
---
lib/zstd/compress/zstd_compress_superblock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compress/zstd_compress_superblock.c
index ee03e0aedb03..a6a8e9a2aa0e 100644
--- a/lib/zstd/compress/zstd_compress_superblock.c
+++ b/lib/zstd/compress/zstd_compress_superblock.c
@@ -411,6 +411,7 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
const seqDef* sp = sstart;
size_t matchLengthSum = 0;
size_t litLengthSum = 0;
+ (void)litLengthSum;
while (send-sp > 0) {
ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
litLengthSum += seqLen.litLength;
--
2.33.1
From: Nick Terrell <[email protected]>
`zstd_opt.c` contains the match finder for the highest compression
levels. These levels are already very slow, and are unlikely to be used
in the kernel. If they are used, they shouldn't be used in latency
sensitive workloads, so slowing them down shouldn't be a big deal.
This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
I've also opened an issue upstream [1] so that we can properly tackle
the code size issue in `zstd_opt.c` for all users, and can hopefully
remove this hack in the next zstd version we import.
Bloat-o-meter output on x86-64:
```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
Function old new delta
ZSTD_compressBlock_opt_generic.constprop - 7559 +7559
ZSTD_insertBtAndGetAllMatches - 6304 +6304
ZSTD_insertBt1 - 1731 +1731
ZSTD_storeSeq - 693 +693
ZSTD_BtGetAllMatches - 255 +255
ZSTD_updateRep - 128 +128
ZSTD_updateTree 96 99 +3
ZSTD_insertAndFindFirstIndexHash3 81 - -81
ZSTD_setBasePrices.constprop 98 - -98
ZSTD_litLengthPrice.constprop 138 - -138
ZSTD_count 362 181 -181
ZSTD_count_2segments 1407 938 -469
ZSTD_insertBt1.constprop 2689 - -2689
ZSTD_compressBlock_btultra2 19990 423 -19567
ZSTD_compressBlock_btultra 19633 15 -19618
ZSTD_initStats_ultra 19825 - -19825
ZSTD_compressBlock_btopt 20374 12 -20362
ZSTD_compressBlock_btopt_extDict 29984 12 -29972
ZSTD_compressBlock_btultra_extDict 30718 15 -30703
ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677
ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559
Total: Before=6611828, After=6418562, chg -2.92%
```
[0] https://lkml.org/lkml/2021/11/14/189
[1] https://github.com/facebook/zstd/issues/2862
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Nick Terrell <[email protected]>
---
lib/zstd/common/compiler.h | 7 +++++++
lib/zstd/compress/zstd_opt.c | 14 +++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index a1a051e4bce6..f5a9c70a228a 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -16,6 +16,7 @@
*********************************************************/
/* force inlining */
+#if !defined(ZSTD_NO_INLINE)
#if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* C99 */
# define INLINE_KEYWORD inline
#else
@@ -24,6 +25,12 @@
#define FORCE_INLINE_ATTR __attribute__((always_inline))
+#else
+
+#define INLINE_KEYWORD
+#define FORCE_INLINE_ATTR
+
+#endif
/*
On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC).
diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c
index 04337050fe9a..09483f518dc3 100644
--- a/lib/zstd/compress/zstd_opt.c
+++ b/lib/zstd/compress/zstd_opt.c
@@ -8,6 +8,18 @@
* You may select, at your option, one of the above-listed licenses.
*/
+/*
+ * Disable inlining for the optimal parser for the kernel build.
+ * It is unlikely to be used in the kernel, and where it is used
+ * latency shouldn't matter because it is very slow to begin with.
+ * We prefer a ~180KB binary size win over faster optimal parsing.
+ *
+ * TODO(https://github.com/facebook/zstd/issues/2862):
+ * Improve the code size of the optimal parser in general, so we
+ * don't need this hack for the kernel build.
+ */
+#define ZSTD_NO_INLINE 1
+
#include "zstd_compress_internal.h"
#include "hist.h"
#include "zstd_opt.h"
@@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
*/
U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
- }
+ }
ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
}
ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
--
2.33.1
From: Nick Terrell <[email protected]>
After the update to zstd-1.4.10 passing -O3 is no longer necessary to
get good performance from zstd. Using the default optimization level -O2
is sufficient to get good performance.
I've measured no significant change to compression speed, and a ~1%
decompression speed loss, which is acceptable.
This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks in
the < 100B, completely fixing the problem. Function size deltas are
listed below:
ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
ZSTD_compressBlock_fast: 2216 -> 40
ZSTD_compressBlock_fast_dictMatchState: 1848 -> 64
ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
ZSTD_fillDoubleHashTable: 3252 -> 0
ZSTD_compressBlock_doubleFast: 5856 -> 36
ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
ZSTD_copmressBlock_lazy2: 2420 -> 72
Additionally, this improves the reported code bloat [1]. With gcc-11
bloat-o-meter shows an 80KB code size improvement:
```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
Total: Before=6418562, After=6336372, chg -1.28%
```
Compared to before the zstd-1.4.10 update we see a total code size
regression of 105KB, down from 374KB at v5.16-rc1:
```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
Total: Before=6228850, After=6336372, chg +1.73%
```
[0] https://lkml.org/lkml/2021/11/15/710
[1] https://lkml.org/lkml/2021/11/14/189
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Nick Terrell <[email protected]>
---
lib/zstd/Makefile | 2 --
1 file changed, 2 deletions(-)
diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile
index 65218ec5b8f2..fc45339fc3a3 100644
--- a/lib/zstd/Makefile
+++ b/lib/zstd/Makefile
@@ -11,8 +11,6 @@
obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o
obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o
-ccflags-y += -O3
-
zstd_compress-y := \
zstd_compress_module.o \
common/debug.o \
--
2.33.1
On Tue, Nov 16, 2021 at 5:43 PM Nick Terrell <[email protected]> wrote:
>
> From: Nick Terrell <[email protected]>
>
> Backport the fix from upstream PR #2838 [0]. Found by the Kernel test
> robot in [1].
Ugh. Mind having a better commit message?
This just tells you that it's a backport. It doesn't actually talk
about what it fixes.
Yes, the summary line says "Fix unused variable warning", but it
doesn't talk about why that variable is unused and why it's not
removed entirely.
And it's not obvious in the diff either, because the context isn't
sufficiently large.
So a comment along the lines of "the variable is only used by an
'assert()', so when asserts are disabled the compiler sees no use at
all" or similar would be appreciated.
Of course, the alternative would be to make the backup definition of
'assert()' actually evaluate the argument. That's not what the
standard assert() is supposed to do, but whatever, and the zstd use
doesn't care.
So using
#define assert(condition) ((void)(condition))
instead would also fix the warning at least in kernel use (but not
necessarily outside the kernel - the standard C 'assert.h' is just
evil).
Linus
> On Nov 17, 2021, at 8:45 AM, Linus Torvalds <[email protected]> wrote:
>
> On Tue, Nov 16, 2021 at 5:43 PM Nick Terrell <[email protected]> wrote:
>>
>> From: Nick Terrell <[email protected]>
>>
>> Backport the fix from upstream PR #2838 [0]. Found by the Kernel test
>> robot in [1].
>
> Ugh. Mind having a better commit message?
>
> This just tells you that it's a backport. It doesn't actually talk
> about what it fixes.
>
> Yes, the summary line says "Fix unused variable warning", but it
> doesn't talk about why that variable is unused and why it's not
> removed entirely.
>
> And it's not obvious in the diff either, because the context isn't
> sufficiently large.
>
> So a comment along the lines of "the variable is only used by an
> 'assert()', so when asserts are disabled the compiler sees no use at
> all" or similar would be appreciated.
Yeah of course, thanks for the review! I’ll put up a fix shortly.
> Of course, the alternative would be to make the backup definition of
> 'assert()' actually evaluate the argument. That's not what the
> standard assert() is supposed to do, but whatever, and the zstd use
> doesn't care.
>
> So using
>
> #define assert(condition) ((void)(condition))
>
> instead would also fix the warning at least in kernel use (but not
> necessarily outside the kernel - the standard C 'assert.h' is just
> evil).
I totally agree that the standard C assert is not ideal, however I’d
be hesitant to do that in a quick fix. Just because zstd assumes
that asserts are removed from production builds, so there are at
least a few places where it makes potentially expensive function
calls.
I may be able to try something like:
#define assert(condition) do { if (0) { (void)(condition) } } while (0)
But, the code in the asserts hasn’t yet been tested to build. So we
may run into build issues, like the presence of division, or
dependency on libc.
I’ll take a stab at it, but if it ends up requiring too large of a change,
I will tend to continue with the current approach.
Best,
Nick Terrell
> Linus