2023-06-29 22:28:12

by Julian Pidancet

Subject: [PATCH v2] mm/slub: disable slab merging in the default configuration

Make CONFIG_SLAB_MERGE_DEFAULT default to n unless CONFIG_SLUB_TINY is
enabled. The benefits of slab merging are limited on systems that are not
memory constrained: the memory overhead of not merging is low, and
evidence of merging's effect on cache hotness is hard to come by.

On the other hand, keeping allocations in distinct slabs will make
attacks that rely on "heap spraying" more difficult to carry out
successfully.

Side with security in the default kernel configuration over questionable
performance and memory-efficiency benefits.

A timed kernel compilation test on x86 with 4K pages was conducted 10
times with slab_merge, and the same test was then conducted with
slab_nomerge on the same hardware in a similar state. The results show no
sign of a performance hit one way or the other:

| slab_merge | slab_nomerge |
------+------------------+------------------|
Time | 588.080 ± 0.799 | 587.308 ± 1.411 |
Min | 586.267 | 584.640 |
Max | 589.248 | 590.091 |

Peaks in slab usage during the test workload reveal a memory overhead
of 2.2 MiB when using slab_nomerge. Slab usage overhead after a fresh boot
amounts to 2.3 MiB:

Slab Usage | slab_merge | slab_nomerge |
-------------------+------------+--------------|
After fresh boot | 79908 kB | 82284 kB |
During test (peak) | 127940 kB | 130204 kB |

Signed-off-by: Julian Pidancet <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---

v2:
- Re-run benchmark to minimize variance in results due to CPU
frequency scaling.
- Record slab usage after boot and peaks during the test workload.
- Include benchmark results in commit message.
- Fix typo: s/MEGE/MERGE/.
- Specify that "overhead" refers to memory overhead in SLUB doc.

v1:
- Link: https://lore.kernel.org/linux-mm/[email protected]/

.../admin-guide/kernel-parameters.txt | 29 ++++++++++---------
Documentation/mm/slub.rst | 7 +++--
mm/Kconfig | 6 ++--
3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c5e7bb4babf0..7e78471a96b7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5652,21 +5652,22 @@

slram= [HW,MTD]

- slab_merge [MM]
- Enable merging of slabs with similar size when the
- kernel is built without CONFIG_SLAB_MERGE_DEFAULT.
-
slab_nomerge [MM]
- Disable merging of slabs with similar size. May be
- necessary if there is some reason to distinguish
- allocs to different slabs, especially in hardened
- environments where the risk of heap overflows and
- layout control by attackers can usually be
- frustrated by disabling merging. This will reduce
- most of the exposure of a heap attack to a single
- cache (risks via metadata attacks are mostly
- unchanged). Debug options disable merging on their
- own.
+ Disable merging of slabs with similar size when
+ the kernel is built with CONFIG_SLAB_MERGE_DEFAULT.
+ Allocations of the same size made in distinct
+ caches will be placed in separate slabs. In
+ hardened environment, the risk of heap overflows
+ and layout control by attackers can usually be
+ frustrated by disabling merging.
+
+ slab_merge [MM]
+ Enable merging of slabs with similar size. May be
+ necessary to reduce overhead or increase cache
+ hotness of objects, at the cost of increased
+ exposure in case of a heap attack to a single
+ cache. (risks via metadata attacks are mostly
+ unchanged).
For more information see Documentation/mm/slub.rst.

slab_max_order= [MM, SLAB]
diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index be75971532f5..0e2ce82177c0 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -122,9 +122,10 @@ used on the wrong slab.
Slab merging
============

-If no debug options are specified then SLUB may merge similar slabs together
-in order to reduce overhead and increase cache hotness of objects.
-``slabinfo -a`` displays which slabs were merged together.
+If the kernel is built with ``CONFIG_SLAB_MERGE_DEFAULT`` or if ``slab_merge``
+is specified on the kernel command line, then SLUB may merge similar slabs
+together in order to reduce memory overhead and increase cache hotness of
+objects. ``slabinfo -a`` displays which slabs were merged together.

Slab validation
===============
diff --git a/mm/Kconfig b/mm/Kconfig
index 7672a22647b4..05b0304302d4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -255,7 +255,7 @@ config SLUB_TINY

config SLAB_MERGE_DEFAULT
bool "Allow slab caches to be merged"
- default y
+ default n
depends on SLAB || SLUB
help
For reduced kernel memory fragmentation, slab caches can be
@@ -264,8 +264,8 @@ config SLAB_MERGE_DEFAULT
overwrite objects from merged caches (and more easily control
cache layout), which makes such heap attacks easier to exploit
by attackers. By keeping caches unmerged, these kinds of exploits
- can usually only damage objects in the same cache. To disable
- merging at runtime, "slab_nomerge" can be passed on the kernel
+ can usually only damage objects in the same cache. To enable
+ merging at runtime, "slab_merge" can be passed on the kernel
command line.

config SLAB_FREELIST_RANDOM
--
2.40.1
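
For context on what "merging of slabs with similar size" means in practice:
when a cache is created without a constructor and without debug or
hardening flags, SLUB may alias it to an existing cache whose aligned
object size matches, so both end up sharing the same slabs. Below is a
minimal userspace sketch of that decision, loosely modeled on the merge
checks in mm/slab_common.c; the struct and helper names are illustrative
and do not mirror the kernel's exact API.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Illustrative model only: a new cache may alias an existing one when the
 * two are indistinguishable from the allocator's point of view.
 */
struct cache_desc {
        const char *name;
        size_t object_size;     /* requested object size */
        size_t align;           /* requested alignment */
        bool has_ctor;          /* caches with constructors never merge */
        bool never_merge;       /* stand-in for debug/hardening flags */
};

static size_t round_up_to(size_t v, size_t a)
{
        return (v + a - 1) / a * a;
}

static bool mergeable(const struct cache_desc *a, const struct cache_desc *b,
                      bool merge_default)
{
        size_t align = a->align > b->align ? a->align : b->align;

        if (!merge_default)
                return false;   /* slab_nomerge, or the new default n */
        if (a->has_ctor || b->has_ctor)
                return false;
        if (a->never_merge || b->never_merge)
                return false;
        /* Both sizes must land in the same slot once alignment is applied. */
        return round_up_to(a->object_size, align) ==
               round_up_to(b->object_size, align);
}

int main(void)
{
        struct cache_desc a = { "cache-a", 192, 8, false, false };
        struct cache_desc b = { "cache-b", 190, 8, false, false };

        printf("merge with CONFIG_SLAB_MERGE_DEFAULT=y: %d\n",
               mergeable(&a, &b, true));
        printf("merge with CONFIG_SLAB_MERGE_DEFAULT=n: %d\n",
               mergeable(&a, &b, false));
        return 0;
}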



2023-07-03 00:38:25

by David Rientjes

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Fri, 30 Jun 2023, Julian Pidancet wrote:

> Make CONFIG_SLAB_MERGE_DEFAULT default to n unless CONFIG_SLUB_TINY is
> enabled. Benefits of slab merging is limited on systems that are not
> memory constrained: the memory overhead is low and evidence of its
> effect on cache hotness is hard to come by.
>
> On the other hand, distinguishing allocations into different slabs will
> make attacks that rely on "heap spraying" more difficult to carry out
> with success.
>
> Take sides with security in the default kernel configuration over
> questionnable performance benefits/memory efficiency.
>
> A timed kernel compilation test, on x86 with 4K pages, conducted 10
> times with slab_merge, and the same test then conducted with
> slab_nomerge on the same hardware in a similar state do not show any
> sign of performance hit one way or another:
>
> | slab_merge | slab_nomerge |
> ------+------------------+------------------|
> Time | 588.080 ± 0.799 | 587.308 ± 1.411 |
> Min | 586.267 | 584.640 |
> Max | 589.248 | 590.091 |
>
> Peaks in slab usage during the test workload reveal a memory overhead
> of 2.2 MiB when using slab_nomerge. Slab usage overhead after a fresh boot
> amounts to 2.3 MiB:
>
> Slab Usage | slab_merge | slab_nomerge |
> -------------------+------------+--------------|
> After fresh boot | 79908 kB | 82284 kB |
> During test (peak) | 127940 kB | 130204 kB |
>
> Signed-off-by: Julian Pidancet <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>

Thanks for continuing to work on this.

I think we need more data beyond just kernbench. Christoph's point about
different page sizes is interesting. In the above results, I don't know
the page orders for the various slab caches that this workload will
stress. I think the memory overhead data may be different depending on
how slab_max_order is being used, if at all.

We should be able to run this through a variety of different benchmarks
and measure peak slab usage at the same time for due diligence. I support
the change in the default; I would just prefer to know what its
implications are.

Is it possible to collect data for other microbenchmarks and real-world
workloads? And perhaps also with different page sizes where this will
impact memory overhead more? I can help run more workloads once we
have the next set of data.

> ---
>
> v2:
> - Re-run benchmark to minimize variance in results due to CPU
> frequency scaling.
> - Record slab usage after boot and peaks during tests workload.
> - Include benchmark results in commit message.
> - Fix typo: s/MEGE/MERGE/.
> - Specify that "overhead" refers to memory overhead in SLUB doc.
>
> v1:
> - Link: https://lore.kernel.org/linux-mm/[email protected]/
>
> .../admin-guide/kernel-parameters.txt | 29 ++++++++++---------
> Documentation/mm/slub.rst | 7 +++--
> mm/Kconfig | 6 ++--
> 3 files changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c5e7bb4babf0..7e78471a96b7 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5652,21 +5652,22 @@
>
> slram= [HW,MTD]
>
> - slab_merge [MM]
> - Enable merging of slabs with similar size when the
> - kernel is built without CONFIG_SLAB_MERGE_DEFAULT.
> -
> slab_nomerge [MM]
> - Disable merging of slabs with similar size. May be
> - necessary if there is some reason to distinguish
> - allocs to different slabs, especially in hardened
> - environments where the risk of heap overflows and
> - layout control by attackers can usually be
> - frustrated by disabling merging. This will reduce
> - most of the exposure of a heap attack to a single
> - cache (risks via metadata attacks are mostly
> - unchanged). Debug options disable merging on their
> - own.
> + Disable merging of slabs with similar size when
> + the kernel is built with CONFIG_SLAB_MERGE_DEFAULT.
> + Allocations of the same size made in distinct
> + caches will be placed in separate slabs. In
> + hardened environment, the risk of heap overflows
> + and layout control by attackers can usually be
> + frustrated by disabling merging.
> +
> + slab_merge [MM]
> + Enable merging of slabs with similar size. May be
> + necessary to reduce overhead or increase cache
> + hotness of objects, at the cost of increased
> + exposure in case of a heap attack to a single
> + cache. (risks via metadata attacks are mostly
> + unchanged).
> For more information see Documentation/mm/slub.rst.
>
> slab_max_order= [MM, SLAB]
> diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
> index be75971532f5..0e2ce82177c0 100644
> --- a/Documentation/mm/slub.rst
> +++ b/Documentation/mm/slub.rst
> @@ -122,9 +122,10 @@ used on the wrong slab.
> Slab merging
> ============
>
> -If no debug options are specified then SLUB may merge similar slabs together
> -in order to reduce overhead and increase cache hotness of objects.
> -``slabinfo -a`` displays which slabs were merged together.
> +If the kernel is built with ``CONFIG_SLAB_MERGE_DEFAULT`` or if ``slab_merge``
> +is specified on the kernel command line, then SLUB may merge similar slabs
> +together in order to reduce memory overhead and increase cache hotness of
> +objects. ``slabinfo -a`` displays which slabs were merged together.
>

Suggest mentioning that one of the primary goals of slab cache merging is
to reduce cache footprint.

> Slab validation
> ===============
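
To make the page-order point concrete: for a given object size, the slab
order (together with the base page size) determines how many objects fit
in one slab and how much of each slab goes unused. The sketch below is
only illustrative; it ignores SLUB's per-slab metadata and alignment, so
real overhead numbers will differ.

#include <stdio.h>

/* Rough packing numbers for one object size at different slab sizes. */
static void show(size_t object_size, unsigned int order, size_t page_size)
{
        size_t slab_bytes = page_size << order;
        size_t objects = slab_bytes / object_size;
        size_t waste = slab_bytes - objects * object_size;

        printf("obj %4zu B, order %u, page %5zu B: %4zu objs/slab, %4zu B unused/slab\n",
               object_size, order, page_size, objects, waste);
}

int main(void)
{
        /* 4K base pages (x86) versus 64K base pages (e.g. some arm64 configs). */
        show(192, 0, 4096);
        show(192, 3, 4096);
        show(192, 0, 65536);
        return 0;
}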

2023-07-03 10:48:45

by Julian Pidancet

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Mon Jul 3, 2023 at 02:09, David Rientjes wrote:
> I think we need more data beyond just kernbench. Christoph's point about
> different page sizes is interesting. In the above results, I don't know
> the page orders for the various slab caches that this workload will
> stress. I think the memory overhead data may be different depending on
> how slab_max_order is being used, if at all.
>
> We should be able to run this through a variety of different benchmarks
> and measure peak slab usage at the same time for due diligence. I support
> the change in the default, I would just prefer to know what the
> implications of it is.
>
> Is it possible to collect data for other microbenchmarks and real-world
> workloads? And perhaps also with different page sizes where this will
> impact memory overhead more? I can help running more workloads once we
> have the next set of data.
>

David,

I agree about the need to perform those tests on hardware using larger
pages. I will collect data if I have the chance to get my hands on one
of these systems.

Do you have specific tests or workloads in mind? Compiling the kernel
with files sitting on an XFS partition is not exhaustive, but it is the
only test I could think of that is both easy to set up and reproducible
while keeping external interference to a minimum.

--
Julian


Attachments:
signature.asc (273.00 B)
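
For anyone trying to reproduce the slab-usage figures from the commit
message, one possible way to capture the peak is to sample /proc/meminfo
while the workload runs and keep the maximum of the Slab: field, stopping
with Ctrl-C when the test ends. This is only a sketch of the measurement
method, not a tool that ships with the patch.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        long peak_kb = 0;

        for (;;) {
                FILE *f = fopen("/proc/meminfo", "r");
                char line[128];
                long kb = 0;

                if (!f)
                        return 1;
                while (fgets(line, sizeof(line), f)) {
                        /* Lines look like "Slab:    79908 kB". */
                        if (sscanf(line, "Slab: %ld kB", &kb) == 1)
                                break;
                }
                fclose(f);

                if (kb > peak_kb) {
                        peak_kb = kb;
                        printf("new peak: %ld kB\n", peak_kb);
                        fflush(stdout);
                }
                sleep(1);
        }
}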

2023-07-03 18:44:30

by Kees Cook

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Mon, Jul 03, 2023 at 12:33:25PM +0200, Julian Pidancet wrote:
> On Mon Jul 3, 2023 at 02:09, David Rientjes wrote:
> > I think we need more data beyond just kernbench. Christoph's point about
> > different page sizes is interesting. In the above results, I don't know
> > the page orders for the various slab caches that this workload will
> > stress. I think the memory overhead data may be different depending on
> > how slab_max_order is being used, if at all.
> >
> > We should be able to run this through a variety of different benchmarks
> > and measure peak slab usage at the same time for due diligence. I support
> > the change in the default, I would just prefer to know what the
> > implications of it is.
> >
> > Is it possible to collect data for other microbenchmarks and real-world
> > workloads? And perhaps also with different page sizes where this will
> > impact memory overhead more? I can help running more workloads once we
> > have the next set of data.
> >
>
> David,
>
> I agree about the need to perform those tests on hardware using larger
> pages. I will collect data if I have the chance to get my hands on one
> of these systems.
>
> Do you have specific tests or workload in mind ? Compiling the kernel
> with files sitting on an XFS partition is not exhaustive but it is the
> only test I could think of that is both easy to set up and can be
> reproduced while keeping external interferences as little as possible.

I think it is a sufficiently complicated heap allocation workload (and
real-world). I'd prefer we get this change landed in -next after -rc1 so
we can see if there are any regressions reported by the 0day and other
CI performance tests.

-Kees

--
Kees Cook

2023-07-03 20:29:52

by David Rientjes

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Mon, 3 Jul 2023, Julian Pidancet wrote:

> On Mon Jul 3, 2023 at 02:09, David Rientjes wrote:
> > I think we need more data beyond just kernbench. Christoph's point about
> > different page sizes is interesting. In the above results, I don't know
> > the page orders for the various slab caches that this workload will
> > stress. I think the memory overhead data may be different depending on
> > how slab_max_order is being used, if at all.
> >
> > We should be able to run this through a variety of different benchmarks
> > and measure peak slab usage at the same time for due diligence. I support
> > the change in the default, I would just prefer to know what the
> > implications of it is.
> >
> > Is it possible to collect data for other microbenchmarks and real-world
> > workloads? And perhaps also with different page sizes where this will
> > impact memory overhead more? I can help running more workloads once we
> > have the next set of data.
> >
>
> David,
>
> I agree about the need to perform those tests on hardware using larger
> pages. I will collect data if I have the chance to get my hands on one
> of these systems.
>

Thanks. I think arm64 should suffice for things like 64KB pages that
Christoph was referring to.

We also may want to play around with slub_min_order on the kernel command
line since that will inflate the size of slab pages and we may see some
different results because of the increased page size.

> Do you have specific tests or workload in mind ? Compiling the kernel
> with files sitting on an XFS partition is not exhaustive but it is the
> only test I could think of that is both easy to set up and can be
> reproduced while keeping external interferences as little as possible.
>

The ones that Binder, cc'd, used to evaluate SLAB vs SLUB memory overhead:

hackbench
netperf
redis
specjbb2015
unixbench
will-it-scale

And Vlastimil had also suggested a few XFS specific benchmarks.

I can try to help run benchmarks that you're not able to run or if you
can't get your hands on an arm64 system.

Additionally, I wouldn't consider this to be super urgent: slab cache
merging has been this way for several years, so we have some time to
assess the implications of changing an important aspect of kernel memory
allocation that will affect everybody. I agree with the patch if we can
make it work; I'd just like to study its effect more fully beyond some
kernbench runs.

2023-07-06 08:13:33

by David Rientjes

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Mon, 3 Jul 2023, David Rientjes wrote:

> hackbench

Running hackbench on Skylake with v6.1.30 (A) and v6.1.30 + your patch
(B), for example:

LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
SReclaimable | | | | | | |
(A) v6.1.30 | 11 | 129480.000 | 233208.000 | 189936.364 | 204316.000 | 31465.625 |
(B) <same sha> | 11 | 139084.000 | 236772.000 | 198931.273 | 213672.000 | 30013.204 |
| | +7.42% | +1.53% | +4.74% | +4.58% | -4.62% | <not defined>
SUnreclaim | | | | | | |
(A) v6.1.30 | 11 | 305400.000 | 538744.000 | 422148.000 | 449344.000 | 65005.045 |
(B) <same sha> | 11 | 305780.000 | 518300.000 | 422219.636 | 450252.000 | 61245.137 |
| | +0.12% | -3.79% | +0.02% | +0.20% | -5.78% | <not defined>

The amount of reclaimable slab increases significantly, which is likely not
a problem because, well, it's reclaimable. But I suspect we'll find other
interesting data points with the other suggested benchmarks.

And benchmark results:

LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
hackbench_process_pipes_234 | | | | | | |
(A) v6.1.30 | 7 | 1.735 | 1.979 | 1.831 | 1.835 | 0.086291 |
(B) <same sha> | 7 | 1.687 | 2.023 | 1.886 | 1.911 | 0.10276 |
| | -2.77% | +2.22% | +3.00% | +4.14% | +19.09% | <not defined>
hackbench_process_pipes_max | | | | | | |
(A) v6.1.30 | 7 | 1.735 | 1.979 | 1.831 | 1.835 | 0.086291 |
(B) <same sha> | 7 | 1.687 | 2.023 | 1.886 | 1.911 | 0.10276 |
| | -2.77% | +2.22% | +3.00% | +4.14% | +19.09% | - is good
hackbench_process_sockets_234 | | | | | | |
(A) v6.1.30 | 7 | 7.883 | 7.909 | 7.899 | 7.899 | 0.0087808 |
(B) <same sha> | 7 | 7.872 | 7.961 | 7.907 | 7.904 | 0.028019 |
| | -0.14% | +0.66% | +0.10% | +0.06% | +219.09% | <not defined>
hackbench_process_sockets_max | | | | | | |
(A) v6.1.30 | 7 | 7.883 | 7.909 | 7.899 | 7.899 | 0.0087808 |
(B) <same sha> | 7 | 7.872 | 7.961 | 7.907 | 7.904 | 0.028019 |
| | -0.14% | +0.66% | +0.10% | +0.06% | +219.09% | - is good
hackbench_thread_pipes_234 | | | | | | |
(A) v6.1.30 | 7 | 2.146 | 2.677 | 2.410 | 2.418 | 0.18143 |
(B) <same sha> | 7 | 2.016 | 2.514 | 2.268 | 2.241 | 0.17474 |
| | -6.06% | -6.09% | -5.88% | -7.32% | -3.69% | <not defined>
hackbench_thread_pipes_max | | | | | | |
(A) v6.1.30 | 7 | 2.146 | 2.677 | 2.410 | 2.418 | 0.18143 |
(B) <same sha> | 7 | 2.016 | 2.514 | 2.268 | 2.241 | 0.17474 |
| | -6.06% | -6.09% | -5.88% | -7.32% | -3.69% | - is good
hackbench_thread_sockets_234 | | | | | | |
(A) v6.1.30 | 7 | 8.025 | 8.127 | 8.084 | 8.085 | 0.029755 |
(B) <same sha> | 7 | 7.990 | 8.093 | 8.042 | 8.035 | 0.035152 |
| | -0.44% | -0.42% | -0.53% | -0.62% | +18.14% | <not defined>
hackbench_thread_sockets_max | | | | | | |
(A) v6.1.30 | 7 | 8.025 | 8.127 | 8.084 | 8.085 | 0.029755 |
(B) <same sha> | 7 | 7.990 | 8.093 | 8.042 | 8.035 | 0.035152 |
| | -0.44% | -0.42% | -0.53% | -0.62% | +18.14% | - is good

2023-07-09 10:03:16

by David Rientjes

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Thu, 6 Jul 2023, David Rientjes wrote:

> On Mon, 3 Jul 2023, David Rientjes wrote:
>
> > hackbench
>
> Running hackbench on Skylake with v6.1.30 (A) and v6.1.30 + your patch
> (B), for example:
>
> LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
> --------------------------------+-------+------------+------------+------------+------------+-----------+----------------
> SReclaimable | | | | | | |
> (A) v6.1.30 | 11 | 129480.000 | 233208.000 | 189936.364 | 204316.000 | 31465.625 |
> (B) <same sha> | 11 | 139084.000 | 236772.000 | 198931.273 | 213672.000 | 30013.204 |
> | | +7.42% | +1.53% | +4.74% | +4.58% | -4.62% | <not defined>
> SUnreclaim | | | | | | |
> (A) v6.1.30 | 11 | 305400.000 | 538744.000 | 422148.000 | 449344.000 | 65005.045 |
> (B) <same sha> | 11 | 305780.000 | 518300.000 | 422219.636 | 450252.000 | 61245.137 |
> | | +0.12% | -3.79% | +0.02% | +0.20% | -5.78% | <not defined>
>
> Amount of reclaimable slab significantly increases which is likely not a
> problem because, well, it's reclaimable. But I suspect we'll find other
> interesting data points with the other suggested benchmarks.
>
> And benchmark results:
>
> LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
> --------------------------------+-------+------------+------------+------------+------------+-----------+----------------
> hackbench_process_pipes_234 | | | | | | |
> (A) v6.1.30 | 7 | 1.735 | 1.979 | 1.831 | 1.835 | 0.086291 |
> (B) <same sha> | 7 | 1.687 | 2.023 | 1.886 | 1.911 | 0.10276 |
> | | -2.77% | +2.22% | +3.00% | +4.14% | +19.09% | <not defined>
> hackbench_process_pipes_max | | | | | | |
> (A) v6.1.30 | 7 | 1.735 | 1.979 | 1.831 | 1.835 | 0.086291 |
> (B) <same sha> | 7 | 1.687 | 2.023 | 1.886 | 1.911 | 0.10276 |
> | | -2.77% | +2.22% | +3.00% | +4.14% | +19.09% | - is good
> hackbench_process_sockets_234 | | | | | | |
> (A) v6.1.30 | 7 | 7.883 | 7.909 | 7.899 | 7.899 | 0.0087808 |
> (B) <same sha> | 7 | 7.872 | 7.961 | 7.907 | 7.904 | 0.028019 |
> | | -0.14% | +0.66% | +0.10% | +0.06% | +219.09% | <not defined>
> hackbench_process_sockets_max | | | | | | |
> (A) v6.1.30 | 7 | 7.883 | 7.909 | 7.899 | 7.899 | 0.0087808 |
> (B) <same sha> | 7 | 7.872 | 7.961 | 7.907 | 7.904 | 0.028019 |
> | | -0.14% | +0.66% | +0.10% | +0.06% | +219.09% | - is good
> hackbench_thread_pipes_234 | | | | | | |
> (A) v6.1.30 | 7 | 2.146 | 2.677 | 2.410 | 2.418 | 0.18143 |
> (B) <same sha> | 7 | 2.016 | 2.514 | 2.268 | 2.241 | 0.17474 |
> | | -6.06% | -6.09% | -5.88% | -7.32% | -3.69% | <not defined>
> hackbench_thread_pipes_max | | | | | | |
> (A) v6.1.30 | 7 | 2.146 | 2.677 | 2.410 | 2.418 | 0.18143 |
> (B) <same sha> | 7 | 2.016 | 2.514 | 2.268 | 2.241 | 0.17474 |
> | | -6.06% | -6.09% | -5.88% | -7.32% | -3.69% | - is good
> hackbench_thread_sockets_234 | | | | | | |
> (A) v6.1.30 | 7 | 8.025 | 8.127 | 8.084 | 8.085 | 0.029755 |
> (B) <same sha> | 7 | 7.990 | 8.093 | 8.042 | 8.035 | 0.035152 |
> | | -0.44% | -0.42% | -0.53% | -0.62% | +18.14% | <not defined>
> hackbench_thread_sockets_max | | | | | | |
> (A) v6.1.30 | 7 | 8.025 | 8.127 | 8.084 | 8.085 | 0.029755 |
> (B) <same sha> | 7 | 7.990 | 8.093 | 8.042 | 8.035 | 0.035152 |
> | | -0.44% | -0.42% | -0.53% | -0.62% | +18.14% | - is good

My takeaway from running half a dozen benchmarks on Intel is that
performance is more impacted than slab memory usage. There are slight
regressions in memory usage, but they are only measurable for SReclaimable,
which would be the better form (as opposed to SUnreclaim).

There are some substantial performance degradations, most notably
context_switch1_per_thread_ops, which regressed ~21%. I'll need to repeat
that test to confirm it and can also try on cascadelake if it reproduces.

There are some more negligible redis, specjbb, and will-it-scale
regressions which don't look terribly concerning.

I'll try running performance tests on AMD Zen3 and also ARM with
PAGE_SIZE == 4KB and 64KB.

Unixbench memory usage and performance are within +/- 1% for every metric,
so it's not presented here.

Full results for Skylake, removing results where mean is +/- 1% of
baseline:

============================== MEMORY USAGE ==============================

hackbench
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
SReclaimable | | | | | | |
(A) v6.1.30 | 11 | 129480.000 | 233208.000 | 189936.364 | 204316.000 | 31465.625 |
(B) v6.1.30 slab_nomerge | 11 | 139084.000 | 236772.000 | 198931.273 | 213672.000 | 30013.204 |
| | +7.42% | +1.53% | +4.74% | +4.58% | -4.62% | - is good

redis
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
-------------------------------+-------+------------+------------+------------+------------+-----------+----------------
SReclaimable | | | | | | |
(A) v6.1.30 | 298 | 137056.000 | 238664.000 | 226005.477 | 226940.000 | 8109.328 |
(B) v6.1.30 slab_nomerge | 302 | 139664.000 | 242664.000 | 229096.689 | 230098.000 | 8215.134 |
| | +1.90% | +1.68% | +1.37% | +1.39% | +1.30% | - is good

specjbb2015
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
-----------------------------------+-------+------------+------------+------------+------------+----------+----------------
SReclaimable | | | | | | |
(A) v6.1.30 | 1602 | 118344.000 | 217932.000 | 203559.618 | 205372.000 | 5314.410 |
(B) v6.1.30 slab_nomerge | 1655 | 128000.000 | 222536.000 | 208099.973 | 209396.000 | 4608.582 |
| | +8.16% | +2.11% | +2.23% | +1.96% | -13.28% | - is good

============================== PERFORMANCE ==============================

hackbench
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
hackbench_process_pipes_234 | | | | | | |
(A) v6.1.30 | 7 | 1.735 | 1.979 | 1.831 | 1.835 | 0.086291 |
(B) v6.1.30 slab_nomerge | 7 | 1.687 | 2.023 | 1.886 | 1.911 | 0.10276 |
| | -2.77% | +2.22% | +3.00% | +4.14% | +19.09% | - is good
hackbench_thread_pipes_234 | | | | | | |
(A) v6.1.30 | 7 | 2.146 | 2.677 | 2.410 | 2.418 | 0.18143 |
(B) v6.1.30 slab_nomerge | 7 | 2.016 | 2.514 | 2.268 | 2.241 | 0.17474 |
| | -6.06% | -6.09% | -5.88% | -7.32% | -3.69% | - is good

redis
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
-------------------------------+-------+------------+------------+------------+------------+-----------+----------------
redis_medium_max_INCR | | | | | | |
(A) v6.1.30 | 5 | 108695.660 | 112637.980 | 110639.626 | 109757.440 | 1668.190 |
(B) v6.1.30 slab_nomerge | 5 | 101853.740 | 106564.370 | 104166.478 | 104942.800 | 1833.377 |
| | -6.29% | -5.39% | -5.85% | -4.39% | +9.90% | + is good
redis_medium_max_LPOP | | | | | | |
(A) v6.1.30 | 5 | 102944.200 | 108471.630 | 105572.750 | 106303.820 | 2016.986 |
(B) v6.1.30 slab_nomerge | 5 | 101471.340 | 104231.810 | 103361.688 | 104090.770 | 1064.277 |
| | -1.43% | -3.91% | -2.09% | -2.08% | -47.23% | + is good
redis_medium_max_LPUSH | | | | | | |
(A) v6.1.30 | 10 | 99255.590 | 108295.430 | 105960.440 | 106338.120 | 2553.802 |
(B) v6.1.30 slab_nomerge | 10 | 100130.160 | 107032.000 | 104335.070 | 105091.705 | 2169.708 |
| | +0.88% | -1.17% | -1.53% | -1.17% | -15.04% | + is good
redis_medium_max_LRANGE_100 | | | | | | |
(A) v6.1.30 | 5 | 72427.030 | 73046.020 | 72671.814 | 72626.910 | 202.812 |
(B) v6.1.30 slab_nomerge | 5 | 70811.500 | 72030.540 | 71519.286 | 71761.750 | 450.918 |
| | -2.23% | -1.39% | -1.59% | -1.19% | +122.33% | + is good
redis_medium_max_MSET_10 | | | | | | |
(A) v6.1.30 | 5 | 87642.420 | 89798.850 | 89044.390 | 89102.740 | 769.933 |
(B) v6.1.30 slab_nomerge | 5 | 85287.840 | 89758.550 | 87876.598 | 88386.070 | 1641.608 |
| | -2.69% | -0.04% | -1.31% | -0.80% | +113.21% | + is good
redis_medium_max_PING_BULK | | | | | | |
(A) v6.1.30 | 5 | 101729.400 | 108189.980 | 105003.228 | 105307.490 | 2171.756 |
(B) v6.1.30 slab_nomerge | 5 | 100553.050 | 105340.770 | 102561.464 | 101947.190 | 1789.953 |
| | -1.16% | -2.63% | -2.33% | -3.19% | -17.58% | + is good
redis_medium_max_PING_INLINE | | | | | | |
(A) v6.1.30 | 5 | 102522.050 | 107503.770 | 105209.902 | 106033.300 | 1981.499 |
(B) v6.1.30 slab_nomerge | 5 | 97541.950 | 107319.170 | 103729.414 | 104854.780 | 3304.256 |
| | -4.86% | -0.17% | -1.41% | -1.11% | +66.76% | + is good
redis_medium_max_SET | | | | | | |
(A) v6.1.30 | 5 | 105663.570 | 112283.850 | 108917.118 | 109469.070 | 2663.234 |
(B) v6.1.30 slab_nomerge | 5 | 103071.540 | 106723.590 | 105128.226 | 106179.660 | 1666.892 |
| | -2.45% | -4.95% | -3.48% | -3.00% | -37.41% | + is good
redis_medium_max_SPOP | | | | | | |
(A) v6.1.30 | 5 | 104079.940 | 107238.610 | 105140.616 | 104964.840 | 1150.370 |
(B) v6.1.30 slab_nomerge | 5 | 102637.790 | 103885.300 | 103343.934 | 103412.620 | 437.159 |
| | -1.39% | -3.13% | -1.71% | -1.48% | -62.00% | + is good
redis_small_max_INCR | | | | | | |
(A) v6.1.30 | 5 | 98814.230 | 114942.530 | 107744.856 | 108813.920 | 6150.540 |
(B) v6.1.30 slab_nomerge | 5 | 99800.400 | 109529.020 | 104451.708 | 104058.270 | 3732.461 |
| | +1.00% | -4.71% | -3.06% | -4.37% | -39.31% | + is good
redis_small_max_LPOP | | | | | | |
(A) v6.1.30 | 5 | 104275.290 | 118764.840 | 108648.192 | 106951.880 | 5208.918 |
(B) v6.1.30 slab_nomerge | 5 | 97560.980 | 115074.800 | 103120.496 | 99800.400 | 6353.203 |
| | -6.44% | -3.11% | -5.09% | -6.69% | +21.97% | + is good
redis_small_max_LRANGE_100 | | | | | | |
(A) v6.1.30 | 5 | 67980.970 | 72992.700 | 71589.644 | 72150.070 | 1832.810 |
(B) v6.1.30 slab_nomerge | 5 | 64977.260 | 72046.110 | 70273.716 | 71684.590 | 2680.854 |
| | -4.42% | -1.30% | -1.84% | -0.65% | +46.27% | + is good
redis_small_max_MSET_10 | | | | | | |
(A) v6.1.30 | 5 | 90497.730 | 106044.540 | 100756.422 | 102880.660 | 5455.768 |
(B) v6.1.30 slab_nomerge | 5 | 97276.270 | 106951.880 | 102818.856 | 102880.660 | 3293.135 |
| | +7.49% | +0.86% | +2.05% | +0.00% | -39.64% | + is good
redis_small_max_PING_INLINE | | | | | | |
(A) v6.1.30 | 5 | 96153.850 | 108459.870 | 102493.414 | 102459.020 | 4995.757 |
(B) v6.1.30 slab_nomerge | 5 | 84317.030 | 116144.020 | 99995.920 | 98039.220 | 11045.861 |
| | -12.31% | +7.08% | -2.44% | -4.31% | +121.10% | + is good
redis_small_max_SADD | | | | | | |
(A) v6.1.30 | 5 | 106044.540 | 115606.940 | 109804.052 | 110375.270 | 3451.251 |
(B) v6.1.30 slab_nomerge | 5 | 95693.780 | 109769.480 | 102329.518 | 102249.490 | 4602.161 |
| | -9.76% | -5.05% | -6.81% | -7.36% | +33.35% | + is good
redis_small_max_SET | | | | | | |
(A) v6.1.30 | 5 | 91911.760 | 116686.120 | 104509.200 | 102354.150 | 8993.532 |
(B) v6.1.30 slab_nomerge | 5 | 100502.520 | 113636.370 | 108815.700 | 109649.120 | 4750.002 |
| | +9.35% | -2.61% | +4.12% | +7.13% | -47.18% | + is good
redis_small_max_SPOP | | | | | | |
(A) v6.1.30 | 5 | 96899.230 | 108695.650 | 103648.652 | 104931.800 | 3901.567 |
(B) v6.1.30 slab_nomerge | 5 | 93457.940 | 108108.110 | 101680.560 | 101626.020 | 5096.944 |
| | -3.55% | -0.54% | -1.90% | -3.15% | +30.64% | + is good

specjbb2015
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
-----------------------------------+-------+------------+------------+------------+------------+----------+----------------
specjbb2015_single_Critical_JOPS | | | | | | |
(A) v6.1.30 | 1 | 46294.000 | 46294.000 | 46294.000 | 46294.000 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 46167.000 | 46167.000 | 46167.000 | 46167.000 | 0 |
| | -0.27% | -0.27% | -0.27% | -0.27% | --- | + is good
specjbb2015_single_Max_JOPS | | | | | | |
(A) v6.1.30 | 1 | 68842.000 | 68842.000 | 68842.000 | 68842.000 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 67801.000 | 67801.000 | 67801.000 | 67801.000 | 0 |
| | -1.51% | -1.51% | -1.51% | -1.51% | --- | + is good

vm-scalability
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
---------------------------------------+-------+-----------------+-----------------+-----------------+-----------------+---------------+------------
300s_128G_truncate_throughput | | | | | | |
(A) v6.1.30 | 15 | 16398714804.000 | 17010339870.000 | 16772025703.867 | 16834675132.000 | 232697088.501 |
(B) v6.1.30 slab_nomerge | 15 | 16704416343.000 | 17271437122.000 | 16948419991.200 | 16821799877.000 | 233146680.475 |
| | +1.86% | +1.53% | +1.05% | -0.08% | +0.19% | + is good
300s_512G_anon_wx_rand_mt_throughput | | | | | | |
(A) v6.1.30 | 15 | 7198561.000 | 7359712.000 | 7263944.200 | 7259418.000 | 50394.115 |
(B) v6.1.30 slab_nomerge | 15 | 7191842.000 | 7628158.000 | 7390629.000 | 7407204.000 | 171602.612 |
| | -0.09% | +3.65% | +1.74% | +2.04% | +240.52% | + is good

will-it-scale
LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
-----------------------------------+-------+--------------+--------------+--------------+--------------+-----------+----------------
context_switch1_per_thread_ops | | | | | | |
(A) v6.1.30 | 1 | 324721.000 | 324721.000 | 324721.000 | 324721.000 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 255999.000 | 255999.000 | 255999.000 | 255999.000 | 0 |
!! REGRESSED !! | | -21.16% | -21.16% | -21.16% | -21.16% | --- | + is good
getppid1_scalability | | | | | | |
(A) v6.1.30 | 1 | 0.71943 | 0.71943 | 0.71943 | 0.71943 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 0.70923 | 0.70923 | 0.70923 | 0.70923 | 0 |
| | -1.42% | -1.42% | -1.42% | -1.42% | --- | + is good
mmap1_scalability | | | | | | |
(A) v6.1.30 | 1 | 0.18831 | 0.18831 | 0.18831 | 0.18831 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 0.18413 | 0.18413 | 0.18413 | 0.18413 | 0 |
| | -2.22% | -2.22% | -2.22% | -2.22% | --- | + is good
poll2_scalability | | | | | | |
(A) v6.1.30 | 1 | 0.45608 | 0.45608 | 0.45608 | 0.45608 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 0.44207 | 0.44207 | 0.44207 | 0.44207 | 0 |
| | -3.07% | -3.07% | -3.07% | -3.07% | --- | + is good
pthread_mutex1_scalability | | | | | | |
(A) v6.1.30 | 1 | 0.45207 | 0.45207 | 0.45207 | 0.45207 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 0.44194 | 0.44194 | 0.44194 | 0.44194 | 0 |
| | -2.24% | -2.24% | -2.24% | -2.24% | --- | + is good
pthread_mutex2_per_process_ops | | | | | | |
(A) v6.1.30 | 1 | 36292960.000 | 36292960.000 | 36292960.000 | 36292960.000 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 35446930.000 | 35446930.000 | 35446930.000 | 35446930.000 | 0 |
| | -2.33% | -2.33% | -2.33% | -2.33% | --- | + is good
signal1_scalability | | | | | | |
(A) v6.1.30 | 1 | 0.55541 | 0.55541 | 0.55541 | 0.55541 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 0.54773 | 0.54773 | 0.54773 | 0.54773 | 0 |
| | -1.38% | -1.38% | -1.38% | -1.38% | --- | + is good
unix1_scalability | | | | | | |
(A) v6.1.30 | 1 | 0.55085 | 0.55085 | 0.55085 | 0.55085 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 0.53957 | 0.53957 | 0.53957 | 0.53957 | 0 |
| | -2.05% | -2.05% | -2.05% | -2.05% | --- | + is good

2023-07-10 03:06:49

by David Rientjes

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Sun, 9 Jul 2023, David Rientjes wrote:

> There are some substantial performance degradations, most notably
> context_switch1_per_thread_ops which regressed ~21%. I'll need to repeat
> that test to confirm it and can also try on cascadelake if it reproduces.
>

So the regression on skylake for will-it-scale appears to be real:

LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
----------------------------------+-------+------------+------------+------------+------------+--------+------------
context_switch1_per_thread_ops | | | | | | |
(A) v6.1.30 | 1 | 314507.000 | 314507.000 | 314507.000 | 314507.000 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 257403.000 | 257403.000 | 257403.000 | 257403.000 | 0 |
!! REGRESSED !! | | -18.16% | -18.16% | -18.16% | -18.16% | --- | + is good

but I can't reproduce this on cascadelake:

LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
----------------------------------+-------+------------+------------+------------+------------+--------+------------
context_switch1_per_thread_ops | | | | | | |
(A) v6.1.30 | 1 | 301128.000 | 301128.000 | 301128.000 | 301128.000 | 0 |
(B) v6.1.30 slab_nomerge | 1 | 301282.000 | 301282.000 | 301282.000 | 301282.000 | 0 |
| | +0.05% | +0.05% | +0.05% | +0.05% | --- | + is good

So I'm a bit baffled at the moment.

I'll try to dig deeper and see what slab caches this benchmark exercises
that apparently no other benchmarks do. (I'm really hoping that the only
way to recover this performance is by something like
kmem_cache_create(SLAB_MERGE).)
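
For reference, an opt-in merge flag as alluded to above does not exist
today. The fragment below only sketches what such an interface might look
like from a caller's perspective: the kmem_cache_create() signature is the
real one, but SLAB_MERGE and struct task_ctx are purely hypothetical
placeholders (SLAB_MERGE is defined as 0 here just to keep the sketch
buildable).

#include <linux/errno.h>
#include <linux/module.h>
#include <linux/slab.h>

/* Hypothetical structure, standing in for whatever hot object the
 * context-switch benchmark is actually exercising. */
struct task_ctx {
        unsigned long state;
        void *stack;
};

/* NOT a real flag: placeholder for a possible per-cache opt-in. */
#ifndef SLAB_MERGE
#define SLAB_MERGE 0
#endif

static struct kmem_cache *task_ctx_cachep;

static int __init task_ctx_cache_init(void)
{
        /* Opt this one cache back into merging; everything else would
         * stay unmerged under the new default. */
        task_ctx_cachep = kmem_cache_create("task_ctx",
                                            sizeof(struct task_ctx), 0,
                                            SLAB_MERGE, NULL);
        return task_ctx_cachep ? 0 : -ENOMEM;
}

static void __exit task_ctx_cache_exit(void)
{
        kmem_cache_destroy(task_ctx_cachep);
}

module_init(task_ctx_cache_init);
module_exit(task_ctx_cache_exit);
MODULE_LICENSE("GPL");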

2023-07-10 15:50:27

by Vlastimil Babka

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On 7/3/23 22:17, David Rientjes wrote:
> Additionally, I wouldn't consider this to be super urgent: slab cache
> merging has been this way for several years, we have some time to do an
> assessment of the implications of changing an important aspect of kernel
> memory allocation that will affect everybody.

Agreed, although I wouldn't say "affect everybody" because the changed
upstream default may not automatically translate to what distros will use,
and I'd expect most people rely on distro kernels.

> I agree with the patch if
> we can make it work, I'd just like to study the effect of it more fully
> beyond some kernbench runs.


2023-07-18 12:28:08

by Julian Pidancet

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Mon Jul 10, 2023 at 04:40, David Rientjes wrote:
> On Sun, 9 Jul 2023, David Rientjes wrote:
>
> > There are some substantial performance degradations, most notably
> > context_switch1_per_thread_ops which regressed ~21%. I'll need to repeat
> > that test to confirm it and can also try on cascadelake if it reproduces.
> >
>
> So the regression on skylake for will-it-scale appears to be real:
>
> LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
> ----------------------------------+-------+------------+------------+------------+------------+--------+------------
> context_switch1_per_thread_ops | | | | | | |
> (A) v6.1.30 | 1 | 314507.000 | 314507.000 | 314507.000 | 314507.000 | 0 |
> (B) v6.1.30 slab_nomerge | 1 | 257403.000 | 257403.000 | 257403.000 | 257403.000 | 0 |
> !! REGRESSED !! | | -18.16% | -18.16% | -18.16% | -18.16% | --- | + is good
>
> but I can't reproduce this on cascadelake:
>
> LABEL | COUNT | MIN | MAX | MEAN | MEDIAN | STDDEV | DIRECTION
> ----------------------------------+-------+------------+------------+------------+------------+--------+------------
> context_switch1_per_thread_ops | | | | | | |
> (A) v6.1.30 | 1 | 301128.000 | 301128.000 | 301128.000 | 301128.000 | 0 |
> (B) v6.1.30 slab_nomerge | 1 | 301282.000 | 301282.000 | 301282.000 | 301282.000 | 0 |
> | | +0.05% | +0.05% | +0.05% | +0.05% | --- | + is good
>
> So I'm a bit baffled at the moment.
>
> I'll try to dig deeper and see what slab caches this benchmark exercises
> that apparently no other benchmarks do. (I'm really hoping that the only
> way to recover this performance is by something like
> kmem_cache_create(SLAB_MERGE).)

Hi David,

Many thanks for running all these tests. The amount of attention you've
given this change is simply amazing. I wish I could have been able to
assist you by doing more tests, but I've been lacking the necessary
resources to do so.

I'm as surprised as you are regarding the skylake regression. 20% is
quite a large number, but perhaps it's less worrying than it looks given
that benchmarks are usually very different from real-world workloads?

As Kees Cook was suggesting in his own reply, have you given any thought
to including this change in -next to see if any regressions show up in CI
performance test results?

Regards,

--
Julian


Attachments:
signature.asc (273.00 B)

2023-07-26 00:30:01

by David Rientjes

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On Tue, 18 Jul 2023, Julian Pidancet wrote:

> Hi David,
>
> Many thanks for running all these tests. The amount of attention you've
> given this change is simply amazing. I wish I could have been able to
> assist you by doing more tests, but I've been lacking the necessary
> resources to do so.
>
> I'm as surprised as you are regarding the skylake regression. 20% is
> quite a large number, but perhaps it's less worrying than it looks given
> that benchmarks are usually very different from real-world workloads?
>

I'm not an expert on context_switch1_per_thread_ops, so I can't infer
which workloads would be most affected by such a regression, other than to
point out that -18% is quite substantial.

I'm still hoping to run some benchmarks with 64KB page sizes as Christoph
suggested; I should be able to do this with arm64.

It's certainly good news that the overall memory footprint doesn't change
much with this change.

> As Kees Cook was suggesting in his own reply, have you given a thought
> about including this change in -next and see if there are regressions
> showing up in CI performance tests results?
>

I assume that anything we can run with CI performance tests can also be
run without merging into -next?

The performance degradation is substantial for a microbenchmark, so I'd
like to complete the picture with other benchmarks and do a complete
analysis with 64KB page sizes, since I think the concern Christoph mentions
could be quite real. We just don't have the data yet to make an informed
assessment of it. I would certainly welcome any help that others would like
to provide in running benchmarks with this change as well :P

Once we have a complete picture, we might also want to discuss what we are
hoping to achieve with such a change. I was very supportive of it prior to
the -18% benchmark result. But if most users simply use whatever their
distro defaults to, and others may already be opting into this either on
the kernel command line or in their .config, it's hard to determine exactly
which set of users would be affected by this change. Suddenly causing a
-18% regression overnight for them would be surprising.

2023-07-26 09:02:41

by Vlastimil Babka

Subject: Re: [PATCH v2] mm/slub: disable slab merging in the default configuration

On 7/26/23 01:25, David Rientjes wrote:
> On Tue, 18 Jul 2023, Julian Pidancet wrote:
>
>> Hi David,
>>
>> Many thanks for running all these tests. The amount of attention you've
>> given this change is simply amazing. I wish I could have been able to
>> assist you by doing more tests, but I've been lacking the necessary
>> resources to do so.
>>
>> I'm as surprised as you are regarding the skylake regression. 20% is
>> quite a large number, but perhaps it's less worrying than it looks given
>> that benchmarks are usually very different from real-world workloads?
>>
>
> I'm not an expert on context_switch1_per_thread_ops so I can't infere
> which workloads would be most affected by such a regression other than to
> point out that -18% is quite substantial.

It might turn out that this regression is accidental, in that merging
happens to result in better caching that benefits the particular skylake
cache hierarchy (but not others), because the workload happens to use two
different classes of objects that are compatible for merging and uses them
with identical lifetimes.

But that would arguably still be a corner case, and not something that
should result in a hard go/no-go for the change, as similar corner cases
likely exist that would benefit from not merging.

But it's possible the reason for the regression is something less expected
than the above hypothesis, so indeed we should investigate first.

> I'm still hoping to run some benchmarks with 64KB page sizes as Christoph
> suggested, I should be able to do this with arm64.
>
> It's ceratinly good news that the overall memory footprint doesn't change
> much with this change.
>
>> As Kees Cook was suggesting in his own reply, have you given a thought
>> about including this change in -next and see if there are regressions
>> showing up in CI performance tests results?
>>
>
> I assume that anything we can run with CI performance tests can also be
> run without merging into -next?
>
> The performance degradation is substantial for a microbenchmark, I'd like
> to complete the picture on other benchmarks and do a complete analysis
> with 64KB page sizes since I think the concern Christoph mentions could be
> quite real. We just don't have the data yet to make an informed
> assessment of it. Certainly would welcome any help that others would like
> to provide for running benchmarks with this change as well :P
>
> Once we have a complete picture, we might also want to discuss what we are
> hoping to achieve with such a change. I was very supportive of it prior
> to the -18% benchmark result. But if most users are simply using whatever
> their distro defaults to and other users may already be opting into this
> either by the kernel command line or .config, it's hard to determine
> exactly the set of users that would be affected by this change. Suddenly
> causing a -18% regression overnight for this would be surprising for them.

What I'd hope to achieve is that if we find out the differences between
merging and not merging are negligible (modulo corner cases) for both
performance and memory, we'd not only change the default but even make
merging more exceptional. It should still be done under SLUB_TINY, and
maybe we can keep the slab_merge boot option, but that's it?

Because if they are comparable, not merging does have benefits:
/proc/slabinfo accounting is not misleading, so if a bug is reported it's
not necessary to reboot with slab_nomerge to get the real picture, and then
there are the security benefits mentioned earlier, etc.
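
To illustrate that last point: when caches are not merged, each line of
/proc/slabinfo corresponds to one real cache, so a quick per-cache summary
is enough to see where slab memory is going without rebooting. A minimal
sketch, assuming root access to /proc/slabinfo; the num_objs * objsize
product is only an estimate and ignores per-slab slack.

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/slabinfo", "r");
        char line[512];

        if (!f) {
                perror("/proc/slabinfo");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                char name[64];
                unsigned long active, num, objsize;

                /* Data lines start with: name active_objs num_objs objsize ... */
                if (sscanf(line, "%63s %lu %lu %lu",
                           name, &active, &num, &objsize) != 4)
                        continue;       /* skip the two header lines */
                if (num * objsize >= 1024 * 1024)
                        printf("%-24s ~%6lu kB (%lu objs x %lu B)\n",
                               name, num * objsize / 1024, num, objsize);
        }
        fclose(f);
        return 0;
}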