Hi,
this continues the discussion from [1]. Reasons to remove SLOB are
outlined there and no-one has objected so far. The last patch of this
series therefore deprecates CONFIG_SLOB and updates all the defconfigs
using CONFIG_SLOB=y in the tree.
There is a k210 board with 8MB RAM where switching to SLUB caused issues
[2], and the lkp bot was also unhappy about the code bloat [3]. To address
both, this series introduces CONFIG_SLUB_TINY, which applies some rather
low-hanging fruit modifications to SLUB to reduce its memory overhead.
This seems to have been successful, at least in the k210 case [4]. I
consider this an acceptable tradeoff for getting rid of SLOB.
The series is also available in git:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-tiny-v1r2
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/Y25E9cJbhDAKi1vd@99bb1221be19/
[4] https://lore.kernel.org/all/[email protected]/
Vlastimil Babka (12):
mm, slab: ignore hardened usercopy parameters when disabled
mm, slub: add CONFIG_SLUB_TINY
mm, slub: disable SYSFS support with CONFIG_SLUB_TINY
mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY
mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY
mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY
mm, slub: refactor free debug processing
mm, slub: split out allocations from pre/post hooks
mm, slub: remove percpu slabs with CONFIG_SLUB_TINY
mm, slub: don't aggressively inline with CONFIG_SLUB_TINY
mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED
arch/arm/configs/clps711x_defconfig | 3 +-
arch/arm/configs/collie_defconfig | 3 +-
arch/arm/configs/multi_v4t_defconfig | 3 +-
arch/arm/configs/omap1_defconfig | 3 +-
arch/arm/configs/pxa_defconfig | 3 +-
arch/arm/configs/tct_hammer_defconfig | 3 +-
arch/arm/configs/xcep_defconfig | 3 +-
arch/openrisc/configs/or1ksim_defconfig | 3 +-
arch/openrisc/configs/simple_smp_defconfig | 3 +-
arch/riscv/configs/nommu_k210_defconfig | 3 +-
.../riscv/configs/nommu_k210_sdcard_defconfig | 3 +-
arch/riscv/configs/nommu_virt_defconfig | 3 +-
arch/sh/configs/rsk7201_defconfig | 3 +-
arch/sh/configs/rsk7203_defconfig | 3 +-
arch/sh/configs/se7206_defconfig | 3 +-
arch/sh/configs/shmin_defconfig | 3 +-
arch/sh/configs/shx3_defconfig | 3 +-
include/linux/slab.h | 8 +
include/linux/slub_def.h | 6 +-
kernel/configs/tiny.config | 5 +-
mm/Kconfig | 38 +-
mm/Kconfig.debug | 2 +-
mm/slab_common.c | 16 +-
mm/slub.c | 415 ++++++++++++------
24 files changed, 377 insertions(+), 164 deletions(-)
--
2.38.1
As explained in [1], we would like to remove SLOB if possible.
- There are no known users that need its somewhat lower memory footprint
so much that they cannot handle SLUB (after some modifications by the
previous patches) instead.
- It is an extra maintenance burden, and a number of features are
incompatible with it.
- It blocks the API improvement of allowing kfree() on objects allocated
via kmem_cache_alloc().
As the first step, rename the CONFIG_SLOB option in the slab allocator
configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
churn. This will cause existing .config files and defconfigs with
CONFIG_SLOB=y to silently switch to the default (and recommended
replacement) SLUB, while still allowing SLOB to be configured by anyone
who notices and needs it. But those users should contact the slab maintainers
and [email protected] as explained in the updated help. With no valid
objections, the plan is to update the existing defconfigs to SLUB and
remove SLOB in a few cycles.
To make SLUB a more suitable replacement for SLOB, a CONFIG_SLUB_TINY
option was introduced to limit SLUB's memory overhead.
There are a number of defconfigs specifying CONFIG_SLOB=y. As part of
this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
[1] https://lore.kernel.org/all/[email protected]/
Cc: Russell King <[email protected]>
Cc: Aaro Koskinen <[email protected]>
Cc: Janusz Krzysztofik <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Conor Dooley <[email protected]>
Cc: Damien Le Moal <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
arch/arm/configs/clps711x_defconfig | 3 ++-
arch/arm/configs/collie_defconfig | 3 ++-
arch/arm/configs/multi_v4t_defconfig | 3 ++-
arch/arm/configs/omap1_defconfig | 3 ++-
arch/arm/configs/pxa_defconfig | 3 ++-
arch/arm/configs/tct_hammer_defconfig | 3 ++-
arch/arm/configs/xcep_defconfig | 3 ++-
arch/openrisc/configs/or1ksim_defconfig | 3 ++-
arch/openrisc/configs/simple_smp_defconfig | 3 ++-
arch/riscv/configs/nommu_k210_defconfig | 3 ++-
arch/riscv/configs/nommu_k210_sdcard_defconfig | 3 ++-
arch/riscv/configs/nommu_virt_defconfig | 3 ++-
arch/sh/configs/rsk7201_defconfig | 3 ++-
arch/sh/configs/rsk7203_defconfig | 3 ++-
arch/sh/configs/se7206_defconfig | 3 ++-
arch/sh/configs/shmin_defconfig | 3 ++-
arch/sh/configs/shx3_defconfig | 3 ++-
kernel/configs/tiny.config | 5 +++--
mm/Kconfig | 17 +++++++++++++++--
19 files changed, 52 insertions(+), 21 deletions(-)
diff --git a/arch/arm/configs/clps711x_defconfig b/arch/arm/configs/clps711x_defconfig
index 92481b2a88fa..adcee238822a 100644
--- a/arch/arm/configs/clps711x_defconfig
+++ b/arch/arm/configs/clps711x_defconfig
@@ -14,7 +14,8 @@ CONFIG_ARCH_EDB7211=y
CONFIG_ARCH_P720T=y
CONFIG_AEABI=y
# CONFIG_COREDUMP is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
diff --git a/arch/arm/configs/collie_defconfig b/arch/arm/configs/collie_defconfig
index 2a2d2cb3ce2e..69341c33e0cc 100644
--- a/arch/arm/configs/collie_defconfig
+++ b/arch/arm/configs/collie_defconfig
@@ -13,7 +13,8 @@ CONFIG_CMDLINE="noinitrd root=/dev/mtdblock2 rootfstype=jffs2 fbcon=rotate:1"
CONFIG_FPE_NWFPE=y
CONFIG_PM=y
# CONFIG_SWAP is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
diff --git a/arch/arm/configs/multi_v4t_defconfig b/arch/arm/configs/multi_v4t_defconfig
index e2fd822f741a..b60000a89aff 100644
--- a/arch/arm/configs/multi_v4t_defconfig
+++ b/arch/arm/configs/multi_v4t_defconfig
@@ -25,7 +25,8 @@ CONFIG_ARM_CLPS711X_CPUIDLE=y
CONFIG_JUMP_LABEL=y
CONFIG_PARTITION_ADVANCED=y
# CONFIG_COREDUMP is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
diff --git a/arch/arm/configs/omap1_defconfig b/arch/arm/configs/omap1_defconfig
index 70511fe4b3ec..246f1bba7df5 100644
--- a/arch/arm/configs/omap1_defconfig
+++ b/arch/arm/configs/omap1_defconfig
@@ -42,7 +42,8 @@ CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_BINFMT_MISC=y
# CONFIG_SWAP is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_NET=y
CONFIG_PACKET=y
diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig
index d60cc9cc4c21..0a0f12df40b5 100644
--- a/arch/arm/configs/pxa_defconfig
+++ b/arch/arm/configs/pxa_defconfig
@@ -49,7 +49,8 @@ CONFIG_PARTITION_ADVANCED=y
CONFIG_LDM_PARTITION=y
CONFIG_CMDLINE_PARTITION=y
CONFIG_BINFMT_MISC=y
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_COMPACTION is not set
CONFIG_NET=y
CONFIG_PACKET=y
diff --git a/arch/arm/configs/tct_hammer_defconfig b/arch/arm/configs/tct_hammer_defconfig
index 3b29ae1fb750..6bd38b6f22c4 100644
--- a/arch/arm/configs/tct_hammer_defconfig
+++ b/arch/arm/configs/tct_hammer_defconfig
@@ -19,7 +19,8 @@ CONFIG_FPE_NWFPE=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_SWAP is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
diff --git a/arch/arm/configs/xcep_defconfig b/arch/arm/configs/xcep_defconfig
index ea59e4b6bfc5..6bd9f71b71fc 100644
--- a/arch/arm/configs/xcep_defconfig
+++ b/arch/arm/configs/xcep_defconfig
@@ -26,7 +26,8 @@ CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_BLOCK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_NET=y
diff --git a/arch/openrisc/configs/or1ksim_defconfig b/arch/openrisc/configs/or1ksim_defconfig
index 6e1e004047c7..0116e465238f 100644
--- a/arch/openrisc/configs/or1ksim_defconfig
+++ b/arch/openrisc/configs/or1ksim_defconfig
@@ -10,7 +10,8 @@ CONFIG_EXPERT=y
# CONFIG_AIO is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_MODULES=y
# CONFIG_BLOCK is not set
CONFIG_OPENRISC_BUILTIN_DTB="or1ksim"
diff --git a/arch/openrisc/configs/simple_smp_defconfig b/arch/openrisc/configs/simple_smp_defconfig
index ff49d868e040..b990cb6c9309 100644
--- a/arch/openrisc/configs/simple_smp_defconfig
+++ b/arch/openrisc/configs/simple_smp_defconfig
@@ -16,7 +16,8 @@ CONFIG_EXPERT=y
# CONFIG_AIO is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_MODULES=y
# CONFIG_BLOCK is not set
CONFIG_OPENRISC_BUILTIN_DTB="simple_smp"
diff --git a/arch/riscv/configs/nommu_k210_defconfig b/arch/riscv/configs/nommu_k210_defconfig
index 96fe8def644c..79b3ccd58ff0 100644
--- a/arch/riscv/configs/nommu_k210_defconfig
+++ b/arch/riscv/configs/nommu_k210_defconfig
@@ -25,7 +25,8 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_EMBEDDED=y
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_MMU is not set
CONFIG_SOC_CANAAN=y
CONFIG_NONPORTABLE=y
diff --git a/arch/riscv/configs/nommu_k210_sdcard_defconfig b/arch/riscv/configs/nommu_k210_sdcard_defconfig
index 379740654373..6b80bb13b8ed 100644
--- a/arch/riscv/configs/nommu_k210_sdcard_defconfig
+++ b/arch/riscv/configs/nommu_k210_sdcard_defconfig
@@ -17,7 +17,8 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_EMBEDDED=y
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_MMU is not set
CONFIG_SOC_CANAAN=y
CONFIG_NONPORTABLE=y
diff --git a/arch/riscv/configs/nommu_virt_defconfig b/arch/riscv/configs/nommu_virt_defconfig
index 1a56eda5ce46..4cf0f297091e 100644
--- a/arch/riscv/configs/nommu_virt_defconfig
+++ b/arch/riscv/configs/nommu_virt_defconfig
@@ -22,7 +22,8 @@ CONFIG_EXPERT=y
# CONFIG_KALLSYMS is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_MMU is not set
CONFIG_SOC_VIRT=y
CONFIG_NONPORTABLE=y
diff --git a/arch/sh/configs/rsk7201_defconfig b/arch/sh/configs/rsk7201_defconfig
index 619c18699459..376e95fa77bc 100644
--- a/arch/sh/configs/rsk7201_defconfig
+++ b/arch/sh/configs/rsk7201_defconfig
@@ -10,7 +10,8 @@ CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_AIO is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_PROFILING=y
CONFIG_MODULES=y
# CONFIG_BLK_DEV_BSG is not set
diff --git a/arch/sh/configs/rsk7203_defconfig b/arch/sh/configs/rsk7203_defconfig
index d00fafc021e1..1d5fd67a3949 100644
--- a/arch/sh/configs/rsk7203_defconfig
+++ b/arch/sh/configs/rsk7203_defconfig
@@ -11,7 +11,8 @@ CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_PROFILING=y
CONFIG_MODULES=y
# CONFIG_BLK_DEV_BSG is not set
diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
index 122216123e63..78e0e7be57ee 100644
--- a/arch/sh/configs/se7206_defconfig
+++ b/arch/sh/configs/se7206_defconfig
@@ -21,7 +21,8 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_ELF_CORE is not set
# CONFIG_COMPAT_BRK is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_PROFILING=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
diff --git a/arch/sh/configs/shmin_defconfig b/arch/sh/configs/shmin_defconfig
index c0b6f40d01cc..e078b193a78a 100644
--- a/arch/sh/configs/shmin_defconfig
+++ b/arch/sh/configs/shmin_defconfig
@@ -9,7 +9,8 @@ CONFIG_LOG_BUF_SHIFT=14
# CONFIG_FUTEX is not set
# CONFIG_EPOLL is not set
# CONFIG_SHMEM is not set
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_CPU_SUBTYPE_SH7706=y
CONFIG_MEMORY_START=0x0c000000
diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
index 32ec6eb1eabc..aa353dff7f19 100644
--- a/arch/sh/configs/shx3_defconfig
+++ b/arch/sh/configs/shx3_defconfig
@@ -20,7 +20,8 @@ CONFIG_USER_NS=y
CONFIG_PID_NS=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_KALLSYMS_ALL=y
-CONFIG_SLOB=y
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
CONFIG_PROFILING=y
CONFIG_KPROBES=y
CONFIG_MODULES=y
diff --git a/kernel/configs/tiny.config b/kernel/configs/tiny.config
index 8a44b93da0f3..c2f9c912df1c 100644
--- a/kernel/configs/tiny.config
+++ b/kernel/configs/tiny.config
@@ -7,5 +7,6 @@ CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_SLAB is not set
-# CONFIG_SLUB is not set
-CONFIG_SLOB=y
+# CONFIG_SLOB_DEPRECATED is not set
+CONFIG_SLUB=y
+CONFIG_SLUB_TINY=y
diff --git a/mm/Kconfig b/mm/Kconfig
index 5941cb34e30d..dcc49c69552f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -219,17 +219,30 @@ config SLUB
and has enhanced diagnostics. SLUB is the default choice for
a slab allocator.
-config SLOB
+config SLOB_DEPRECATED
depends on EXPERT
- bool "SLOB (Simple Allocator)"
+ bool "SLOB (Simple Allocator - DEPRECATED)"
depends on !PREEMPT_RT
help
+ Deprecated and scheduled for removal in a few cycles. SLUB
+ recommended as replacement. CONFIG_SLUB_TINY can be considered
+ on systems with 16MB or less RAM.
+
+ If you need SLOB to stay, please contact [email protected] and
+ people listed in the SLAB ALLOCATOR section of MAINTAINERS file,
+ with your use case.
+
SLOB replaces the stock allocator with a drastically simpler
allocator. SLOB is generally more space efficient but
does not perform as well on large systems.
endchoice
+config SLOB
+ bool
+ default y
+ depends on SLOB_DEPRECATED
+
config SLUB_TINY
bool "Configure SLUB for minimal memory footprint"
depends on SLUB && EXPERT
--
2.38.1
SLUB gets most of its scalability from percpu slabs. However, for
CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
associated code. In addition to the slab page savings, this reduces
percpu allocator usage and code size.
This change builds on recent commit c7323a5ad078 ("mm/slub: restrict
sysfs validation to debug caches and make it safe"), as caches with
debugging enabled also avoid percpu slabs and all allocations and
freeing end up working with the partial list. With a bit more
refactoring by the preceding patches, the same code paths can be used
with CONFIG_SLUB_TINY.
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/slub_def.h | 4 ++
mm/slub.c | 102 +++++++++++++++++++++++++++++++++++++--
2 files changed, 103 insertions(+), 3 deletions(-)
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index c186f25c8148..79df64eb054e 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -41,6 +41,7 @@ enum stat_item {
CPU_PARTIAL_DRAIN, /* Drain cpu partial to node partial */
NR_SLUB_STAT_ITEMS };
+#ifndef CONFIG_SLUB_TINY
/*
* When changing the layout, make sure freelist and tid are still compatible
* with this_cpu_cmpxchg_double() alignment requirements.
@@ -57,6 +58,7 @@ struct kmem_cache_cpu {
unsigned stat[NR_SLUB_STAT_ITEMS];
#endif
};
+#endif /* CONFIG_SLUB_TINY */
#ifdef CONFIG_SLUB_CPU_PARTIAL
#define slub_percpu_partial(c) ((c)->partial)
@@ -88,7 +90,9 @@ struct kmem_cache_order_objects {
* Slab cache management.
*/
struct kmem_cache {
+#ifndef CONFIG_SLUB_TINY
struct kmem_cache_cpu __percpu *cpu_slab;
+#endif
/* Used for retrieving partial slabs, etc. */
slab_flags_t flags;
unsigned long min_partial;
diff --git a/mm/slub.c b/mm/slub.c
index 5677db3f6d15..7f1cd702c3b4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -337,10 +337,12 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
*/
static nodemask_t slab_nodes;
+#ifndef CONFIG_SLUB_TINY
/*
* Workqueue used for flush_cpu_slab().
*/
static struct workqueue_struct *flushwq;
+#endif
/********************************************************************
* Core slab cache functions
@@ -386,10 +388,12 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
return freelist_dereference(s, object + s->offset);
}
+#ifndef CONFIG_SLUB_TINY
static void prefetch_freepointer(const struct kmem_cache *s, void *object)
{
prefetchw(object + s->offset);
}
+#endif
/*
* When running under KMSAN, get_freepointer_safe() may return an uninitialized
@@ -1681,11 +1685,13 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node,
static inline void dec_slabs_node(struct kmem_cache *s, int node,
int objects) {}
+#ifndef CONFIG_SLUB_TINY
static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
void **freelist, void *nextfree)
{
return false;
}
+#endif
#endif /* CONFIG_SLUB_DEBUG */
/*
@@ -2219,7 +2225,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
if (!pfmemalloc_match(slab, pc->flags))
continue;
- if (kmem_cache_debug(s)) {
+ if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
object = alloc_single_from_partial(s, n, slab,
pc->orig_size);
if (object)
@@ -2334,6 +2340,8 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
return get_any_partial(s, pc);
}
+#ifndef CONFIG_SLUB_TINY
+
#ifdef CONFIG_PREEMPTION
/*
* Calculate the next globally unique transaction for disambiguation
@@ -2347,7 +2355,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
* different cpus.
*/
#define TID_STEP 1
-#endif
+#endif /* CONFIG_PREEMPTION */
static inline unsigned long next_tid(unsigned long tid)
{
@@ -2808,6 +2816,13 @@ static int slub_cpu_dead(unsigned int cpu)
return 0;
}
+#else /* CONFIG_SLUB_TINY */
+static inline void flush_all_cpus_locked(struct kmem_cache *s) { }
+static inline void flush_all(struct kmem_cache *s) { }
+static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu) { }
+static inline int slub_cpu_dead(unsigned int cpu) { return 0; }
+#endif /* CONFIG_SLUB_TINY */
+
/*
* Check if the objects in a per cpu structure fit numa
* locality expectations.
@@ -2955,6 +2970,7 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
return true;
}
+#ifndef CONFIG_SLUB_TINY
/*
* Check the slab->freelist and either transfer the freelist to the
* per cpu freelist or deactivate the slab.
@@ -3320,6 +3336,33 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
return object;
}
+#else /* CONFIG_SLUB_TINY */
+static void *__slab_alloc_node(struct kmem_cache *s,
+ gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+{
+ struct partial_context pc;
+ struct slab *slab;
+ void *object;
+
+ pc.flags = gfpflags;
+ pc.slab = &slab;
+ pc.orig_size = orig_size;
+ object = get_partial(s, node, &pc);
+
+ if (object)
+ return object;
+
+ slab = new_slab(s, gfpflags, node);
+ if (unlikely(!slab)) {
+ slab_out_of_memory(s, gfpflags, node);
+ return NULL;
+ }
+
+ object = alloc_single_from_new_slab(s, slab, orig_size);
+
+ return object;
+}
+#endif /* CONFIG_SLUB_TINY */
/*
* If the object has been wiped upon free, make sure it's fully initialized by
@@ -3503,7 +3546,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
if (kfence_free(head))
return;
- if (kmem_cache_debug(s)) {
+ if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
free_to_partial_list(s, slab, head, tail, cnt, addr);
return;
}
@@ -3604,6 +3647,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
discard_slab(s, slab);
}
+#ifndef CONFIG_SLUB_TINY
/*
* Fastpath with forced inlining to produce a kfree and kmem_cache_free that
* can perform fastpath freeing without additional function calls.
@@ -3678,6 +3722,16 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
}
stat(s, FREE_FASTPATH);
}
+#else /* CONFIG_SLUB_TINY */
+static void do_slab_free(struct kmem_cache *s,
+ struct slab *slab, void *head, void *tail,
+ int cnt, unsigned long addr)
+{
+ void *tail_obj = tail ? : head;
+
+ __slab_free(s, slab, head, tail_obj, cnt, addr);
+}
+#endif /* CONFIG_SLUB_TINY */
static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
void *head, void *tail, void **p, int cnt,
@@ -3812,6 +3866,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
}
EXPORT_SYMBOL(kmem_cache_free_bulk);
+#ifndef CONFIG_SLUB_TINY
static inline int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
size_t size, void **p, struct obj_cgroup *objcg)
{
@@ -3880,6 +3935,36 @@ static inline int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
return 0;
}
+#else /* CONFIG_SLUB_TINY */
+static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
+ size_t size, void **p, struct obj_cgroup *objcg)
+{
+ int i;
+
+ for (i = 0; i < size; i++) {
+ void *object = kfence_alloc(s, s->object_size, flags);
+
+ if (unlikely(object)) {
+ p[i] = object;
+ continue;
+ }
+
+ p[i] = __slab_alloc_node(s, flags, NUMA_NO_NODE,
+ _RET_IP_, s->object_size);
+ if (unlikely(!p[i]))
+ goto error;
+
+ maybe_wipe_obj_freeptr(s, p[i]);
+ }
+
+ return i;
+
+error:
+ slab_post_alloc_hook(s, objcg, flags, i, p, false);
+ kmem_cache_free_bulk(s, i, p);
+ return 0;
+}
+#endif /* CONFIG_SLUB_TINY */
/* Note that interrupts must be enabled when calling this function. */
int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
@@ -4059,6 +4144,7 @@ init_kmem_cache_node(struct kmem_cache_node *n)
#endif
}
+#ifndef CONFIG_SLUB_TINY
static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
{
BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
@@ -4078,6 +4164,12 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
return 1;
}
+#else
+static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
+{
+ return 1;
+}
+#endif /* CONFIG_SLUB_TINY */
static struct kmem_cache *kmem_cache_node;
@@ -4140,7 +4232,9 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
void __kmem_cache_release(struct kmem_cache *s)
{
cache_random_seq_destroy(s);
+#ifndef CONFIG_SLUB_TINY
free_percpu(s->cpu_slab);
+#endif
free_kmem_cache_nodes(s);
}
@@ -4917,8 +5011,10 @@ void __init kmem_cache_init(void)
void __init kmem_cache_init_late(void)
{
+#ifndef CONFIG_SLUB_TINY
flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
WARN_ON(!flushwq);
+#endif
}
struct kmem_cache *
--
2.38.1
SLUB keeps a number of slabs on the partial list even if they are
empty, to avoid the cost of repeatedly freeing and reallocating them.
The goal of CONFIG_SLUB_TINY is to minimize memory overhead, so set
both limits (MIN_PARTIAL and MAX_PARTIAL) to 0 for immediate freeing of
empty slab pages.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c
index ab085aa2f1f0..917b79278bad 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -241,6 +241,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
/* Enable to log cmpxchg failures */
#undef SLUB_DEBUG_CMPXCHG
+#ifndef CONFIG_SLUB_TINY
/*
* Minimum number of partial slabs. These will be left on the partial
* lists even if they are empty. kmem_cache_shrink may reclaim them.
@@ -253,6 +254,10 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
* sort the partial list by the number of objects in use.
*/
#define MAX_PARTIAL 10
+#else
+#define MIN_PARTIAL 0
+#define MAX_PARTIAL 0
+#endif
#define DEBUG_DEFAULT_FLAGS (SLAB_CONSISTENCY_CHECKS | SLAB_RED_ZONE | \
SLAB_POISON | SLAB_STORE_USER)
--
2.38.1
SLAB_RECLAIM_ACCOUNT caches allocate their slab pages with
__GFP_RECLAIMABLE and can help against fragmentation by grouping pages
by mobility, but on tiny systems mobility grouping is likely disabled
anyway, and ignoring SLAB_RECLAIM_ACCOUNT instead allows merging of
caches that would otherwise be incompatible solely because of the flag.
Thus with CONFIG_SLUB_TINY, make SLAB_RECLAIM_ACCOUNT ineffective.
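The effect can be sketched with a small standalone example (illustrative
only, not the kernel code; it merely models the flag being defined to 0,
as in the hunk below):
#include <stdio.h>
/* Model of the CONFIG_SLUB_TINY case: with the flag defined to 0, any
 * test of it is constant-false, the compiler drops the branch, and the
 * flag can no longer make otherwise identical caches unmergeable. */
#define CONFIG_SLUB_TINY 1
#ifndef CONFIG_SLUB_TINY
#define SLAB_RECLAIM_ACCOUNT 0x00020000U
#else
#define SLAB_RECLAIM_ACCOUNT 0U
#endif
int main(void)
{
	unsigned int flags = SLAB_RECLAIM_ACCOUNT;	/* caller asks for it */
	if (flags & SLAB_RECLAIM_ACCOUNT)		/* dead code when the flag is 0 */
		printf("pages grouped as reclaimable\n");
	else
		printf("flag ignored; grouping skipped, merging unaffected\n");
	return 0;
}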
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 3ce9474c90ab..1cbbda03ad06 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -129,7 +129,11 @@
/* The following flags affect the page allocator grouping pages by mobility */
/* Objects are reclaimable */
+#ifndef CONFIG_SLUB_TINY
#define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U)
+#else
+#define SLAB_RECLAIM_ACCOUNT 0
+#endif
#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
/*
--
2.38.1
When CONFIG_HARDENED_USERCOPY is not enabled, no __check_heap_object()
checks happen that would use the kmem_cache useroffset and usersize
fields. Yet the fields are still initialized, preventing merging of
otherwise compatible caches. Thus ignore the values passed to cache
creation and leave them zero when CONFIG_HARDENED_USERCOPY is disabled.
In a quick virtme boot test, this reduced the number of caches in
/proc/slabinfo from 131 to 111.
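For context: the cache-merging logic treats any cache with a usercopy
region as unmergeable, which is why zeroing the fields helps. A toy model
of that behaviour (an assumption about the merging checks in
mm/slab_common.c, for illustration only, not the actual kernel code):
#include <stdbool.h>
#include <stdio.h>
struct toy_cache {
	unsigned int size;
	unsigned int useroffset;	/* usercopy whitelist offset */
	unsigned int usersize;		/* usercopy whitelist size */
};
/* Assumed rule: a non-zero usercopy whitelist marks the cache unmergeable. */
static bool toy_unmergeable(const struct toy_cache *s)
{
	return s->usersize != 0;
}
int main(void)
{
	struct toy_cache whitelisted = { .size = 64, .useroffset = 0, .usersize = 64 };
	struct toy_cache zeroed = { .size = 64 };
	printf("whitelisted cache mergeable: %d\n", !toy_unmergeable(&whitelisted));
	printf("zeroed cache mergeable:      %d\n", !toy_unmergeable(&zeroed));
	return 0;
}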
Cc: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slab_common.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0042fb2730d1..a8cb5de255fc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -317,7 +317,8 @@ kmem_cache_create_usercopy(const char *name,
flags &= CACHE_CREATE_MASK;
/* Fail closed on bad usersize of useroffset values. */
- if (WARN_ON(!usersize && useroffset) ||
+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
+ WARN_ON(!usersize && useroffset) ||
WARN_ON(size < usersize || size - usersize < useroffset))
usersize = useroffset = 0;
@@ -640,6 +641,9 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
align = max(align, size);
s->align = calculate_alignment(flags, align, size);
+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY))
+ useroffset = usersize = 0;
+
s->useroffset = useroffset;
s->usersize = usersize;
--
2.38.1
Since commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
caches and make it safe"), caches with debugging enabled use the
free_debug_processing() function to do both freeing checks and actual
freeing to partial list under list_lock, bypassing the fast paths.
We will want to use the same path for CONFIG_SLUB_TINY, but without the
debugging checks, so refactor the code so that free_debug_processing()
does only the checks, while the freeing is handled by a new function
free_to_partial_list().
For consistency, change the return type of alloc_debug_processing() from
int to bool and correct the !SLUB_DEBUG variant to return true rather
than false. This didn't matter until now, but will in the following
changes.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 154 +++++++++++++++++++++++++++++-------------------------
1 file changed, 83 insertions(+), 71 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index bf726dd00f7d..fd56d7cca9c2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1368,7 +1368,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
return 1;
}
-static noinline int alloc_debug_processing(struct kmem_cache *s,
+static noinline bool alloc_debug_processing(struct kmem_cache *s,
struct slab *slab, void *object, int orig_size)
{
if (s->flags & SLAB_CONSISTENCY_CHECKS) {
@@ -1380,7 +1380,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
trace(s, slab, object, 1);
set_orig_size(s, object, orig_size);
init_object(s, object, SLUB_RED_ACTIVE);
- return 1;
+ return true;
bad:
if (folio_test_slab(slab_folio(slab))) {
@@ -1393,7 +1393,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
slab->inuse = slab->objects;
slab->freelist = NULL;
}
- return 0;
+ return false;
}
static inline int free_consistency_checks(struct kmem_cache *s,
@@ -1646,17 +1646,17 @@ static inline void setup_object_debug(struct kmem_cache *s, void *object) {}
static inline
void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
-static inline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab, void *object, int orig_size) { return 0; }
+static inline bool alloc_debug_processing(struct kmem_cache *s,
+ struct slab *slab, void *object, int orig_size) { return true; }
-static inline void free_debug_processing(
- struct kmem_cache *s, struct slab *slab,
- void *head, void *tail, int bulk_cnt,
- unsigned long addr) {}
+static inline bool free_debug_processing(struct kmem_cache *s,
+ struct slab *slab, void *head, void *tail, int *bulk_cnt,
+ unsigned long addr, depot_stack_handle_t handle) { return true; }
static inline void slab_pad_check(struct kmem_cache *s, struct slab *slab) {}
static inline int check_object(struct kmem_cache *s, struct slab *slab,
void *object, u8 val) { return 1; }
+static inline depot_stack_handle_t set_track_prepare(void) { return 0; }
static inline void set_track(struct kmem_cache *s, void *object,
enum track_item alloc, unsigned long addr) {}
static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n,
@@ -2833,38 +2833,28 @@ static inline unsigned long node_nr_objs(struct kmem_cache_node *n)
}
/* Supports checking bulk free of a constructed freelist */
-static noinline void free_debug_processing(
- struct kmem_cache *s, struct slab *slab,
- void *head, void *tail, int bulk_cnt,
- unsigned long addr)
+static inline bool free_debug_processing(struct kmem_cache *s,
+ struct slab *slab, void *head, void *tail, int *bulk_cnt,
+ unsigned long addr, depot_stack_handle_t handle)
{
- struct kmem_cache_node *n = get_node(s, slab_nid(slab));
- struct slab *slab_free = NULL;
+ bool checks_ok = false;
void *object = head;
int cnt = 0;
- unsigned long flags;
- bool checks_ok = false;
- depot_stack_handle_t handle = 0;
-
- if (s->flags & SLAB_STORE_USER)
- handle = set_track_prepare();
-
- spin_lock_irqsave(&n->list_lock, flags);
if (s->flags & SLAB_CONSISTENCY_CHECKS) {
if (!check_slab(s, slab))
goto out;
}
- if (slab->inuse < bulk_cnt) {
+ if (slab->inuse < *bulk_cnt) {
slab_err(s, slab, "Slab has %d allocated objects but %d are to be freed\n",
- slab->inuse, bulk_cnt);
+ slab->inuse, *bulk_cnt);
goto out;
}
next_object:
- if (++cnt > bulk_cnt)
+ if (++cnt > *bulk_cnt)
goto out_cnt;
if (s->flags & SLAB_CONSISTENCY_CHECKS) {
@@ -2886,57 +2876,18 @@ static noinline void free_debug_processing(
checks_ok = true;
out_cnt:
- if (cnt != bulk_cnt)
+ if (cnt != *bulk_cnt) {
slab_err(s, slab, "Bulk free expected %d objects but found %d\n",
- bulk_cnt, cnt);
-
-out:
- if (checks_ok) {
- void *prior = slab->freelist;
-
- /* Perform the actual freeing while we still hold the locks */
- slab->inuse -= cnt;
- set_freepointer(s, tail, prior);
- slab->freelist = head;
-
- /*
- * If the slab is empty, and node's partial list is full,
- * it should be discarded anyway no matter it's on full or
- * partial list.
- */
- if (slab->inuse == 0 && n->nr_partial >= s->min_partial)
- slab_free = slab;
-
- if (!prior) {
- /* was on full list */
- remove_full(s, n, slab);
- if (!slab_free) {
- add_partial(n, slab, DEACTIVATE_TO_TAIL);
- stat(s, FREE_ADD_PARTIAL);
- }
- } else if (slab_free) {
- remove_partial(n, slab);
- stat(s, FREE_REMOVE_PARTIAL);
- }
+ *bulk_cnt, cnt);
+ *bulk_cnt = cnt;
}
- if (slab_free) {
- /*
- * Update the counters while still holding n->list_lock to
- * prevent spurious validation warnings
- */
- dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
- }
-
- spin_unlock_irqrestore(&n->list_lock, flags);
+out:
if (!checks_ok)
slab_fix(s, "Object at 0x%p not freed", object);
- if (slab_free) {
- stat(s, FREE_SLAB);
- free_slab(s, slab_free);
- }
+ return checks_ok;
}
#endif /* CONFIG_SLUB_DEBUG */
@@ -3453,6 +3404,67 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
}
EXPORT_SYMBOL(kmem_cache_alloc_node);
+static noinline void free_to_partial_list(
+ struct kmem_cache *s, struct slab *slab,
+ void *head, void *tail, int bulk_cnt,
+ unsigned long addr)
+{
+ struct kmem_cache_node *n = get_node(s, slab_nid(slab));
+ struct slab *slab_free = NULL;
+ int cnt = bulk_cnt;
+ unsigned long flags;
+ depot_stack_handle_t handle = 0;
+
+ if (s->flags & SLAB_STORE_USER)
+ handle = set_track_prepare();
+
+ spin_lock_irqsave(&n->list_lock, flags);
+
+ if (free_debug_processing(s, slab, head, tail, &cnt, addr, handle)) {
+ void *prior = slab->freelist;
+
+ /* Perform the actual freeing while we still hold the locks */
+ slab->inuse -= cnt;
+ set_freepointer(s, tail, prior);
+ slab->freelist = head;
+
+ /*
+ * If the slab is empty, and node's partial list is full,
+ * it should be discarded anyway no matter it's on full or
+ * partial list.
+ */
+ if (slab->inuse == 0 && n->nr_partial >= s->min_partial)
+ slab_free = slab;
+
+ if (!prior) {
+ /* was on full list */
+ remove_full(s, n, slab);
+ if (!slab_free) {
+ add_partial(n, slab, DEACTIVATE_TO_TAIL);
+ stat(s, FREE_ADD_PARTIAL);
+ }
+ } else if (slab_free) {
+ remove_partial(n, slab);
+ stat(s, FREE_REMOVE_PARTIAL);
+ }
+ }
+
+ if (slab_free) {
+ /*
+ * Update the counters while still holding n->list_lock to
+ * prevent spurious validation warnings
+ */
+ dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
+ }
+
+ spin_unlock_irqrestore(&n->list_lock, flags);
+
+ if (slab_free) {
+ stat(s, FREE_SLAB);
+ free_slab(s, slab_free);
+ }
+}
+
/*
* Slow path handling. This may still be called frequently since objects
* have a longer lifetime than the cpu slabs in most processing loads.
@@ -3479,7 +3491,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
return;
if (kmem_cache_debug(s)) {
- free_debug_processing(s, slab, head, tail, cnt, addr);
+ free_to_partial_list(s, slab, head, tail, cnt, addr);
return;
}
--
2.38.1
SLUB fastpaths use __always_inline to avoid function calls. With
CONFIG_SLUB_TINY we would rather save the memory. Add a
__fastpath_inline macro that's __always_inline normally but empty with
CONFIG_SLUB_TINY.
bloat-o-meter results on x86_64 mm/slub.o:
add/remove: 3/1 grow/shrink: 1/8 up/down: 865/-1784 (-919)
Function                                     old     new   delta
kmem_cache_free                               20     281    +261
slab_alloc_node.isra                           -     245    +245
slab_free.constprop.isra                       -     231    +231
__kmem_cache_alloc_lru.isra                    -     128    +128
__kmem_cache_release                          88      83      -5
__kmem_cache_create                         1446    1436     -10
__kmem_cache_free                            271     142    -129
kmem_cache_alloc_node                        330     127    -203
kmem_cache_free_bulk.part                    826     613    -213
__kmem_cache_alloc_node                      230      10    -220
kmem_cache_alloc_lru                         325      12    -313
kmem_cache_alloc                             325      10    -315
kmem_cache_free.part                         376       -    -376
Total: Before=26103, After=25184, chg -3.52%
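(For reference, such a comparison can be reproduced with the in-tree
script, e.g. ./scripts/bloat-o-meter slub.o.before mm/slub.o, assuming a
copy of the pre-patch object file was saved as slub.o.before.)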
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 7f1cd702c3b4..d54466e76503 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -187,6 +187,12 @@ do { \
#define USE_LOCKLESS_FAST_PATH() (false)
#endif
+#ifndef CONFIG_SLUB_TINY
+#define __fastpath_inline __always_inline
+#else
+#define __fastpath_inline
+#endif
+
#ifdef CONFIG_SLUB_DEBUG
#ifdef CONFIG_SLUB_DEBUG_ON
DEFINE_STATIC_KEY_TRUE(slub_debug_enabled);
@@ -3386,7 +3392,7 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
*
* Otherwise we can simply pick the next object from the lockless free list.
*/
-static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
+static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
@@ -3412,13 +3418,13 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
return object;
}
-static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
+static __fastpath_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, unsigned long addr, size_t orig_size)
{
return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
}
-static __always_inline
+static __fastpath_inline
void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
@@ -3733,7 +3739,7 @@ static void do_slab_free(struct kmem_cache *s,
}
#endif /* CONFIG_SLUB_TINY */
-static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
+static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab,
void *head, void *tail, void **p, int cnt,
unsigned long addr)
{
--
2.38.1
Hi,
On Mon, Nov 21, 2022 at 06:12:02PM +0100, Vlastimil Babka wrote:
> As explained in [1], we would like to remove SLOB if possible.
>
> - There are no known users that need its somewhat lower memory footprint
> so much that they cannot handle SLUB (after some modifications by the
> previous patches) instead.
>
> - It is an extra maintenance burden, and a number of features are
> incompatible with it.
>
> - It blocks the API improvement of allowing kfree() on objects allocated
> via kmem_cache_alloc().
>
> As the first step, rename the CONFIG_SLOB option in the slab allocator
> configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
> depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
> churn. This will cause existing .config files and defconfigs with
> CONFIG_SLOB=y to silently switch to the default (and recommended
> replacement) SLUB, while still allowing SLOB to be configured by anyone
> that notices and needs it. But those should contact the slab maintainers
> and [email protected] as explained in the updated help. With no valid
> objections, the plan is to update the existing defconfigs to SLUB and
> remove SLOB in a few cycles.
>
> To make SLUB more suitable replacement for SLOB, a CONFIG_SLUB_TINY
> option was introduced to limit SLUB's memory overhead.
> There is a number of defconfigs specifying CONFIG_SLOB=y. As part of
> this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Cc: Russell King <[email protected]>
> Cc: Aaro Koskinen <[email protected]>
Acked-by: Aaro Koskinen <[email protected]> # OMAP1
A.
On 11/21/22 18:12, Vlastimil Babka wrote:
> As explained in [1], we would like to remove SLOB if possible.
>
> - There are no known users that need its somewhat lower memory footprint
> so much that they cannot handle SLUB (after some modifications by the
> previous patches) instead.
>
> - It is an extra maintenance burden, and a number of features are
> incompatible with it.
>
> - It blocks the API improvement of allowing kfree() on objects allocated
> via kmem_cache_alloc().
>
> As the first step, rename the CONFIG_SLOB option in the slab allocator
> configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
> depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
> churn. This will cause existing .config files and defconfigs with
> CONFIG_SLOB=y to silently switch to the default (and recommended
> replacement) SLUB, while still allowing SLOB to be configured by anyone
> that notices and needs it. But those should contact the slab maintainers
> and [email protected] as explained in the updated help. With no valid
> objections, the plan is to update the existing defconfigs to SLUB and
> remove SLOB in a few cycles.
>
> To make SLUB more suitable replacement for SLOB, a CONFIG_SLUB_TINY
> option was introduced to limit SLUB's memory overhead.
> There is a number of defconfigs specifying CONFIG_SLOB=y. As part of
> this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
Hm, I forgot - some of those defconfigs might not actually be for devices
so tiny that they need CONFIG_SLUB_TINY (or SLOB previously). For those it
would make more sense to simply drop CONFIG_SLOB=y and leave them with the
default choice, which is SLUB (without _TINY). Feel free to point those
out to me and I'll adjust. Thanks.
On November 21, 2022 9:11:51 AM PST, Vlastimil Babka <[email protected]> wrote:
>With CONFIG_HARDENED_USERCOPY not enabled, there are no
>__check_heap_object() checks happening that would use the kmem_cache
>useroffset and usersize fields. Yet the fields are still initialized,
>preventing merging of otherwise compatible caches. Thus ignore the
>values passed to cache creation and leave them zero when
>CONFIG_HARDENED_USERCOPY is disabled.
>
>In a quick virtme boot test, this has reduced the number of caches in
>/proc/slabinfo from 131 to 111.
>
>Cc: Kees Cook <[email protected]>
>Signed-off-by: Vlastimil Babka <[email protected]>
>---
> mm/slab_common.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/mm/slab_common.c b/mm/slab_common.c
>index 0042fb2730d1..a8cb5de255fc 100644
>--- a/mm/slab_common.c
>+++ b/mm/slab_common.c
>@@ -317,7 +317,8 @@ kmem_cache_create_usercopy(const char *name,
> flags &= CACHE_CREATE_MASK;
>
> /* Fail closed on bad usersize of useroffset values. */
>- if (WARN_ON(!usersize && useroffset) ||
>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
>+ WARN_ON(!usersize && useroffset) ||
> WARN_ON(size < usersize || size - usersize < useroffset))
> usersize = useroffset = 0;
>
>@@ -640,6 +641,9 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> align = max(align, size);
> s->align = calculate_alignment(flags, align, size);
>
>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY))
>+ useroffset = usersize = 0;
>+
> s->useroffset = useroffset;
> s->usersize = usersize;
>
"Always non-mergeable" is intentional here, but I do see the argument for not doing it under hardened-usercopy.
That said, if you keep this part, maybe go the full step and ifdef away useroffset/usersize's struct member definition and other logic, especially for SLUB_TINY benefits, so 2 ulongs are dropped from the cache struct?
-Kees
--
Kees Cook
On 11/22/22 02:12, Vlastimil Babka wrote:
> As explained in [1], we would like to remove SLOB if possible.
>
> - There are no known users that need its somewhat lower memory footprint
> so much that they cannot handle SLUB (after some modifications by the
> previous patches) instead.
>
> - It is an extra maintenance burden, and a number of features are
> incompatible with it.
>
> - It blocks the API improvement of allowing kfree() on objects allocated
> via kmem_cache_alloc().
>
> As the first step, rename the CONFIG_SLOB option in the slab allocator
> configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
> depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
> churn. This will cause existing .config files and defconfigs with
> CONFIG_SLOB=y to silently switch to the default (and recommended
> replacement) SLUB, while still allowing SLOB to be configured by anyone
> that notices and needs it. But those should contact the slab maintainers
> and [email protected] as explained in the updated help. With no valid
> objections, the plan is to update the existing defconfigs to SLUB and
> remove SLOB in a few cycles.
>
> To make SLUB more suitable replacement for SLOB, a CONFIG_SLUB_TINY
> option was introduced to limit SLUB's memory overhead.
> There is a number of defconfigs specifying CONFIG_SLOB=y. As part of
> this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
>
> [1] https://lore.kernel.org/all/[email protected]/
For the riscv k210,
Reviewed-by: Damien Le Moal <[email protected]>
Also, if these patches do not change from what I tested, feel free to add:
Tested-by: Damien Le Moal <[email protected]>
Thanks !
>
> Cc: Russell King <[email protected]>
> Cc: Aaro Koskinen <[email protected]>
> Cc: Janusz Krzysztofik <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Jonas Bonn <[email protected]>
> Cc: Stefan Kristiansson <[email protected]>
> Cc: Stafford Horne <[email protected]>
> Cc: Yoshinori Sato <[email protected]>
> Cc: Rich Felker <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Conor Dooley <[email protected]>
> Cc: Damien Le Moal <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> arch/arm/configs/clps711x_defconfig | 3 ++-
> arch/arm/configs/collie_defconfig | 3 ++-
> arch/arm/configs/multi_v4t_defconfig | 3 ++-
> arch/arm/configs/omap1_defconfig | 3 ++-
> arch/arm/configs/pxa_defconfig | 3 ++-
> arch/arm/configs/tct_hammer_defconfig | 3 ++-
> arch/arm/configs/xcep_defconfig | 3 ++-
> arch/openrisc/configs/or1ksim_defconfig | 3 ++-
> arch/openrisc/configs/simple_smp_defconfig | 3 ++-
> arch/riscv/configs/nommu_k210_defconfig | 3 ++-
> arch/riscv/configs/nommu_k210_sdcard_defconfig | 3 ++-
> arch/riscv/configs/nommu_virt_defconfig | 3 ++-
> arch/sh/configs/rsk7201_defconfig | 3 ++-
> arch/sh/configs/rsk7203_defconfig | 3 ++-
> arch/sh/configs/se7206_defconfig | 3 ++-
> arch/sh/configs/shmin_defconfig | 3 ++-
> arch/sh/configs/shx3_defconfig | 3 ++-
> kernel/configs/tiny.config | 5 +++--
> mm/Kconfig | 17 +++++++++++++++--
> 19 files changed, 52 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm/configs/clps711x_defconfig b/arch/arm/configs/clps711x_defconfig
> index 92481b2a88fa..adcee238822a 100644
> --- a/arch/arm/configs/clps711x_defconfig
> +++ b/arch/arm/configs/clps711x_defconfig
> @@ -14,7 +14,8 @@ CONFIG_ARCH_EDB7211=y
> CONFIG_ARCH_P720T=y
> CONFIG_AEABI=y
> # CONFIG_COREDUMP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> diff --git a/arch/arm/configs/collie_defconfig b/arch/arm/configs/collie_defconfig
> index 2a2d2cb3ce2e..69341c33e0cc 100644
> --- a/arch/arm/configs/collie_defconfig
> +++ b/arch/arm/configs/collie_defconfig
> @@ -13,7 +13,8 @@ CONFIG_CMDLINE="noinitrd root=/dev/mtdblock2 rootfstype=jffs2 fbcon=rotate:1"
> CONFIG_FPE_NWFPE=y
> CONFIG_PM=y
> # CONFIG_SWAP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> diff --git a/arch/arm/configs/multi_v4t_defconfig b/arch/arm/configs/multi_v4t_defconfig
> index e2fd822f741a..b60000a89aff 100644
> --- a/arch/arm/configs/multi_v4t_defconfig
> +++ b/arch/arm/configs/multi_v4t_defconfig
> @@ -25,7 +25,8 @@ CONFIG_ARM_CLPS711X_CPUIDLE=y
> CONFIG_JUMP_LABEL=y
> CONFIG_PARTITION_ADVANCED=y
> # CONFIG_COREDUMP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_MTD=y
> CONFIG_MTD_CMDLINE_PARTS=y
> CONFIG_MTD_BLOCK=y
> diff --git a/arch/arm/configs/omap1_defconfig b/arch/arm/configs/omap1_defconfig
> index 70511fe4b3ec..246f1bba7df5 100644
> --- a/arch/arm/configs/omap1_defconfig
> +++ b/arch/arm/configs/omap1_defconfig
> @@ -42,7 +42,8 @@ CONFIG_MODULE_FORCE_UNLOAD=y
> CONFIG_PARTITION_ADVANCED=y
> CONFIG_BINFMT_MISC=y
> # CONFIG_SWAP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_VM_EVENT_COUNTERS is not set
> CONFIG_NET=y
> CONFIG_PACKET=y
> diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig
> index d60cc9cc4c21..0a0f12df40b5 100644
> --- a/arch/arm/configs/pxa_defconfig
> +++ b/arch/arm/configs/pxa_defconfig
> @@ -49,7 +49,8 @@ CONFIG_PARTITION_ADVANCED=y
> CONFIG_LDM_PARTITION=y
> CONFIG_CMDLINE_PARTITION=y
> CONFIG_BINFMT_MISC=y
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_COMPACTION is not set
> CONFIG_NET=y
> CONFIG_PACKET=y
> diff --git a/arch/arm/configs/tct_hammer_defconfig b/arch/arm/configs/tct_hammer_defconfig
> index 3b29ae1fb750..6bd38b6f22c4 100644
> --- a/arch/arm/configs/tct_hammer_defconfig
> +++ b/arch/arm/configs/tct_hammer_defconfig
> @@ -19,7 +19,8 @@ CONFIG_FPE_NWFPE=y
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> # CONFIG_SWAP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> diff --git a/arch/arm/configs/xcep_defconfig b/arch/arm/configs/xcep_defconfig
> index ea59e4b6bfc5..6bd9f71b71fc 100644
> --- a/arch/arm/configs/xcep_defconfig
> +++ b/arch/arm/configs/xcep_defconfig
> @@ -26,7 +26,8 @@ CONFIG_MODULE_UNLOAD=y
> CONFIG_MODVERSIONS=y
> CONFIG_MODULE_SRCVERSION_ALL=y
> # CONFIG_BLOCK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_COMPAT_BRK is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> CONFIG_NET=y
> diff --git a/arch/openrisc/configs/or1ksim_defconfig b/arch/openrisc/configs/or1ksim_defconfig
> index 6e1e004047c7..0116e465238f 100644
> --- a/arch/openrisc/configs/or1ksim_defconfig
> +++ b/arch/openrisc/configs/or1ksim_defconfig
> @@ -10,7 +10,8 @@ CONFIG_EXPERT=y
> # CONFIG_AIO is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_MODULES=y
> # CONFIG_BLOCK is not set
> CONFIG_OPENRISC_BUILTIN_DTB="or1ksim"
> diff --git a/arch/openrisc/configs/simple_smp_defconfig b/arch/openrisc/configs/simple_smp_defconfig
> index ff49d868e040..b990cb6c9309 100644
> --- a/arch/openrisc/configs/simple_smp_defconfig
> +++ b/arch/openrisc/configs/simple_smp_defconfig
> @@ -16,7 +16,8 @@ CONFIG_EXPERT=y
> # CONFIG_AIO is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_MODULES=y
> # CONFIG_BLOCK is not set
> CONFIG_OPENRISC_BUILTIN_DTB="simple_smp"
> diff --git a/arch/riscv/configs/nommu_k210_defconfig b/arch/riscv/configs/nommu_k210_defconfig
> index 96fe8def644c..79b3ccd58ff0 100644
> --- a/arch/riscv/configs/nommu_k210_defconfig
> +++ b/arch/riscv/configs/nommu_k210_defconfig
> @@ -25,7 +25,8 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_EMBEDDED=y
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_MMU is not set
> CONFIG_SOC_CANAAN=y
> CONFIG_NONPORTABLE=y
> diff --git a/arch/riscv/configs/nommu_k210_sdcard_defconfig b/arch/riscv/configs/nommu_k210_sdcard_defconfig
> index 379740654373..6b80bb13b8ed 100644
> --- a/arch/riscv/configs/nommu_k210_sdcard_defconfig
> +++ b/arch/riscv/configs/nommu_k210_sdcard_defconfig
> @@ -17,7 +17,8 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_EMBEDDED=y
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_MMU is not set
> CONFIG_SOC_CANAAN=y
> CONFIG_NONPORTABLE=y
> diff --git a/arch/riscv/configs/nommu_virt_defconfig b/arch/riscv/configs/nommu_virt_defconfig
> index 1a56eda5ce46..4cf0f297091e 100644
> --- a/arch/riscv/configs/nommu_virt_defconfig
> +++ b/arch/riscv/configs/nommu_virt_defconfig
> @@ -22,7 +22,8 @@ CONFIG_EXPERT=y
> # CONFIG_KALLSYMS is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_MMU is not set
> CONFIG_SOC_VIRT=y
> CONFIG_NONPORTABLE=y
> diff --git a/arch/sh/configs/rsk7201_defconfig b/arch/sh/configs/rsk7201_defconfig
> index 619c18699459..376e95fa77bc 100644
> --- a/arch/sh/configs/rsk7201_defconfig
> +++ b/arch/sh/configs/rsk7201_defconfig
> @@ -10,7 +10,8 @@ CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> CONFIG_BLK_DEV_INITRD=y
> # CONFIG_AIO is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_MODULES=y
> # CONFIG_BLK_DEV_BSG is not set
> diff --git a/arch/sh/configs/rsk7203_defconfig b/arch/sh/configs/rsk7203_defconfig
> index d00fafc021e1..1d5fd67a3949 100644
> --- a/arch/sh/configs/rsk7203_defconfig
> +++ b/arch/sh/configs/rsk7203_defconfig
> @@ -11,7 +11,8 @@ CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_KALLSYMS_ALL=y
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_MODULES=y
> # CONFIG_BLK_DEV_BSG is not set
> diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
> index 122216123e63..78e0e7be57ee 100644
> --- a/arch/sh/configs/se7206_defconfig
> +++ b/arch/sh/configs/se7206_defconfig
> @@ -21,7 +21,8 @@ CONFIG_BLK_DEV_INITRD=y
> CONFIG_KALLSYMS_ALL=y
> # CONFIG_ELF_CORE is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> diff --git a/arch/sh/configs/shmin_defconfig b/arch/sh/configs/shmin_defconfig
> index c0b6f40d01cc..e078b193a78a 100644
> --- a/arch/sh/configs/shmin_defconfig
> +++ b/arch/sh/configs/shmin_defconfig
> @@ -9,7 +9,8 @@ CONFIG_LOG_BUF_SHIFT=14
> # CONFIG_FUTEX is not set
> # CONFIG_EPOLL is not set
> # CONFIG_SHMEM is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_BLK_DEV_BSG is not set
> CONFIG_CPU_SUBTYPE_SH7706=y
> CONFIG_MEMORY_START=0x0c000000
> diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
> index 32ec6eb1eabc..aa353dff7f19 100644
> --- a/arch/sh/configs/shx3_defconfig
> +++ b/arch/sh/configs/shx3_defconfig
> @@ -20,7 +20,8 @@ CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
> CONFIG_KALLSYMS_ALL=y
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_KPROBES=y
> CONFIG_MODULES=y
> diff --git a/kernel/configs/tiny.config b/kernel/configs/tiny.config
> index 8a44b93da0f3..c2f9c912df1c 100644
> --- a/kernel/configs/tiny.config
> +++ b/kernel/configs/tiny.config
> @@ -7,5 +7,6 @@ CONFIG_KERNEL_XZ=y
> # CONFIG_KERNEL_LZO is not set
> # CONFIG_KERNEL_LZ4 is not set
> # CONFIG_SLAB is not set
> -# CONFIG_SLUB is not set
> -CONFIG_SLOB=y
> +# CONFIG_SLOB_DEPRECATED is not set
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 5941cb34e30d..dcc49c69552f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -219,17 +219,30 @@ config SLUB
> and has enhanced diagnostics. SLUB is the default choice for
> a slab allocator.
>
> -config SLOB
> +config SLOB_DEPRECATED
> depends on EXPERT
> - bool "SLOB (Simple Allocator)"
> + bool "SLOB (Simple Allocator - DEPRECATED)"
> depends on !PREEMPT_RT
> help
> + Deprecated and scheduled for removal in a few cycles. SLUB
> + recommended as replacement. CONFIG_SLUB_TINY can be considered
> + on systems with 16MB or less RAM.
> +
> + If you need SLOB to stay, please contact [email protected] and
> + people listed in the SLAB ALLOCATOR section of MAINTAINERS file,
> + with your use case.
> +
> SLOB replaces the stock allocator with a drastically simpler
> allocator. SLOB is generally more space efficient but
> does not perform as well on large systems.
>
> endchoice
>
> +config SLOB
> + bool
> + default y
> + depends on SLOB_DEPRECATED
> +
> config SLUB_TINY
> bool "Configure SLUB for minimal memory footprint"
> depends on SLUB && EXPERT
--
Damien Le Moal
Western Digital Research
On Mon, Nov 21, 2022, at 18:12, Vlastimil Babka wrote:
> As explained in [1], we would like to remove SLOB if possible.
> ---
> arch/arm/configs/clps711x_defconfig | 3 ++-
> arch/arm/configs/collie_defconfig | 3 ++-
> arch/arm/configs/multi_v4t_defconfig | 3 ++-
> arch/arm/configs/omap1_defconfig | 3 ++-
> arch/arm/configs/pxa_defconfig | 3 ++-
> arch/arm/configs/tct_hammer_defconfig | 3 ++-
> arch/arm/configs/xcep_defconfig | 3 ++-
These all seem fine to convert to SLUB_TINY.
It might be a good idea to go through the arm defconfigs after
6.2 (which will remove a bunch of them) and check which of
the others should use it as well, but that of course is
unrelated to the mechanical conversion you do here.
Acked-by: Arnd Bergmann <[email protected]>
On 11/22/22 17:33, Arnd Bergmann wrote:
> On Mon, Nov 21, 2022, at 18:11, Vlastimil Babka wrote:
>>
>> this continues the discussion from [1]. Reasons to remove SLOB are
>> outlined there and no-one has objected so far. The last patch of this
>> series therefore deprecates CONFIG_SLOB and updates all the defconfigs
>> using CONFIG_SLOB=y in the tree.
>>
>> There is a k210 board with 8MB RAM where switching to SLUB caused issues
>> [2] and the lkp bot wasn't also happy about code bloat [3]. To address
>> both, this series introduces CONFIG_SLUB_TINY to perform some rather
>> low-hanging fruit modifications to SLUB to reduce its memory overhead.
>> This seems to have been successful at least in the k210 case [4]. I
>> consider this as an acceptable tradeoff for getting rid of SLOB.
>
> I agree that this is a great success for replacing SLOB on the
> smallest machines that have 32MB or less and have to run a
> highly customized kernel, and this is probably enough to
> have a drop-in replacement without making any currently working
> system worse.
>
> On the other hand, I have the feeling that we may want something
> a bit less aggressive than this for machines that are slightly
> less constrained, in particular when a single kernel needs to
> scale from 64MB to 512MB, which can happen e.g. on OpenWRT.
> I have seen a number of reports over the years that suggest
> that new kernels handle fragmentation and low memory worse than
> old ones, and it would be great to improve that again.
I see. That would require studying such reports and seeing whether the problem
there is actually SLUB, the page allocator, or something else entirely.
> I can imagine those machines wanting to use sysfs in general
> but not for the slab caches, so having a separate knob to
> configure out the sysfs stuff could be useful without having
> to go all the way to SLUB_TINY.
Right, but AFAIK that wouldn't save much except some text size and kobjects,
so probably negligible for >32MB?
> For the options that trade off performance against lower
> fragmentation (MIN/MAX_PARTIAL, KMALLOC_RECLAIM, percpu
> slabs), I wonder if it's possible to have a boot time
> default based on the amount of RAM per CPU to have a better
> tuned system on most cases, rather than having to go
> to one extreme or the other at compile time.
Possible for some of these things, but for others that brings us back to the
question of what the actual observed issues are. If it's low memory in an
absolute number of pages, these can help, but if it's fragmentation (and that
kind of RAM size should have page grouping by mobility enabled), ditching e.g.
the KMALLOC_RECLAIM caches could make it worse. Unfortunately some of these
tradeoffs can be rather unpredictable.
Thanks,
Vlastimil
> Arnd
>
> https://openwrt.org/toh/views/toh_standard_all?datasrt=target&dataflt%5B0%5D=availability_%3DAvailable%202021
On Mon, Nov 21, 2022, at 18:11, Vlastimil Babka wrote:
>
> this continues the discussion from [1]. Reasons to remove SLOB are
> outlined there and no-one has objected so far. The last patch of this
> series therefore deprecates CONFIG_SLOB and updates all the defconfigs
> using CONFIG_SLOB=y in the tree.
>
> There is a k210 board with 8MB RAM where switching to SLUB caused issues
> [2] and the lkp bot wasn't also happy about code bloat [3]. To address
> both, this series introduces CONFIG_SLUB_TINY to perform some rather
> low-hanging fruit modifications to SLUB to reduce its memory overhead.
> This seems to have been successful at least in the k210 case [4]. I
> consider this as an acceptable tradeoff for getting rid of SLOB.
I agree that this is a great success for replacing SLOB on the
smallest machines that have 32MB or less and have to run a
highly customized kernel, and this is probably enough to
have a drop-in replacement without making any currently working
system worse.
On the other hand, I have the feeling that we may want something
a bit less aggressive than this for machines that are slightly
less constrained, in particular when a single kernel needs to
scale from 64MB to 512MB, which can happen e.g. on OpenWRT.
I have seen a number of reports over the years that suggest
that new kernels handle fragmentation and low memory worse than
old ones, and it would be great to improve that again.
I can imagine those machines wanting to use sysfs in general
but not for the slab caches, so having a separate knob to
configure out the sysfs stuff could be useful without having
to go all the way to SLUB_TINY.
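A hypothetical sketch of what such a knob could look like in mm/Kconfig; the
option name SLUB_SYSFS and its help text are invented here for illustration
and are not something this series adds:

config SLUB_SYSFS
	bool "Expose SLUB caches in /sys/kernel/slab"
	depends on SLUB && SYSFS
	default y
	help
	  Build the /sys/kernel/slab interface for inspecting and tuning
	  SLUB caches. Disabling it saves some text and kobject metadata
	  on memory-constrained systems that otherwise want sysfs.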
For the options that trade off performance against lower
fragmentation (MIN/MAX_PARTIAL, KMALLOC_RECLAIM, percpu
slabs), I wonder if it's possible to have a boot time
default based on the amount of RAM per CPU to have a better
tuned system on most cases, rather than having to go
to one extreme or the other at compile time.
Arnd
https://openwrt.org/toh/views/toh_standard_all?datasrt=target&dataflt%5B0%5D=availability_%3DAvailable%202021
On Tue, Nov 22, 2022, at 17:59, Vlastimil Babka wrote:
> On 11/22/22 17:33, Arnd Bergmann wrote:
>> On Mon, Nov 21, 2022, at 18:11, Vlastimil Babka wrote:
>> I can imagine those machines wanting to use sysfs in general
>> but not for the slab caches, so having a separate knob to
>> configure out the sysfs stuff could be useful without having
>> to go all the way to SLUB_TINY.
>
> Right, but AFAIK that wouldn't save much except some text size and kobjects,
> so probably negligible for >32MB?
Makes sense; I assume you have a better idea of how much this
could save. I'm not at all worried about the .text size, but
my initial guess was that the metadata for sysfs would be
noticeable.
>> For the options that trade off performance against lower
>> fragmentation (MIN/MAX_PARTIAL, KMALLOC_RECLAIM, percpu
>> slabs), I wonder if it's possible to have a boot time
>> default based on the amount of RAM per CPU to have a better
>> tuned system on most cases, rather than having to go
>> to one extreme or the other at compile time.
>
> Possible for some of these things, but for others that brings us back to the
> question of what the actual observed issues are. If it's low memory in an
> absolute number of pages, these can help, but if it's fragmentation (and that
> kind of RAM size should have page grouping by mobility enabled), ditching e.g.
> the KMALLOC_RECLAIM caches could make it worse. Unfortunately some of these
> tradeoffs can be rather unpredictable.
Are there any obvious wins on memory usage? I would guess that it
would be safe to e.g. ditch percpu slabs when running with less than
128MB per CPU, and the MIN/MAX_PARTIAL values could easily
be a function of the number of pages in total or per CPU,
whichever makes the most sense. As a side effect, those could also
grow slightly larger on huge systems by scaling them with
log2(totalpages).
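A minimal sketch of that boot-time scaling idea, assuming totalram_pages() and
ilog2() as helpers; the threshold and the formula below are invented purely for
illustration and are not something this series implements:

/* hypothetical, mm/slub.c */
#include <linux/mm.h>
#include <linux/log2.h>

static unsigned long slub_default_min_partial(void)
{
	unsigned long pages = totalram_pages();

	/* free empty slabs immediately on very small systems */
	if (pages < (32UL << (20 - PAGE_SHIFT)))	/* below ~32MB */
		return 0;

	/* otherwise let the limit grow roughly with log2(totalpages) */
	return ilog2(pages) - 10;
}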
Arnd
On 11/21/22 22:35, Kees Cook wrote:
> On November 21, 2022 9:11:51 AM PST, Vlastimil Babka <[email protected]> wrote:
>>With CONFIG_HARDENED_USERCOPY not enabled, there are no
>>__check_heap_object() checks happening that would use the kmem_cache
>>useroffset and usersize fields. Yet the fields are still initialized,
>>preventing merging of otherwise compatible caches. Thus ignore the
>>values passed to cache creation and leave them zero when
>>CONFIG_HARDENED_USERCOPY is disabled.
>>
>>In a quick virtme boot test, this has reduced the number of caches in
>>/proc/slabinfo from 131 to 111.
>>
>>Cc: Kees Cook <[email protected]>
>>Signed-off-by: Vlastimil Babka <[email protected]>
>>---
>> mm/slab_common.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>>diff --git a/mm/slab_common.c b/mm/slab_common.c
>>index 0042fb2730d1..a8cb5de255fc 100644
>>--- a/mm/slab_common.c
>>+++ b/mm/slab_common.c
>>@@ -317,7 +317,8 @@ kmem_cache_create_usercopy(const char *name,
>> flags &= CACHE_CREATE_MASK;
>>
>> /* Fail closed on bad usersize of useroffset values. */
>>- if (WARN_ON(!usersize && useroffset) ||
>>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
>>+ WARN_ON(!usersize && useroffset) ||
>> WARN_ON(size < usersize || size - usersize < useroffset))
>> usersize = useroffset = 0;
>>
>>@@ -640,6 +641,9 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>> align = max(align, size);
>> s->align = calculate_alignment(flags, align, size);
>>
>>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY))
>>+ useroffset = usersize = 0;
>>+
>> s->useroffset = useroffset;
>> s->usersize = usersize;
>>
>
> "Always non-mergeable" is intentional here, but I do see the argument
> for not doing it under hardened-usercopy.
>
> That said, if you keep this part, maybe go the full step and ifdef away
> useroffset/usersize's struct member definition and other logic, especially
> for SLUB_TINY benefits, so 2 ulongs are dropped from the cache struct?
Okay, probably won't make much difference in practice, but for consistency...
----8<----
From 3cdb7b6ad16a9d95603b482969fa870f996ac9dc Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Wed, 16 Nov 2022 15:56:32 +0100
Subject: [PATCH] mm, slab: ignore hardened usercopy parameters when disabled
With CONFIG_HARDENED_USERCOPY not enabled, there are no
__check_heap_object() checks happening that would use the struct
kmem_cache useroffset and usersize fields. Yet the fields are still
initialized, preventing merging of otherwise compatible caches.
Also the fields contribute to struct kmem_cache size unnecessarily when
unused. Thus #ifdef them out completely when CONFIG_HARDENED_USERCOPY is
disabled.
In a quick virtme boot test, this has reduced the number of caches in
/proc/slabinfo from 131 to 111.
Cc: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/slab_def.h | 2 ++
include/linux/slub_def.h | 2 ++
mm/slab.h | 2 --
mm/slab_common.c | 9 ++++++++-
mm/slub.c | 4 ++++
5 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index f0ffad6a3365..5834bad8ad78 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -80,8 +80,10 @@ struct kmem_cache {
unsigned int *random_seq;
#endif
+#ifdef CONFIG_HARDENED_USERCOPY
unsigned int useroffset; /* Usercopy region offset */
unsigned int usersize; /* Usercopy region size */
+#endif
struct kmem_cache_node *node[MAX_NUMNODES];
};
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index f9c68a9dac04..7ed5e455cbf4 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -136,8 +136,10 @@ struct kmem_cache {
struct kasan_cache kasan_info;
#endif
+#ifdef CONFIG_HARDENED_USERCOPY
unsigned int useroffset; /* Usercopy region offset */
unsigned int usersize; /* Usercopy region size */
+#endif
struct kmem_cache_node *node[MAX_NUMNODES];
};
diff --git a/mm/slab.h b/mm/slab.h
index 0202a8c2f0d2..db9a7984e22e 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -207,8 +207,6 @@ struct kmem_cache {
unsigned int size; /* The aligned/padded/added on size */
unsigned int align; /* Alignment as calculated */
slab_flags_t flags; /* Active flags on the slab */
- unsigned int useroffset;/* Usercopy region offset */
- unsigned int usersize; /* Usercopy region size */
const char *name; /* Slab name for sysfs */
int refcount; /* Use counter */
void (*ctor)(void *); /* Called on object slot creation */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0042fb2730d1..4339c839a452 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -143,8 +143,10 @@ int slab_unmergeable(struct kmem_cache *s)
if (s->ctor)
return 1;
+#ifdef CONFIG_HARDENED_USERCOPY
if (s->usersize)
return 1;
+#endif
/*
* We may have set a slab to be unmergeable during bootstrap.
@@ -223,8 +225,10 @@ static struct kmem_cache *create_cache(const char *name,
s->size = s->object_size = object_size;
s->align = align;
s->ctor = ctor;
+#ifdef CONFIG_HARDENED_USERCOPY
s->useroffset = useroffset;
s->usersize = usersize;
+#endif
err = __kmem_cache_create(s, flags);
if (err)
@@ -317,7 +321,8 @@ kmem_cache_create_usercopy(const char *name,
flags &= CACHE_CREATE_MASK;
/* Fail closed on bad usersize of useroffset values. */
- if (WARN_ON(!usersize && useroffset) ||
+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
+ WARN_ON(!usersize && useroffset) ||
WARN_ON(size < usersize || size - usersize < useroffset))
usersize = useroffset = 0;
@@ -640,8 +645,10 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
align = max(align, size);
s->align = calculate_alignment(flags, align, size);
+#ifdef CONFIG_HARDENED_USERCOPY
s->useroffset = useroffset;
s->usersize = usersize;
+#endif
err = __kmem_cache_create(s, flags);
diff --git a/mm/slub.c b/mm/slub.c
index 157527d7101b..e32db8540767 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5502,11 +5502,13 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
SLAB_ATTR_RO(cache_dma);
#endif
+#ifdef CONFIG_HARDENED_USERCOPY
static ssize_t usersize_show(struct kmem_cache *s, char *buf)
{
return sysfs_emit(buf, "%u\n", s->usersize);
}
SLAB_ATTR_RO(usersize);
+#endif
static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
{
@@ -5803,7 +5805,9 @@ static struct attribute *slab_attrs[] = {
#ifdef CONFIG_FAILSLAB
&failslab_attr.attr,
#endif
+#ifdef CONFIG_HARDENED_USERCOPY
&usersize_attr.attr,
+#endif
#ifdef CONFIG_KFENCE
&skip_kfence_attr.attr,
#endif
--
2.38.1
On Mon, Nov 21, 2022 at 06:11:54PM +0100, Vlastimil Babka wrote:
> SLUB will leave a number of slabs on the partial list even if they are
> empty, to avoid some slab freeing and reallocation. The goal of
> CONFIG_SLUB_TINY is to minimize memory overhead, so set the limits to 0
> for immediate slab page freeing.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Thanks!
On Mon, Nov 21, 2022 at 06:11:57PM +0100, Vlastimil Babka wrote:
> SLAB_RECLAIM_ACCOUNT caches allocate their slab pages with
> __GFP_RECLAIMABLE and can help against fragmentation by grouping pages
> by mobility, but on tiny systems mobility grouping is likely disabled
> anyway and ignoring SLAB_RECLAIM_ACCOUNT might instead lead to merging
> of caches that are made incompatible just by the flag.
>
> Thus with CONFIG_SLUB_TINY, make SLAB_RECLAIM_ACCOUNT ineffective.
Hm, do you see disabling all kernel memory accounting functionality
with CONFIG_SLUB_TINY? I'd say yes. But in that case we need to be consistent
and disable it altogether.
Thanks!
On Mon, Nov 21, 2022 at 06:12:02PM +0100, Vlastimil Babka wrote:
> As explained in [1], we would like to remove SLOB if possible.
>
> - There are no known users that need its somewhat lower memory footprint
> so much that they cannot handle SLUB (after some modifications by the
> previous patches) instead.
>
> - It is an extra maintenance burden, and a number of features are
> incompatible with it.
>
> - It blocks the API improvement of allowing kfree() on objects allocated
> via kmem_cache_alloc().
>
> As the first step, rename the CONFIG_SLOB option in the slab allocator
> configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
> depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
> churn. This will cause existing .config files and defconfigs with
> CONFIG_SLOB=y to silently switch to the default (and recommended
> replacement) SLUB, while still allowing SLOB to be configured by anyone
> that notices and needs it. But those should contact the slab maintainers
> and [email protected] as explained in the updated help. With no valid
> objections, the plan is to update the existing defconfigs to SLUB and
> remove SLOB in a few cycles.
>
> To make SLUB more suitable replacement for SLOB, a CONFIG_SLUB_TINY
> option was introduced to limit SLUB's memory overhead.
> There is a number of defconfigs specifying CONFIG_SLOB=y. As part of
> this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Cc: Russell King <[email protected]>
> Cc: Aaro Koskinen <[email protected]>
> Cc: Janusz Krzysztofik <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Jonas Bonn <[email protected]>
> Cc: Stefan Kristiansson <[email protected]>
> Cc: Stafford Horne <[email protected]>
> Cc: Yoshinori Sato <[email protected]>
> Cc: Rich Felker <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Conor Dooley <[email protected]>
> Cc: Damien Le Moal <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Thanks!
On 11/24/22 02:20, Roman Gushchin wrote:
> On Mon, Nov 21, 2022 at 06:11:57PM +0100, Vlastimil Babka wrote:
>> SLAB_RECLAIM_ACCOUNT caches allocate their slab pages with
>> __GFP_RECLAIMABLE and can help against fragmentation by grouping pages
>> by mobility, but on tiny systems mobility grouping is likely disabled
>> anyway and ignoring SLAB_RECLAIM_ACCOUNT might instead lead to merging
>> of caches that are made incompatible just by the flag.
>>
>> Thus with CONFIG_SLUB_TINY, make SLAB_RECLAIM_ACCOUNT ineffective.
>
> Hm, do you see disabling all kernel memory accounting functionality
> with CONFIG_SLUB_TINY? I'd say yes. But in that case we need to be consistent
> and disable it altogether.
SLAB_RECLAIM_ACCOUNT is kind of a misnomer these days, as the only thing it
does is add __GFP_RECLAIMABLE to a cache's gfp flags for the page allocator's
mobility grouping. I guess the "ACCOUNT" part comes from being counted
towards SReclaimable (vs SUnreclaim) in /proc/meminfo.
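For reference, the mapping in SLUB happens roughly like this in
calculate_sizes() (paraphrased, so treat the exact context as approximate):

	if (s->flags & SLAB_RECLAIM_ACCOUNT)
		s->allocflags |= __GFP_RECLAIMABLE;

so with CONFIG_SLUB_TINY defining the flag to 0, that branch becomes dead code.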
So currently SLUB_TINY has no effect on MEMCG_KMEM (which you probably
meant). Using those two together makes little sense, and had I stumbled upon
code that would become complicated while making this series, I would have
made SLUB_TINY disable MEMCG_KMEM, but that didn't happen, so I left it as is
for now.
> Thanks!
On Thu, 24 Nov 2022, Vlastimil Babka wrote:
> SLAB_RECLAIM_ACCOUNT is kinda misnomer these days, as the only thing it does
> is to add __GFP_RECLAIMABLE to cache's gfp flags for the page allocator's
> mobility grouping. I guess the "ACCOUNT" part comes from being counted
> towards SReclaimable (vs SUnreclaim) in /proc/meminfo.
Well, these SReclaimable etc. counters visible in /proc/meminfo are used in
the reclaim logic and are quite important there.
On Wed, Nov 23, 2022 at 03:23:15PM +0100, Vlastimil Babka wrote:
>
> On 11/21/22 22:35, Kees Cook wrote:
> > On November 21, 2022 9:11:51 AM PST, Vlastimil Babka <[email protected]> wrote:
> >>With CONFIG_HARDENED_USERCOPY not enabled, there are no
> >>__check_heap_object() checks happening that would use the kmem_cache
> >>useroffset and usersize fields. Yet the fields are still initialized,
> >>preventing merging of otherwise compatible caches. Thus ignore the
> >>values passed to cache creation and leave them zero when
> >>CONFIG_HARDENED_USERCOPY is disabled.
> >>
> >>In a quick virtme boot test, this has reduced the number of caches in
> >>/proc/slabinfo from 131 to 111.
> >>
> >>Cc: Kees Cook <[email protected]>
> >>Signed-off-by: Vlastimil Babka <[email protected]>
> >>---
> >> mm/slab_common.c | 6 +++++-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/mm/slab_common.c b/mm/slab_common.c
> >>index 0042fb2730d1..a8cb5de255fc 100644
> >>--- a/mm/slab_common.c
> >>+++ b/mm/slab_common.c
> >>@@ -317,7 +317,8 @@ kmem_cache_create_usercopy(const char *name,
> >> flags &= CACHE_CREATE_MASK;
> >>
> >> /* Fail closed on bad usersize of useroffset values. */
> >>- if (WARN_ON(!usersize && useroffset) ||
> >>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
> >>+ WARN_ON(!usersize && useroffset) ||
> >> WARN_ON(size < usersize || size - usersize < useroffset))
> >> usersize = useroffset = 0;
> >>
> >>@@ -640,6 +641,9 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> >> align = max(align, size);
> >> s->align = calculate_alignment(flags, align, size);
> >>
> >>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY))
> >>+ useroffset = usersize = 0;
> >>+
> >> s->useroffset = useroffset;
> >> s->usersize = usersize;
> >>
> >
> > "Always non-mergeable" is intentional here, but I do see the argument
> > for not doing it under hardened-usercopy.
> >
> > That said, if you keep this part, maybe go the full step and ifdef away
> > useroffset/usersize's struct member definition and other logic, especially
> > for SLUB_TINY benefits, so 2 ulongs are dropped from the cache struct?
>
> Okay, probably won't make much difference in practice, but for consistency...
> ----8<----
> From 3cdb7b6ad16a9d95603b482969fa870f996ac9dc Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Wed, 16 Nov 2022 15:56:32 +0100
> Subject: [PATCH] mm, slab: ignore hardened usercopy parameters when disabled
>
> With CONFIG_HARDENED_USERCOPY not enabled, there are no
> __check_heap_object() checks happening that would use the struct
> kmem_cache useroffset and usersize fields. Yet the fields are still
> initialized, preventing merging of otherwise compatible caches.
>
> Also the fields contribute to struct kmem_cache size unnecessarily when
> unused. Thus #ifdef them out completely when CONFIG_HARDENED_USERCOPY is
> disabled.
>
> In a quick virtme boot test, this has reduced the number of caches in
> /proc/slabinfo from 131 to 111.
>
> Cc: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/slab_def.h | 2 ++
> include/linux/slub_def.h | 2 ++
> mm/slab.h | 2 --
> mm/slab_common.c | 9 ++++++++-
> mm/slub.c | 4 ++++
> 5 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index f0ffad6a3365..5834bad8ad78 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -80,8 +80,10 @@ struct kmem_cache {
> unsigned int *random_seq;
> #endif
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> unsigned int useroffset; /* Usercopy region offset */
> unsigned int usersize; /* Usercopy region size */
> +#endif
>
> struct kmem_cache_node *node[MAX_NUMNODES];
> };
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index f9c68a9dac04..7ed5e455cbf4 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -136,8 +136,10 @@ struct kmem_cache {
> struct kasan_cache kasan_info;
> #endif
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> unsigned int useroffset; /* Usercopy region offset */
> unsigned int usersize; /* Usercopy region size */
> +#endif
>
> struct kmem_cache_node *node[MAX_NUMNODES];
> };
> diff --git a/mm/slab.h b/mm/slab.h
> index 0202a8c2f0d2..db9a7984e22e 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -207,8 +207,6 @@ struct kmem_cache {
> unsigned int size; /* The aligned/padded/added on size */
> unsigned int align; /* Alignment as calculated */
> slab_flags_t flags; /* Active flags on the slab */
> - unsigned int useroffset;/* Usercopy region offset */
> - unsigned int usersize; /* Usercopy region size */
> const char *name; /* Slab name for sysfs */
> int refcount; /* Use counter */
> void (*ctor)(void *); /* Called on object slot creation */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0042fb2730d1..4339c839a452 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -143,8 +143,10 @@ int slab_unmergeable(struct kmem_cache *s)
> if (s->ctor)
> return 1;
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> if (s->usersize)
> return 1;
> +#endif
>
> /*
> * We may have set a slab to be unmergeable during bootstrap.
> @@ -223,8 +225,10 @@ static struct kmem_cache *create_cache(const char *name,
> s->size = s->object_size = object_size;
> s->align = align;
> s->ctor = ctor;
> +#ifdef CONFIG_HARDENED_USERCOPY
> s->useroffset = useroffset;
> s->usersize = usersize;
> +#endif
>
> err = __kmem_cache_create(s, flags);
> if (err)
> @@ -317,7 +321,8 @@ kmem_cache_create_usercopy(const char *name,
> flags &= CACHE_CREATE_MASK;
>
> /* Fail closed on bad usersize of useroffset values. */
> - if (WARN_ON(!usersize && useroffset) ||
> + if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
> + WARN_ON(!usersize && useroffset) ||
> WARN_ON(size < usersize || size - usersize < useroffset))
> usersize = useroffset = 0;
I think this change is no longer needed as slab_unmergeable()
now does not check usersize when CONFIG_HARDENED_USERCOPY=n?
> @@ -640,8 +645,10 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> align = max(align, size);
> s->align = calculate_alignment(flags, align, size);
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> s->useroffset = useroffset;
> s->usersize = usersize;
> +#endif
>
> err = __kmem_cache_create(s, flags);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 157527d7101b..e32db8540767 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5502,11 +5502,13 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
> SLAB_ATTR_RO(cache_dma);
> #endif
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> static ssize_t usersize_show(struct kmem_cache *s, char *buf)
> {
> return sysfs_emit(buf, "%u\n", s->usersize);
> }
> SLAB_ATTR_RO(usersize);
> +#endif
>
> static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
> {
> @@ -5803,7 +5805,9 @@ static struct attribute *slab_attrs[] = {
> #ifdef CONFIG_FAILSLAB
> &failslab_attr.attr,
> #endif
> +#ifdef CONFIG_HARDENED_USERCOPY
> &usersize_attr.attr,
> +#endif
> #ifdef CONFIG_KFENCE
> &skip_kfence_attr.attr,
> #endif
> --
> 2.38.1
>
>
--
Thanks,
Hyeonggon
On 11/24/22 12:16, Hyeonggon Yoo wrote:
>> /* Fail closed on bad usersize of useroffset values. */
>> - if (WARN_ON(!usersize && useroffset) ||
>> + if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
>> + WARN_ON(!usersize && useroffset) ||
>> WARN_ON(size < usersize || size - usersize < useroffset))
>> usersize = useroffset = 0;
>
> I think this change is no longer needed as slab_unmergeable()
> now does not check usersize when CONFIG_HARDENED_USERCOPY=n?
True, but the code here is still followed by

	if (!usersize)
		s = __kmem_cache_alias(name, size, align, flags, ctor);

so it seemed simplest to just leave it like that.
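For context, the surrounding flow in kmem_cache_create_usercopy() is roughly
the following (paraphrased; the goto label name is approximate):

	if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
	    WARN_ON(!usersize && useroffset) ||
	    WARN_ON(size < usersize || size - usersize < useroffset))
		usersize = useroffset = 0;

	if (!usersize)
		s = __kmem_cache_alias(name, size, align, flags, ctor);
	if (s)
		goto out_unlock;

i.e. zeroing usersize is what allows an existing compatible cache to be
reused (merged) instead of creating a new one.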
On Mon, Nov 21, 2022 at 06:11:54PM +0100, Vlastimil Babka wrote:
> SLUB will leave a number of slabs on the partial list even if they are
> empty, to avoid some slab freeing and reallocation. The goal of
> CONFIG_SLUB_TINY is to minimize memory overhead, so set the limits to 0
> for immediate slab page freeing.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index ab085aa2f1f0..917b79278bad 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -241,6 +241,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
> /* Enable to log cmpxchg failures */
> #undef SLUB_DEBUG_CMPXCHG
>
> +#ifndef CONFIG_SLUB_TINY
> /*
> * Minimum number of partial slabs. These will be left on the partial
> * lists even if they are empty. kmem_cache_shrink may reclaim them.
> @@ -253,6 +254,10 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
> * sort the partial list by the number of objects in use.
> */
> #define MAX_PARTIAL 10
> +#else
> +#define MIN_PARTIAL 0
> +#define MAX_PARTIAL 0
> +#endif
>
> #define DEBUG_DEFAULT_FLAGS (SLAB_CONSISTENCY_CHECKS | SLAB_RED_ZONE | \
> SLAB_POISON | SLAB_STORE_USER)
> --
> 2.38.1
>
Reviewed-by: Hyeonggon Yoo <[email protected]>
--
Thanks,
Hyeonggon
On Wed, Nov 23, 2022 at 03:23:15PM +0100, Vlastimil Babka wrote:
>
> On 11/21/22 22:35, Kees Cook wrote:
> > On November 21, 2022 9:11:51 AM PST, Vlastimil Babka <[email protected]> wrote:
> >>With CONFIG_HARDENED_USERCOPY not enabled, there are no
> >>__check_heap_object() checks happening that would use the kmem_cache
> >>useroffset and usersize fields. Yet the fields are still initialized,
> >>preventing merging of otherwise compatible caches. Thus ignore the
> >>values passed to cache creation and leave them zero when
> >>CONFIG_HARDENED_USERCOPY is disabled.
> >>
> >>In a quick virtme boot test, this has reduced the number of caches in
> >>/proc/slabinfo from 131 to 111.
> >>
> >>Cc: Kees Cook <[email protected]>
> >>Signed-off-by: Vlastimil Babka <[email protected]>
> >>---
> >> mm/slab_common.c | 6 +++++-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/mm/slab_common.c b/mm/slab_common.c
> >>index 0042fb2730d1..a8cb5de255fc 100644
> >>--- a/mm/slab_common.c
> >>+++ b/mm/slab_common.c
> >>@@ -317,7 +317,8 @@ kmem_cache_create_usercopy(const char *name,
> >> flags &= CACHE_CREATE_MASK;
> >>
> >> /* Fail closed on bad usersize of useroffset values. */
> >>- if (WARN_ON(!usersize && useroffset) ||
> >>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
> >>+ WARN_ON(!usersize && useroffset) ||
> >> WARN_ON(size < usersize || size - usersize < useroffset))
> >> usersize = useroffset = 0;
> >>
> >>@@ -640,6 +641,9 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> >> align = max(align, size);
> >> s->align = calculate_alignment(flags, align, size);
> >>
> >>+ if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY))
> >>+ useroffset = usersize = 0;
> >>+
> >> s->useroffset = useroffset;
> >> s->usersize = usersize;
> >>
> >
> > "Always non-mergeable" is intentional here, but I do see the argument
> > for not doing it under hardened-usercopy.
> >
> > That said, if you keep this part, maybe go the full step and ifdef away
> > useroffset/usersize's struct member definition and other logic, especially
> > for SLUB_TINY benefits, so 2 ulongs are dropped from the cache struct?
>
> Okay, probably won't make much difference in practice, but for consistency...
> ----8<----
> From 3cdb7b6ad16a9d95603b482969fa870f996ac9dc Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Wed, 16 Nov 2022 15:56:32 +0100
> Subject: [PATCH] mm, slab: ignore hardened usercopy parameters when disabled
>
> With CONFIG_HARDENED_USERCOPY not enabled, there are no
> __check_heap_object() checks happening that would use the struct
> kmem_cache useroffset and usersize fields. Yet the fields are still
> initialized, preventing merging of otherwise compatible caches.
>
> Also the fields contribute to struct kmem_cache size unnecessarily when
> unused. Thus #ifdef them out completely when CONFIG_HARDENED_USERCOPY is
> disabled.
>
> In a quick virtme boot test, this has reduced the number of caches in
> /proc/slabinfo from 131 to 111.
>
> Cc: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/slab_def.h | 2 ++
> include/linux/slub_def.h | 2 ++
> mm/slab.h | 2 --
> mm/slab_common.c | 9 ++++++++-
> mm/slub.c | 4 ++++
> 5 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index f0ffad6a3365..5834bad8ad78 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -80,8 +80,10 @@ struct kmem_cache {
> unsigned int *random_seq;
> #endif
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> unsigned int useroffset; /* Usercopy region offset */
> unsigned int usersize; /* Usercopy region size */
> +#endif
>
> struct kmem_cache_node *node[MAX_NUMNODES];
> };
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index f9c68a9dac04..7ed5e455cbf4 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -136,8 +136,10 @@ struct kmem_cache {
> struct kasan_cache kasan_info;
> #endif
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> unsigned int useroffset; /* Usercopy region offset */
> unsigned int usersize; /* Usercopy region size */
> +#endif
>
> struct kmem_cache_node *node[MAX_NUMNODES];
> };
> diff --git a/mm/slab.h b/mm/slab.h
> index 0202a8c2f0d2..db9a7984e22e 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -207,8 +207,6 @@ struct kmem_cache {
> unsigned int size; /* The aligned/padded/added on size */
> unsigned int align; /* Alignment as calculated */
> slab_flags_t flags; /* Active flags on the slab */
> - unsigned int useroffset;/* Usercopy region offset */
> - unsigned int usersize; /* Usercopy region size */
> const char *name; /* Slab name for sysfs */
> int refcount; /* Use counter */
> void (*ctor)(void *); /* Called on object slot creation */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0042fb2730d1..4339c839a452 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -143,8 +143,10 @@ int slab_unmergeable(struct kmem_cache *s)
> if (s->ctor)
> return 1;
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> if (s->usersize)
> return 1;
> +#endif
>
> /*
> * We may have set a slab to be unmergeable during bootstrap.
> @@ -223,8 +225,10 @@ static struct kmem_cache *create_cache(const char *name,
> s->size = s->object_size = object_size;
> s->align = align;
> s->ctor = ctor;
> +#ifdef CONFIG_HARDENED_USERCOPY
> s->useroffset = useroffset;
> s->usersize = usersize;
> +#endif
>
> err = __kmem_cache_create(s, flags);
> if (err)
> @@ -317,7 +321,8 @@ kmem_cache_create_usercopy(const char *name,
> flags &= CACHE_CREATE_MASK;
>
> /* Fail closed on bad usersize of useroffset values. */
> - if (WARN_ON(!usersize && useroffset) ||
> + if (!IS_ENABLED(CONFIG_HARDENED_USERCOPY) ||
> + WARN_ON(!usersize && useroffset) ||
> WARN_ON(size < usersize || size - usersize < useroffset))
> usersize = useroffset = 0;
>
> @@ -640,8 +645,10 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> align = max(align, size);
> s->align = calculate_alignment(flags, align, size);
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> s->useroffset = useroffset;
> s->usersize = usersize;
> +#endif
>
> err = __kmem_cache_create(s, flags);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 157527d7101b..e32db8540767 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5502,11 +5502,13 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
> SLAB_ATTR_RO(cache_dma);
> #endif
>
> +#ifdef CONFIG_HARDENED_USERCOPY
> static ssize_t usersize_show(struct kmem_cache *s, char *buf)
> {
> return sysfs_emit(buf, "%u\n", s->usersize);
> }
> SLAB_ATTR_RO(usersize);
> +#endif
>
> static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
> {
> @@ -5803,7 +5805,9 @@ static struct attribute *slab_attrs[] = {
> #ifdef CONFIG_FAILSLAB
> &failslab_attr.attr,
> #endif
> +#ifdef CONFIG_HARDENED_USERCOPY
> &usersize_attr.attr,
> +#endif
> #ifdef CONFIG_KFENCE
> &skip_kfence_attr.attr,
> #endif
> --
> 2.38.1
>
Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>
--
Thanks,
Hyeonggon
On Mon, Nov 21, 2022 at 06:11:50PM +0100, Vlastimil Babka wrote:
> Hi,
>
> this continues the discussion from [1]. Reasons to remove SLOB are
> outlined there and no-one has objected so far. The last patch of this
> series therefore deprecates CONFIG_SLOB and updates all the defconfigs
> using CONFIG_SLOB=y in the tree.
>
> There is a k210 board with 8MB RAM where switching to SLUB caused issues
> [2] and the lkp bot wasn't also happy about code bloat [3]. To address
> both, this series introduces CONFIG_SLUB_TINY to perform some rather
> low-hanging fruit modifications to SLUB to reduce its memory overhead.
> This seems to have been successful at least in the k210 case [4]. I
> consider this as an acceptable tradeoff for getting rid of SLOB.
>
> The series is also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-tiny-v1r2
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
> [3] https://lore.kernel.org/all/Y25E9cJbhDAKi1vd@99bb1221be19/
> [4] https://lore.kernel.org/all/[email protected]/
>
> Vlastimil Babka (12):
> mm, slab: ignore hardened usercopy parameters when disabled
> mm, slub: add CONFIG_SLUB_TINY
> mm, slub: disable SYSFS support with CONFIG_SLUB_TINY
> mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY
> mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY
> mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
> mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY
> mm, slub: refactor free debug processing
> mm, slub: split out allocations from pre/post hooks
> mm, slub: remove percpu slabs with CONFIG_SLUB_TINY
> mm, slub: don't aggressively inline with CONFIG_SLUB_TINY
> mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED
>
> arch/arm/configs/clps711x_defconfig | 3 +-
> arch/arm/configs/collie_defconfig | 3 +-
> arch/arm/configs/multi_v4t_defconfig | 3 +-
> arch/arm/configs/omap1_defconfig | 3 +-
> arch/arm/configs/pxa_defconfig | 3 +-
> arch/arm/configs/tct_hammer_defconfig | 3 +-
> arch/arm/configs/xcep_defconfig | 3 +-
> arch/openrisc/configs/or1ksim_defconfig | 3 +-
> arch/openrisc/configs/simple_smp_defconfig | 3 +-
> arch/riscv/configs/nommu_k210_defconfig | 3 +-
> .../riscv/configs/nommu_k210_sdcard_defconfig | 3 +-
> arch/riscv/configs/nommu_virt_defconfig | 3 +-
> arch/sh/configs/rsk7201_defconfig | 3 +-
> arch/sh/configs/rsk7203_defconfig | 3 +-
> arch/sh/configs/se7206_defconfig | 3 +-
> arch/sh/configs/shmin_defconfig | 3 +-
> arch/sh/configs/shx3_defconfig | 3 +-
> include/linux/slab.h | 8 +
> include/linux/slub_def.h | 6 +-
> kernel/configs/tiny.config | 5 +-
> mm/Kconfig | 38 +-
> mm/Kconfig.debug | 2 +-
> mm/slab_common.c | 16 +-
> mm/slub.c | 415 ++++++++++++------
> 24 files changed, 377 insertions(+), 164 deletions(-)
For the series
Acked-by: Mike Rapoport <[email protected]>
--
Sincerely yours,
Mike.
On Mon, Nov 21, 2022 at 06:12:00PM +0100, Vlastimil Babka wrote:
> SLUB gets most of its scalability by percpu slabs. However for
> CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
> Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
> associated code. Additionally to the slab page savings, this reduces
> percpu allocator usage, and code size.
[+Cc Dennis]
Wondering if we can reduce (or zero) the early reservation of the percpu area
when #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_TINY)?
> This change builds on recent commit c7323a5ad078 ("mm/slub: restrict
> sysfs validation to debug caches and make it safe"), as caches with
> enabled debugging also avoid percpu slabs and all allocations and
> freeing ends up working with the partial list. With a bit more
> refactoring by the preceding patches, use the same code paths with
> CONFIG_SLUB_TINY.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
--
Thanks,
Hyeonggon
On Mon, Nov 21, 2022 at 06:11:58PM +0100, Vlastimil Babka wrote:
> Since commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> caches and make it safe"), caches with debugging enabled use the
> free_debug_processing() function to do both freeing checks and actual
> freeing to partial list under list_lock, bypassing the fast paths.
>
> We will want to use the same path for CONFIG_SLUB_TINY, but without the
> debugging checks, so refactor the code so that free_debug_processing()
> does only the checks, while the freeing is handled by a new function
> free_to_partial_list().
>
> For consistency, change return parameter alloc_debug_processing() from
> int to bool and correct the !SLUB_DEBUG variant to return true and not
> false. This didn't matter until now, but will in the following changes.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 154 +++++++++++++++++++++++++++++-------------------------
> 1 file changed, 83 insertions(+), 71 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index bf726dd00f7d..fd56d7cca9c2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1368,7 +1368,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
> return 1;
> }
>
> -static noinline int alloc_debug_processing(struct kmem_cache *s,
> +static noinline bool alloc_debug_processing(struct kmem_cache *s,
> struct slab *slab, void *object, int orig_size)
> {
> if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> @@ -1380,7 +1380,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
> trace(s, slab, object, 1);
> set_orig_size(s, object, orig_size);
> init_object(s, object, SLUB_RED_ACTIVE);
> - return 1;
> + return true;
>
> bad:
> if (folio_test_slab(slab_folio(slab))) {
> @@ -1393,7 +1393,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
> slab->inuse = slab->objects;
> slab->freelist = NULL;
> }
> - return 0;
> + return false;
> }
>
> static inline int free_consistency_checks(struct kmem_cache *s,
> @@ -1646,17 +1646,17 @@ static inline void setup_object_debug(struct kmem_cache *s, void *object) {}
> static inline
> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>
> -static inline int alloc_debug_processing(struct kmem_cache *s,
> - struct slab *slab, void *object, int orig_size) { return 0; }
> +static inline bool alloc_debug_processing(struct kmem_cache *s,
> + struct slab *slab, void *object, int orig_size) { return true; }
>
> -static inline void free_debug_processing(
> - struct kmem_cache *s, struct slab *slab,
> - void *head, void *tail, int bulk_cnt,
> - unsigned long addr) {}
> +static inline bool free_debug_processing(struct kmem_cache *s,
> + struct slab *slab, void *head, void *tail, int *bulk_cnt,
> + unsigned long addr, depot_stack_handle_t handle) { return true; }
>
> static inline void slab_pad_check(struct kmem_cache *s, struct slab *slab) {}
> static inline int check_object(struct kmem_cache *s, struct slab *slab,
> void *object, u8 val) { return 1; }
> +static inline depot_stack_handle_t set_track_prepare(void) { return 0; }
> static inline void set_track(struct kmem_cache *s, void *object,
> enum track_item alloc, unsigned long addr) {}
> static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n,
> @@ -2833,38 +2833,28 @@ static inline unsigned long node_nr_objs(struct kmem_cache_node *n)
> }
>
> /* Supports checking bulk free of a constructed freelist */
> -static noinline void free_debug_processing(
> - struct kmem_cache *s, struct slab *slab,
> - void *head, void *tail, int bulk_cnt,
> - unsigned long addr)
> +static inline bool free_debug_processing(struct kmem_cache *s,
> + struct slab *slab, void *head, void *tail, int *bulk_cnt,
> + unsigned long addr, depot_stack_handle_t handle)
> {
> - struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> - struct slab *slab_free = NULL;
> + bool checks_ok = false;
> void *object = head;
> int cnt = 0;
> - unsigned long flags;
> - bool checks_ok = false;
> - depot_stack_handle_t handle = 0;
> -
> - if (s->flags & SLAB_STORE_USER)
> - handle = set_track_prepare();
> -
> - spin_lock_irqsave(&n->list_lock, flags);
>
> if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> if (!check_slab(s, slab))
> goto out;
> }
>
> - if (slab->inuse < bulk_cnt) {
> + if (slab->inuse < *bulk_cnt) {
> slab_err(s, slab, "Slab has %d allocated objects but %d are to be freed\n",
> - slab->inuse, bulk_cnt);
> + slab->inuse, *bulk_cnt);
> goto out;
> }
>
> next_object:
>
> - if (++cnt > bulk_cnt)
> + if (++cnt > *bulk_cnt)
> goto out_cnt;
>
> if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> @@ -2886,57 +2876,18 @@ static noinline void free_debug_processing(
> checks_ok = true;
>
> out_cnt:
> - if (cnt != bulk_cnt)
> + if (cnt != *bulk_cnt) {
> slab_err(s, slab, "Bulk free expected %d objects but found %d\n",
> - bulk_cnt, cnt);
> -
> -out:
> - if (checks_ok) {
> - void *prior = slab->freelist;
> -
> - /* Perform the actual freeing while we still hold the locks */
> - slab->inuse -= cnt;
> - set_freepointer(s, tail, prior);
> - slab->freelist = head;
> -
> - /*
> - * If the slab is empty, and node's partial list is full,
> - * it should be discarded anyway no matter it's on full or
> - * partial list.
> - */
> - if (slab->inuse == 0 && n->nr_partial >= s->min_partial)
> - slab_free = slab;
> -
> - if (!prior) {
> - /* was on full list */
> - remove_full(s, n, slab);
> - if (!slab_free) {
> - add_partial(n, slab, DEACTIVATE_TO_TAIL);
> - stat(s, FREE_ADD_PARTIAL);
> - }
> - } else if (slab_free) {
> - remove_partial(n, slab);
> - stat(s, FREE_REMOVE_PARTIAL);
> - }
> + *bulk_cnt, cnt);
> + *bulk_cnt = cnt;
> }
>
> - if (slab_free) {
> - /*
> - * Update the counters while still holding n->list_lock to
> - * prevent spurious validation warnings
> - */
> - dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
> - }
> -
> - spin_unlock_irqrestore(&n->list_lock, flags);
> +out:
>
> if (!checks_ok)
> slab_fix(s, "Object at 0x%p not freed", object);
>
> - if (slab_free) {
> - stat(s, FREE_SLAB);
> - free_slab(s, slab_free);
> - }
> + return checks_ok;
> }
> #endif /* CONFIG_SLUB_DEBUG */
>
> @@ -3453,6 +3404,67 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> }
> EXPORT_SYMBOL(kmem_cache_alloc_node);
>
> +static noinline void free_to_partial_list(
> + struct kmem_cache *s, struct slab *slab,
> + void *head, void *tail, int bulk_cnt,
> + unsigned long addr)
> +{
> + struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> + struct slab *slab_free = NULL;
> + int cnt = bulk_cnt;
> + unsigned long flags;
> + depot_stack_handle_t handle = 0;
> +
> + if (s->flags & SLAB_STORE_USER)
> + handle = set_track_prepare();
> +
> + spin_lock_irqsave(&n->list_lock, flags);
> +
> + if (free_debug_processing(s, slab, head, tail, &cnt, addr, handle)) {
> + void *prior = slab->freelist;
> +
> + /* Perform the actual freeing while we still hold the locks */
> + slab->inuse -= cnt;
> + set_freepointer(s, tail, prior);
> + slab->freelist = head;
> +
> + /*
> + * If the slab is empty, and node's partial list is full,
> + * it should be discarded anyway no matter it's on full or
> + * partial list.
> + */
> + if (slab->inuse == 0 && n->nr_partial >= s->min_partial)
> + slab_free = slab;
> +
> + if (!prior) {
> + /* was on full list */
> + remove_full(s, n, slab);
> + if (!slab_free) {
> + add_partial(n, slab, DEACTIVATE_TO_TAIL);
> + stat(s, FREE_ADD_PARTIAL);
> + }
> + } else if (slab_free) {
> + remove_partial(n, slab);
> + stat(s, FREE_REMOVE_PARTIAL);
> + }
> + }
> +
> + if (slab_free) {
> + /*
> + * Update the counters while still holding n->list_lock to
> + * prevent spurious validation warnings
> + */
> + dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
> + }
> +
> + spin_unlock_irqrestore(&n->list_lock, flags);
> +
> + if (slab_free) {
> + stat(s, FREE_SLAB);
> + free_slab(s, slab_free);
> + }
> +}
> +
> /*
> * Slow path handling. This may still be called frequently since objects
> * have a longer lifetime than the cpu slabs in most processing loads.
> @@ -3479,7 +3491,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> return;
>
> if (kmem_cache_debug(s)) {
> - free_debug_processing(s, slab, head, tail, cnt, addr);
> + free_to_partial_list(s, slab, head, tail, cnt, addr);
> return;
> }
>
> --
> 2.38.1
>
Looks good to me.
Reviewed-by: Hyeonggon Yoo <[email protected]>
--
Thanks,
Hyeonggon
On 11/21/22 18:11, Vlastimil Babka wrote:
> SLAB_RECLAIM_ACCOUNT caches allocate their slab pages with
> __GFP_RECLAIMABLE and can help against fragmentation by grouping pages
> by mobility, but on tiny systems mobility grouping is likely disabled
> anyway and ignoring SLAB_RECLAIM_ACCOUNT might instead lead to merging
> of caches that are made incompatible just by the flag.
>
> Thus with CONFIG_SLUB_TINY, make SLAB_RECLAIM_ACCOUNT ineffective.
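To make the merging effect described above concrete (purely an illustration,
not part of the patch; cache names and sizes are made up), two caches that
differ only in this flag stop being distinguishable once it expands to 0:

#include <linux/slab.h>

static void slub_tiny_merge_example(void)
{
	/*
	 * With CONFIG_SLUB_TINY, SLAB_RECLAIM_ACCOUNT is 0, so nothing
	 * distinguishes these two otherwise-identical caches and the
	 * allocator may merge them into one.
	 */
	struct kmem_cache *plain = kmem_cache_create("example-plain",
						      128, 0, 0, NULL);
	struct kmem_cache *recl = kmem_cache_create("example-reclaim",
						    128, 0,
						    SLAB_RECLAIM_ACCOUNT,
						    NULL);
}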
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/slab.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 3ce9474c90ab..1cbbda03ad06 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -129,7 +129,11 @@
>
> /* The following flags affect the page allocator grouping pages by mobility */
> /* Objects are reclaimable */
> +#ifndef CONFIG_SLUB_TINY
> #define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U)
> +#else
> +#define SLAB_RECLAIM_ACCOUNT 0
Updating the last line above to:
#define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0)
In response to:
https://lore.kernel.org/all/[email protected]/
Yeah, it probably means that the other pre-existing flag variants that
#define to 0 should also be adjusted to avoid these issues, but not as part
of this series.
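For reference, the #ifdef block with that adjustment applied would read as
follows (this just restates the hunk above plus the changed line, it is not
a further change):

#ifndef CONFIG_SLUB_TINY
#define SLAB_RECLAIM_ACCOUNT	((slab_flags_t __force)0x00020000U)
#else
#define SLAB_RECLAIM_ACCOUNT	((slab_flags_t __force)0)
#endif

The cast keeps the macro's type slab_flags_t in both configurations, so flag
expressions type-check the same way with and without CONFIG_SLUB_TINY.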
> +#endif
> #define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
>
> /*
On Mon, Nov 21, 2022 at 06:12:01PM +0100, Vlastimil Babka wrote:
> SLUB fastpaths use __always_inline to avoid function calls. With
> CONFIG_SLUB_TINY we would rather save the memory. Add a
> __fastpath_inline macro that's __always_inline normally but empty with
> CONFIG_SLUB_TINY.
>
> bloat-o-meter results on x86_64 mm/slub.o:
>
> add/remove: 3/1 grow/shrink: 1/8 up/down: 865/-1784 (-919)
> Function old new delta
> kmem_cache_free 20 281 +261
> slab_alloc_node.isra - 245 +245
> slab_free.constprop.isra - 231 +231
> __kmem_cache_alloc_lru.isra - 128 +128
> __kmem_cache_release 88 83 -5
> __kmem_cache_create 1446 1436 -10
> __kmem_cache_free 271 142 -129
> kmem_cache_alloc_node 330 127 -203
> kmem_cache_free_bulk.part 826 613 -213
> __kmem_cache_alloc_node 230 10 -220
> kmem_cache_alloc_lru 325 12 -313
> kmem_cache_alloc 325 10 -315
> kmem_cache_free.part 376 - -376
> Total: Before=26103, After=25184, chg -3.52%
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 7f1cd702c3b4..d54466e76503 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -187,6 +187,12 @@ do { \
> #define USE_LOCKLESS_FAST_PATH() (false)
> #endif
>
> +#ifndef CONFIG_SLUB_TINY
> +#define __fastpath_inline __always_inline
> +#else
> +#define __fastpath_inline
> +#endif
> +
> #ifdef CONFIG_SLUB_DEBUG
> #ifdef CONFIG_SLUB_DEBUG_ON
> DEFINE_STATIC_KEY_TRUE(slub_debug_enabled);
> @@ -3386,7 +3392,7 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
> *
> * Otherwise we can simply pick the next object from the lockless free list.
> */
> -static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
> +static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
> {
> void *object;
> @@ -3412,13 +3418,13 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
> return object;
> }
>
> -static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
> +static __fastpath_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags, unsigned long addr, size_t orig_size)
> {
> return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
> }
>
> -static __always_inline
> +static __fastpath_inline
> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags)
> {
> @@ -3733,7 +3739,7 @@ static void do_slab_free(struct kmem_cache *s,
> }
> #endif /* CONFIG_SLUB_TINY */
>
> -static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
> +static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab,
> void *head, void *tail, void **p, int cnt,
> unsigned long addr)
> {
> --
> 2.38.1
Acked-by: Hyeonggon Yoo <[email protected]>
--
Thanks,
Hyeonggon
On Mon, 21 Nov 2022 09:12:02 PST (-0800), [email protected] wrote:
> As explained in [1], we would like to remove SLOB if possible.
>
> - There are no known users that need its somewhat lower memory footprint
> so much that they cannot handle SLUB (after some modifications by the
> previous patches) instead.
>
> - It is an extra maintenance burden, and a number of features are
> incompatible with it.
>
> - It blocks the API improvement of allowing kfree() on objects allocated
> via kmem_cache_alloc().
>
> As the first step, rename the CONFIG_SLOB option in the slab allocator
> configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
> depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
> churn. This will cause existing .config files and defconfigs with
> CONFIG_SLOB=y to silently switch to the default (and recommended
> replacement) SLUB, while still allowing SLOB to be configured by anyone
> that notices and needs it. But those should contact the slab maintainers
> and [email protected] as explained in the updated help. With no valid
> objections, the plan is to update the existing defconfigs to SLUB and
> remove SLOB in a few cycles.
>
> To make SLUB a more suitable replacement for SLOB, a CONFIG_SLUB_TINY
> option was introduced to limit SLUB's memory overhead.
> There are a number of defconfigs specifying CONFIG_SLOB=y. As part of
> this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Cc: Russell King <[email protected]>
> Cc: Aaro Koskinen <[email protected]>
> Cc: Janusz Krzysztofik <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Jonas Bonn <[email protected]>
> Cc: Stefan Kristiansson <[email protected]>
> Cc: Stafford Horne <[email protected]>
> Cc: Yoshinori Sato <[email protected]>
> Cc: Rich Felker <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Conor Dooley <[email protected]>
> Cc: Damien Le Moal <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> arch/arm/configs/clps711x_defconfig | 3 ++-
> arch/arm/configs/collie_defconfig | 3 ++-
> arch/arm/configs/multi_v4t_defconfig | 3 ++-
> arch/arm/configs/omap1_defconfig | 3 ++-
> arch/arm/configs/pxa_defconfig | 3 ++-
> arch/arm/configs/tct_hammer_defconfig | 3 ++-
> arch/arm/configs/xcep_defconfig | 3 ++-
> arch/openrisc/configs/or1ksim_defconfig | 3 ++-
> arch/openrisc/configs/simple_smp_defconfig | 3 ++-
> arch/riscv/configs/nommu_k210_defconfig | 3 ++-
> arch/riscv/configs/nommu_k210_sdcard_defconfig | 3 ++-
> arch/riscv/configs/nommu_virt_defconfig | 3 ++-
> arch/sh/configs/rsk7201_defconfig | 3 ++-
> arch/sh/configs/rsk7203_defconfig | 3 ++-
> arch/sh/configs/se7206_defconfig | 3 ++-
> arch/sh/configs/shmin_defconfig | 3 ++-
> arch/sh/configs/shx3_defconfig | 3 ++-
> kernel/configs/tiny.config | 5 +++--
> mm/Kconfig | 17 +++++++++++++++--
> 19 files changed, 52 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm/configs/clps711x_defconfig b/arch/arm/configs/clps711x_defconfig
> index 92481b2a88fa..adcee238822a 100644
> --- a/arch/arm/configs/clps711x_defconfig
> +++ b/arch/arm/configs/clps711x_defconfig
> @@ -14,7 +14,8 @@ CONFIG_ARCH_EDB7211=y
> CONFIG_ARCH_P720T=y
> CONFIG_AEABI=y
> # CONFIG_COREDUMP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> diff --git a/arch/arm/configs/collie_defconfig b/arch/arm/configs/collie_defconfig
> index 2a2d2cb3ce2e..69341c33e0cc 100644
> --- a/arch/arm/configs/collie_defconfig
> +++ b/arch/arm/configs/collie_defconfig
> @@ -13,7 +13,8 @@ CONFIG_CMDLINE="noinitrd root=/dev/mtdblock2 rootfstype=jffs2 fbcon=rotate:1"
> CONFIG_FPE_NWFPE=y
> CONFIG_PM=y
> # CONFIG_SWAP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> diff --git a/arch/arm/configs/multi_v4t_defconfig b/arch/arm/configs/multi_v4t_defconfig
> index e2fd822f741a..b60000a89aff 100644
> --- a/arch/arm/configs/multi_v4t_defconfig
> +++ b/arch/arm/configs/multi_v4t_defconfig
> @@ -25,7 +25,8 @@ CONFIG_ARM_CLPS711X_CPUIDLE=y
> CONFIG_JUMP_LABEL=y
> CONFIG_PARTITION_ADVANCED=y
> # CONFIG_COREDUMP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_MTD=y
> CONFIG_MTD_CMDLINE_PARTS=y
> CONFIG_MTD_BLOCK=y
> diff --git a/arch/arm/configs/omap1_defconfig b/arch/arm/configs/omap1_defconfig
> index 70511fe4b3ec..246f1bba7df5 100644
> --- a/arch/arm/configs/omap1_defconfig
> +++ b/arch/arm/configs/omap1_defconfig
> @@ -42,7 +42,8 @@ CONFIG_MODULE_FORCE_UNLOAD=y
> CONFIG_PARTITION_ADVANCED=y
> CONFIG_BINFMT_MISC=y
> # CONFIG_SWAP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_VM_EVENT_COUNTERS is not set
> CONFIG_NET=y
> CONFIG_PACKET=y
> diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig
> index d60cc9cc4c21..0a0f12df40b5 100644
> --- a/arch/arm/configs/pxa_defconfig
> +++ b/arch/arm/configs/pxa_defconfig
> @@ -49,7 +49,8 @@ CONFIG_PARTITION_ADVANCED=y
> CONFIG_LDM_PARTITION=y
> CONFIG_CMDLINE_PARTITION=y
> CONFIG_BINFMT_MISC=y
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_COMPACTION is not set
> CONFIG_NET=y
> CONFIG_PACKET=y
> diff --git a/arch/arm/configs/tct_hammer_defconfig b/arch/arm/configs/tct_hammer_defconfig
> index 3b29ae1fb750..6bd38b6f22c4 100644
> --- a/arch/arm/configs/tct_hammer_defconfig
> +++ b/arch/arm/configs/tct_hammer_defconfig
> @@ -19,7 +19,8 @@ CONFIG_FPE_NWFPE=y
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> # CONFIG_SWAP is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> diff --git a/arch/arm/configs/xcep_defconfig b/arch/arm/configs/xcep_defconfig
> index ea59e4b6bfc5..6bd9f71b71fc 100644
> --- a/arch/arm/configs/xcep_defconfig
> +++ b/arch/arm/configs/xcep_defconfig
> @@ -26,7 +26,8 @@ CONFIG_MODULE_UNLOAD=y
> CONFIG_MODVERSIONS=y
> CONFIG_MODULE_SRCVERSION_ALL=y
> # CONFIG_BLOCK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_COMPAT_BRK is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> CONFIG_NET=y
> diff --git a/arch/openrisc/configs/or1ksim_defconfig b/arch/openrisc/configs/or1ksim_defconfig
> index 6e1e004047c7..0116e465238f 100644
> --- a/arch/openrisc/configs/or1ksim_defconfig
> +++ b/arch/openrisc/configs/or1ksim_defconfig
> @@ -10,7 +10,8 @@ CONFIG_EXPERT=y
> # CONFIG_AIO is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_MODULES=y
> # CONFIG_BLOCK is not set
> CONFIG_OPENRISC_BUILTIN_DTB="or1ksim"
> diff --git a/arch/openrisc/configs/simple_smp_defconfig b/arch/openrisc/configs/simple_smp_defconfig
> index ff49d868e040..b990cb6c9309 100644
> --- a/arch/openrisc/configs/simple_smp_defconfig
> +++ b/arch/openrisc/configs/simple_smp_defconfig
> @@ -16,7 +16,8 @@ CONFIG_EXPERT=y
> # CONFIG_AIO is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_MODULES=y
> # CONFIG_BLOCK is not set
> CONFIG_OPENRISC_BUILTIN_DTB="simple_smp"
> diff --git a/arch/riscv/configs/nommu_k210_defconfig b/arch/riscv/configs/nommu_k210_defconfig
> index 96fe8def644c..79b3ccd58ff0 100644
> --- a/arch/riscv/configs/nommu_k210_defconfig
> +++ b/arch/riscv/configs/nommu_k210_defconfig
> @@ -25,7 +25,8 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_EMBEDDED=y
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_MMU is not set
> CONFIG_SOC_CANAAN=y
> CONFIG_NONPORTABLE=y
> diff --git a/arch/riscv/configs/nommu_k210_sdcard_defconfig b/arch/riscv/configs/nommu_k210_sdcard_defconfig
> index 379740654373..6b80bb13b8ed 100644
> --- a/arch/riscv/configs/nommu_k210_sdcard_defconfig
> +++ b/arch/riscv/configs/nommu_k210_sdcard_defconfig
> @@ -17,7 +17,8 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_EMBEDDED=y
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_MMU is not set
> CONFIG_SOC_CANAAN=y
> CONFIG_NONPORTABLE=y
> diff --git a/arch/riscv/configs/nommu_virt_defconfig b/arch/riscv/configs/nommu_virt_defconfig
> index 1a56eda5ce46..4cf0f297091e 100644
> --- a/arch/riscv/configs/nommu_virt_defconfig
> +++ b/arch/riscv/configs/nommu_virt_defconfig
> @@ -22,7 +22,8 @@ CONFIG_EXPERT=y
> # CONFIG_KALLSYMS is not set
> # CONFIG_VM_EVENT_COUNTERS is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_MMU is not set
> CONFIG_SOC_VIRT=y
> CONFIG_NONPORTABLE=y
Acked-by: Palmer Dabbelt <[email protected]>
Though I don't have a K210 to test against, maybe Damien still does?
> diff --git a/arch/sh/configs/rsk7201_defconfig b/arch/sh/configs/rsk7201_defconfig
> index 619c18699459..376e95fa77bc 100644
> --- a/arch/sh/configs/rsk7201_defconfig
> +++ b/arch/sh/configs/rsk7201_defconfig
> @@ -10,7 +10,8 @@ CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> CONFIG_BLK_DEV_INITRD=y
> # CONFIG_AIO is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_MODULES=y
> # CONFIG_BLK_DEV_BSG is not set
> diff --git a/arch/sh/configs/rsk7203_defconfig b/arch/sh/configs/rsk7203_defconfig
> index d00fafc021e1..1d5fd67a3949 100644
> --- a/arch/sh/configs/rsk7203_defconfig
> +++ b/arch/sh/configs/rsk7203_defconfig
> @@ -11,7 +11,8 @@ CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_KALLSYMS_ALL=y
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_MODULES=y
> # CONFIG_BLK_DEV_BSG is not set
> diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
> index 122216123e63..78e0e7be57ee 100644
> --- a/arch/sh/configs/se7206_defconfig
> +++ b/arch/sh/configs/se7206_defconfig
> @@ -21,7 +21,8 @@ CONFIG_BLK_DEV_INITRD=y
> CONFIG_KALLSYMS_ALL=y
> # CONFIG_ELF_CORE is not set
> # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> diff --git a/arch/sh/configs/shmin_defconfig b/arch/sh/configs/shmin_defconfig
> index c0b6f40d01cc..e078b193a78a 100644
> --- a/arch/sh/configs/shmin_defconfig
> +++ b/arch/sh/configs/shmin_defconfig
> @@ -9,7 +9,8 @@ CONFIG_LOG_BUF_SHIFT=14
> # CONFIG_FUTEX is not set
> # CONFIG_EPOLL is not set
> # CONFIG_SHMEM is not set
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> # CONFIG_BLK_DEV_BSG is not set
> CONFIG_CPU_SUBTYPE_SH7706=y
> CONFIG_MEMORY_START=0x0c000000
> diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
> index 32ec6eb1eabc..aa353dff7f19 100644
> --- a/arch/sh/configs/shx3_defconfig
> +++ b/arch/sh/configs/shx3_defconfig
> @@ -20,7 +20,8 @@ CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
> CONFIG_KALLSYMS_ALL=y
> -CONFIG_SLOB=y
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> CONFIG_PROFILING=y
> CONFIG_KPROBES=y
> CONFIG_MODULES=y
> diff --git a/kernel/configs/tiny.config b/kernel/configs/tiny.config
> index 8a44b93da0f3..c2f9c912df1c 100644
> --- a/kernel/configs/tiny.config
> +++ b/kernel/configs/tiny.config
> @@ -7,5 +7,6 @@ CONFIG_KERNEL_XZ=y
> # CONFIG_KERNEL_LZO is not set
> # CONFIG_KERNEL_LZ4 is not set
> # CONFIG_SLAB is not set
> -# CONFIG_SLUB is not set
> -CONFIG_SLOB=y
> +# CONFIG_SLOB_DEPRECATED is not set
> +CONFIG_SLUB=y
> +CONFIG_SLUB_TINY=y
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 5941cb34e30d..dcc49c69552f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -219,17 +219,30 @@ config SLUB
> and has enhanced diagnostics. SLUB is the default choice for
> a slab allocator.
>
> -config SLOB
> +config SLOB_DEPRECATED
> depends on EXPERT
> - bool "SLOB (Simple Allocator)"
> + bool "SLOB (Simple Allocator - DEPRECATED)"
> depends on !PREEMPT_RT
> help
> + Deprecated and scheduled for removal in a few cycles. SLUB
> + recommended as replacement. CONFIG_SLUB_TINY can be considered
> + on systems with 16MB or less RAM.
> +
> + If you need SLOB to stay, please contact [email protected] and
> + people listed in the SLAB ALLOCATOR section of MAINTAINERS file,
> + with your use case.
> +
> SLOB replaces the stock allocator with a drastically simpler
> allocator. SLOB is generally more space efficient but
> does not perform as well on large systems.
>
> endchoice
>
> +config SLOB
> + bool
> + default y
> + depends on SLOB_DEPRECATED
> +
> config SLUB_TINY
> bool "Configure SLUB for minimal memory footprint"
> depends on SLUB && EXPERT
On 12/3/22 02:59, Palmer Dabbelt wrote:
[...]
>> diff --git a/arch/riscv/configs/nommu_virt_defconfig b/arch/riscv/configs/nommu_virt_defconfig
>> index 1a56eda5ce46..4cf0f297091e 100644
>> --- a/arch/riscv/configs/nommu_virt_defconfig
>> +++ b/arch/riscv/configs/nommu_virt_defconfig
>> @@ -22,7 +22,8 @@ CONFIG_EXPERT=y
>> # CONFIG_KALLSYMS is not set
>> # CONFIG_VM_EVENT_COUNTERS is not set
>> # CONFIG_COMPAT_BRK is not set
>> -CONFIG_SLOB=y
>> +CONFIG_SLUB=y
>> +CONFIG_SLUB_TINY=y
>> # CONFIG_MMU is not set
>> CONFIG_SOC_VIRT=y
>> CONFIG_NONPORTABLE=y
>
> Acked-by: Palmer Dabbelt <[email protected]>
>
> Though I don't have a K210 to test against, maybe Damien still does?
I did test it and it is OK.
--
Damien Le Moal
Western Digital Research
On 11/27/22 12:05, Hyeonggon Yoo wrote:
> On Mon, Nov 21, 2022 at 06:12:00PM +0100, Vlastimil Babka wrote:
>> SLUB gets most of its scalability by percpu slabs. However for
>> CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
>> Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
>> associated code. Additionally to the slab page savings, this reduces
>> percpu allocator usage, and code size.
>
> [+Cc Dennis]
+To: Baoquan also.
> Wondering if we can reduce (or zero) early reservation of percpu area
> when #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_TINY)?
Good point. I've sent a PR with the series as it was [1], but (if merged) we
can still improve that during the RC series, if it means more memory saved
thanks to less percpu usage with CONFIG_SLUB_TINY.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/tag/?h=slab-for-6.2-rc1
>> This change builds on recent commit c7323a5ad078 ("mm/slub: restrict
>> sysfs validation to debug caches and make it safe"), as caches with
>> enabled debugging also avoid percpu slabs and all allocations and
>> freeing ends up working with the partial list. With a bit more
>> refactoring by the preceding patches, use the same code paths with
>> CONFIG_SLUB_TINY.
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>
Hello,
On Mon, Dec 12, 2022 at 11:54:28AM +0100, Vlastimil Babka wrote:
> On 11/27/22 12:05, Hyeonggon Yoo wrote:
> > On Mon, Nov 21, 2022 at 06:12:00PM +0100, Vlastimil Babka wrote:
> >> SLUB gets most of its scalability by percpu slabs. However for
> >> CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
> >> Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
> >> associated code. Additionally to the slab page savings, this reduces
> >> percpu allocator usage, and code size.
> >
> > [+Cc Dennis]
>
> +To: Baoquan also.
>
> > Wondering if we can reduce (or zero) early reservation of percpu area
> > when #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_TINY)?
>
> Good point. I've sent a PR as it was [1], but (if merged) we can still
> improve that during RC series, if it means more memory saved thanks to less
> percpu usage with CONFIG_SLUB_TINY.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/tag/?h=slab-for-6.2-rc1
The part of the early reservation area not used at boot is then used to serve
normal percpu allocations. Percpu allocates additional chunks based on a free
page float count and is backed page by page, not all at once. I get that
slabs are the main motivator of early reservation, but if there are other
users of percpu, then shrinking the early reservation area is a bit
moot.
Thanks,
Dennis
>
> >> This change builds on recent commit c7323a5ad078 ("mm/slub: restrict
> >> sysfs validation to debug caches and make it safe"), as caches with
> >> enabled debugging also avoid percpu slabs and all allocations and
> >> freeing ends up working with the partial list. With a bit more
> >> refactoring by the preceding patches, use the same code paths with
> >> CONFIG_SLUB_TINY.
> >>
> >> Signed-off-by: Vlastimil Babka <[email protected]>
> >
>
On 12/12/22 at 05:11am, Dennis Zhou wrote:
> Hello,
>
> On Mon, Dec 12, 2022 at 11:54:28AM +0100, Vlastimil Babka wrote:
> > On 11/27/22 12:05, Hyeonggon Yoo wrote:
> > > On Mon, Nov 21, 2022 at 06:12:00PM +0100, Vlastimil Babka wrote:
> > >> SLUB gets most of its scalability by percpu slabs. However for
> > >> CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
> > >> Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
> > >> associated code. Additionally to the slab page savings, this reduces
> > >> percpu allocator usage, and code size.
> > >
> > > [+Cc Dennis]
> >
> > +To: Baoquan also.
Thanks for adding me.
> >
> > > Wondering if we can reduce (or zero) early reservation of percpu area
> > > when #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_TINY)?
> >
> > Good point. I've sent a PR as it was [1], but (if merged) we can still
> > improve that during RC series, if it means more memory saved thanks to less
> > percpu usage with CONFIG_SLUB_TINY.
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/tag/?h=slab-for-6.2-rc1
>
> The early reservation area not used at boot is then used to serve normal
> percpu allocations. Percpu allocates additional chunks based on a free
> page float count and is backed page by page, not all at once. I get
> slabs is the main motivator of early reservation, but if there are other
> users of percpu, then shrinking the early reservation area is a bit
> moot.
Agree. Before kmem_cache_init() is done, anyone calling alloc_percpu()
can only have the allocation served from the early reservation of the percpu
area. So the early reservation cannot be dropped unless we can make sure
nobody needs to call alloc_percpu() before kmem_cache_init(), now and in the
future.
The only drawback of the early reservation is that it's not very flexible.
We can only dynamically create chunks to grow the percpu area once the early
reservation runs out, but we can't shrink the early reservation if the system
doesn't need that much.
So we may need to weigh the two ideas:
- Not allowing alloc_percpu() before kmem_cache_init();
- Keeping the early reservation, and picking a more economical value for
  CONFIG_SLUB_TINY (a rough sketch of this option follows the call chain
  below).
start_kernel()
->setup_per_cpu_areas();
......
->mm_init();
......
-->kmem_cache_init();
__alloc_percpu()
-->pcpu_alloc()
--> succeeds in allocating from the early reservation
or
-->pcpu_create_chunk()
-->pcpu_alloc_chunk()
-->pcpu_mem_zalloc()
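To make the second option concrete, a rough sketch (the constant lives in
include/linux/percpu.h; both values below are purely illustrative, no
specific number is being proposed here):

#ifdef CONFIG_SLUB_TINY
/* hypothetical smaller dynamic reserve for tiny systems */
#define PERCPU_DYNAMIC_EARLY_SIZE	(4 << 10)
#else
#define PERCPU_DYNAMIC_EARLY_SIZE	(12 << 10)
#endif

The BUILD_BUG_ON() against this constant in alloc_kmem_cache_cpus() would not
get in the way, since with CONFIG_SLUB_TINY that function becomes a stub (see
the patch quoted later in the thread).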
On Mon, Nov 21, 2022 at 06:12:02PM +0100, Vlastimil Babka wrote:
[...]
FTR,
Acked-by: Hyeonggon Yoo <[email protected]>
--
Thanks,
Hyeonggon
On Tue, Dec 13, 2022 at 11:04:33AM +0800, Baoquan He wrote:
> On 12/12/22 at 05:11am, Dennis Zhou wrote:
> > Hello,
> >
> > On Mon, Dec 12, 2022 at 11:54:28AM +0100, Vlastimil Babka wrote:
> > > On 11/27/22 12:05, Hyeonggon Yoo wrote:
> > > > On Mon, Nov 21, 2022 at 06:12:00PM +0100, Vlastimil Babka wrote:
> > > >> SLUB gets most of its scalability by percpu slabs. However for
> > > >> CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
> > > >> Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
> > > >> associated code. Additionally to the slab page savings, this reduces
> > > >> percpu allocator usage, and code size.
> > > >
> > > > [+Cc Dennis]
> > >
> > > +To: Baoquan also.
>
> Thanks for adding me.
>
> > >
> > > > Wondering if we can reduce (or zero) early reservation of percpu area
> > > > when #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_TINY)?
> > >
> > > Good point. I've sent a PR as it was [1], but (if merged) we can still
> > > improve that during RC series, if it means more memory saved thanks to less
> > > percpu usage with CONFIG_SLUB_TINY.
> > >
> > > [1]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/tag/?h=slab-for-6.2-rc1
> >
> > The early reservation area not used at boot is then used to serve normal
> > percpu allocations. Percpu allocates additional chunks based on a free
> > page float count and is backed page by page, not all at once. I get
> > slabs is the main motivator of early reservation, but if there are other
> > users of percpu, then shrinking the early reservation area is a bit
> > moot.
>
> Agree. Before kmem_cache_init() is done, anyone calling alloc_percpu()
> can only get allocation done from early reservatoin of percpu area.
> So, unless we can make sure nobody need to call alloc_percpu() before
> kmem_cache_init() now and future.
Thank you both for the explanation.
I just googled and found a random /proc/meminfo output from a K210 board (6MB RAM, dual-core).
Given that even the K210 board uses around 100kB of percpu area,
it might not be a worthwhile thing to do :(
https://gist.github.com/pdp7/0fd86d39e07ad7084f430c85a7a567f4?permalink_comment_id=3179983#gistcomment-3179983
> The only drawback of early reservation is it's not so flexible. We can
> only dynamically create chunk to increase percpu areas when early
> reservation is run out, but can't shrink early reservation if system
> doesn't need that much.
>
> So we may need weigh the two ideas:
> - Not allowing to alloc_percpu() before kmem_cache_init();
> - Keep early reservation, and think of a economic value for
> CONFIG_SLUB_TINY.
>
> start_kernel()
> ->setup_per_cpu_areas();
> ......
> ->mm_init();
> ......
> -->kmem_cache_init();
>
>
> __alloc_percpu()
> -->pcpu_alloc()
> --> succeed to allocate from early reservation
> or
> -->pcpu_create_chunk()
> -->pcpu_alloc_chunk()
> -->pcpu_mem_zalloc()
>
--
Thanks,
Hyeonggon
On Mon, Nov 21, 2022 at 06:12:00PM +0100, Vlastimil Babka wrote:
> SLUB gets most of its scalability by percpu slabs. However for
> CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
> Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
> associated code. Additionally to the slab page savings, this reduces
> percpu allocator usage, and code size.
>
> This change builds on recent commit c7323a5ad078 ("mm/slub: restrict
> sysfs validation to debug caches and make it safe"), as caches with
> enabled debugging also avoid percpu slabs and all allocations and
> freeing ends up working with the partial list. With a bit more
> refactoring by the preceding patches, use the same code paths with
> CONFIG_SLUB_TINY.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/slub_def.h | 4 ++
> mm/slub.c | 102 +++++++++++++++++++++++++++++++++++++--
> 2 files changed, 103 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index c186f25c8148..79df64eb054e 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -41,6 +41,7 @@ enum stat_item {
> CPU_PARTIAL_DRAIN, /* Drain cpu partial to node partial */
> NR_SLUB_STAT_ITEMS };
>
> +#ifndef CONFIG_SLUB_TINY
> /*
> * When changing the layout, make sure freelist and tid are still compatible
> * with this_cpu_cmpxchg_double() alignment requirements.
> @@ -57,6 +58,7 @@ struct kmem_cache_cpu {
> unsigned stat[NR_SLUB_STAT_ITEMS];
> #endif
> };
> +#endif /* CONFIG_SLUB_TINY */
>
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> #define slub_percpu_partial(c) ((c)->partial)
> @@ -88,7 +90,9 @@ struct kmem_cache_order_objects {
> * Slab cache management.
> */
> struct kmem_cache {
> +#ifndef CONFIG_SLUB_TINY
> struct kmem_cache_cpu __percpu *cpu_slab;
> +#endif
> /* Used for retrieving partial slabs, etc. */
> slab_flags_t flags;
> unsigned long min_partial;
> diff --git a/mm/slub.c b/mm/slub.c
> index 5677db3f6d15..7f1cd702c3b4 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -337,10 +337,12 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
> */
> static nodemask_t slab_nodes;
>
> +#ifndef CONFIG_SLUB_TINY
> /*
> * Workqueue used for flush_cpu_slab().
> */
> static struct workqueue_struct *flushwq;
> +#endif
>
> /********************************************************************
> * Core slab cache functions
> @@ -386,10 +388,12 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
> return freelist_dereference(s, object + s->offset);
> }
>
> +#ifndef CONFIG_SLUB_TINY
> static void prefetch_freepointer(const struct kmem_cache *s, void *object)
> {
> prefetchw(object + s->offset);
> }
> +#endif
>
> /*
> * When running under KMSAN, get_freepointer_safe() may return an uninitialized
> @@ -1681,11 +1685,13 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node,
> static inline void dec_slabs_node(struct kmem_cache *s, int node,
> int objects) {}
>
> +#ifndef CONFIG_SLUB_TINY
> static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
> void **freelist, void *nextfree)
> {
> return false;
> }
> +#endif
> #endif /* CONFIG_SLUB_DEBUG */
>
> /*
> @@ -2219,7 +2225,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
> if (!pfmemalloc_match(slab, pc->flags))
> continue;
>
> - if (kmem_cache_debug(s)) {
> + if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
> object = alloc_single_from_partial(s, n, slab,
> pc->orig_size);
> if (object)
> @@ -2334,6 +2340,8 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
> return get_any_partial(s, pc);
> }
>
> +#ifndef CONFIG_SLUB_TINY
> +
> #ifdef CONFIG_PREEMPTION
> /*
> * Calculate the next globally unique transaction for disambiguation
> @@ -2347,7 +2355,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
> * different cpus.
> */
> #define TID_STEP 1
> -#endif
> +#endif /* CONFIG_PREEMPTION */
>
> static inline unsigned long next_tid(unsigned long tid)
> {
> @@ -2808,6 +2816,13 @@ static int slub_cpu_dead(unsigned int cpu)
> return 0;
> }
>
> +#else /* CONFIG_SLUB_TINY */
> +static inline void flush_all_cpus_locked(struct kmem_cache *s) { }
> +static inline void flush_all(struct kmem_cache *s) { }
> +static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu) { }
> +static inline int slub_cpu_dead(unsigned int cpu) { return 0; }
> +#endif /* CONFIG_SLUB_TINY */
> +
> /*
> * Check if the objects in a per cpu structure fit numa
> * locality expectations.
> @@ -2955,6 +2970,7 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
> return true;
> }
>
> +#ifndef CONFIG_SLUB_TINY
> /*
> * Check the slab->freelist and either transfer the freelist to the
> * per cpu freelist or deactivate the slab.
> @@ -3320,6 +3336,33 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
>
> return object;
> }
> +#else /* CONFIG_SLUB_TINY */
> +static void *__slab_alloc_node(struct kmem_cache *s,
> + gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
> +{
> + struct partial_context pc;
> + struct slab *slab;
> + void *object;
> +
> + pc.flags = gfpflags;
> + pc.slab = &slab;
> + pc.orig_size = orig_size;
> + object = get_partial(s, node, &pc);
> +
> + if (object)
> + return object;
> +
> + slab = new_slab(s, gfpflags, node);
> + if (unlikely(!slab)) {
> + slab_out_of_memory(s, gfpflags, node);
> + return NULL;
> + }
> +
> + object = alloc_single_from_new_slab(s, slab, orig_size);
> +
> + return object;
> +}
> +#endif /* CONFIG_SLUB_TINY */
>
> /*
> * If the object has been wiped upon free, make sure it's fully initialized by
> @@ -3503,7 +3546,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> if (kfence_free(head))
> return;
>
> - if (kmem_cache_debug(s)) {
> + if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
> free_to_partial_list(s, slab, head, tail, cnt, addr);
> return;
> }
> @@ -3604,6 +3647,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> discard_slab(s, slab);
> }
>
> +#ifndef CONFIG_SLUB_TINY
> /*
> * Fastpath with forced inlining to produce a kfree and kmem_cache_free that
> * can perform fastpath freeing without additional function calls.
> @@ -3678,6 +3722,16 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> }
> stat(s, FREE_FASTPATH);
> }
> +#else /* CONFIG_SLUB_TINY */
> +static void do_slab_free(struct kmem_cache *s,
> + struct slab *slab, void *head, void *tail,
> + int cnt, unsigned long addr)
> +{
> + void *tail_obj = tail ? : head;
> +
> + __slab_free(s, slab, head, tail_obj, cnt, addr);
> +}
> +#endif /* CONFIG_SLUB_TINY */
>
> static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
> void *head, void *tail, void **p, int cnt,
> @@ -3812,6 +3866,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> }
> EXPORT_SYMBOL(kmem_cache_free_bulk);
>
> +#ifndef CONFIG_SLUB_TINY
> static inline int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
> size_t size, void **p, struct obj_cgroup *objcg)
> {
> @@ -3880,6 +3935,36 @@ static inline int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
> return 0;
>
> }
> +#else /* CONFIG_SLUB_TINY */
> +static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
> + size_t size, void **p, struct obj_cgroup *objcg)
> +{
> + int i;
> +
> + for (i = 0; i < size; i++) {
> + void *object = kfence_alloc(s, s->object_size, flags);
> +
> + if (unlikely(object)) {
> + p[i] = object;
> + continue;
> + }
> +
> + p[i] = __slab_alloc_node(s, flags, NUMA_NO_NODE,
> + _RET_IP_, s->object_size);
> + if (unlikely(!p[i]))
> + goto error;
> +
> + maybe_wipe_obj_freeptr(s, p[i]);
> + }
> +
> + return i;
> +
> +error:
> + slab_post_alloc_hook(s, objcg, flags, i, p, false);
> + kmem_cache_free_bulk(s, i, p);
> + return 0;
> +}
> +#endif /* CONFIG_SLUB_TINY */
>
> /* Note that interrupts must be enabled when calling this function. */
> int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> @@ -4059,6 +4144,7 @@ init_kmem_cache_node(struct kmem_cache_node *n)
> #endif
> }
>
> +#ifndef CONFIG_SLUB_TINY
> static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
> {
> BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
> @@ -4078,6 +4164,12 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
>
> return 1;
> }
> +#else
> +static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
> +{
> + return 1;
> +}
> +#endif /* CONFIG_SLUB_TINY */
>
> static struct kmem_cache *kmem_cache_node;
>
> @@ -4140,7 +4232,9 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
> void __kmem_cache_release(struct kmem_cache *s)
> {
> cache_random_seq_destroy(s);
> +#ifndef CONFIG_SLUB_TINY
> free_percpu(s->cpu_slab);
> +#endif
> free_kmem_cache_nodes(s);
> }
>
> @@ -4917,8 +5011,10 @@ void __init kmem_cache_init(void)
>
> void __init kmem_cache_init_late(void)
> {
> +#ifndef CONFIG_SLUB_TINY
> flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
> WARN_ON(!flushwq);
> +#endif
> }
>
> struct kmem_cache *
> --
> 2.38.1
>
For the record:
Looks good to me.
Reviewed-by: Hyeonggon Yoo <[email protected]>
--
Thanks,
Hyeonggon