2023-11-20 18:37:05

by Vlastimil Babka

Subject: [PATCH v2 00/21] remove the SLAB allocator

Changes from v1:
- Added new Patch 01 to fix up kernel docs build (thanks Marco Elver)
- Additional changes to Kconfig user visible texts in Patch 02 (thanks Kees
Cook)
- Whitespace fixes and other fixups (thanks Kees)

The SLAB allocator has been deprecated since 6.5 and nobody has objected
so far. As we agreed at LSF/MM, we should wait with the removal until
the next LTS kernel is released. This is now determined to be 6.6, and
we just missed 6.7, so now we can aim for 6.8 and start exposing the
removal to linux-next during the 6.7 cycle. If nothing substantial pops
up, I will start including this in slab-next later this week.

To keep the series reasonably sized and not pull in people from
subsystems other than mm and closely related ones, I didn't attempt to
remove every unnecessary reference to dead config options in external
areas, nor in the defconfigs. Such cleanups can be sent to and handled
by the respective maintainers after this is merged.

Instead I have added some patches aimed at reaping some immediate
benefits of the removal, mainly by not having to split some fastpath
code between slab_common.c and slub.c anymore. But that is also not an
exhaustive effort and I expect more cleanups and optimizations will
follow later.

Patch 09 updates CREDITS for the removed mm/slab.c. Please point out if
I missed someone not yet credited.

Git version: https://git.kernel.org/vbabka/l/slab-remove-slab-v2r1

---
Vlastimil Babka (21):
mm/slab, docs: switch mm-api docs generation from slab.c to slub.c
mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile
KASAN: remove code paths guarded by CONFIG_SLAB
KFENCE: cleanup kfence_guarded_alloc() after CONFIG_SLAB removal
mm/memcontrol: remove CONFIG_SLAB #ifdef guards
cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks
mm/slab: remove CONFIG_SLAB code from slab common code
mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs
mm/slab: remove mm/slab.c and slab_def.h
mm/slab: move struct kmem_cache_cpu declaration to slub.c
mm/slab: move the rest of slub_def.h to mm/slab.h
mm/slab: consolidate includes in the internal mm/slab.h
mm/slab: move pre/post-alloc hooks from slab.h to slub.c
mm/slab: move memcg related functions from slab.h to slub.c
mm/slab: move struct kmem_cache_node from slab.h to slub.c
mm/slab: move kfree() from slab_common.c to slub.c
mm/slab: move kmalloc_slab() to mm/slab.h
mm/slab: move kmalloc() functions from slab_common.c to slub.c
mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers
mm/slub: optimize alloc fastpath code layout
mm/slub: optimize free fast path code layout

CREDITS | 12 +-
Documentation/core-api/mm-api.rst | 2 +-
arch/arm64/Kconfig | 2 +-
arch/s390/Kconfig | 2 +-
arch/x86/Kconfig | 2 +-
include/linux/cpuhotplug.h | 1 -
include/linux/slab.h | 22 +-
include/linux/slab_def.h | 124 --
include/linux/slub_def.h | 204 --
kernel/cpu.c | 5 -
lib/Kconfig.debug | 1 -
lib/Kconfig.kasan | 11 +-
lib/Kconfig.kfence | 2 +-
lib/Kconfig.kmsan | 2 +-
mm/Kconfig | 68 +-
mm/Kconfig.debug | 16 +-
mm/Makefile | 6 +-
mm/dmapool.c | 2 +-
mm/kasan/common.c | 13 +-
mm/kasan/kasan.h | 3 +-
mm/kasan/quarantine.c | 7 -
mm/kasan/report.c | 1 +
mm/kfence/core.c | 4 -
mm/memcontrol.c | 6 +-
mm/mempool.c | 6 +-
mm/slab.c | 4026 -------------------------------------
mm/slab.h | 551 ++---
mm/slab_common.c | 231 +--
mm/slub.c | 617 +++++-
29 files changed, 815 insertions(+), 5134 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231120-slab-remove-slab-a76ec668d8c6

Best regards,
--
Vlastimil Babka <[email protected]>


2023-11-20 18:37:07

by Vlastimil Babka

Subject: [PATCH v2 05/21] mm/memcontrol: remove CONFIG_SLAB #ifdef guards

With SLAB removed, these are never true anymore so we can clean up.

Reviewed-by: Kees Cook <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/memcontrol.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 774bd6e21e27..947fb50eba31 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5149,7 +5149,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
return ret;
}

-#if defined(CONFIG_MEMCG_KMEM) && (defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG))
+#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG)
static int mem_cgroup_slab_show(struct seq_file *m, void *p)
{
/*
@@ -5258,8 +5258,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
.write = mem_cgroup_reset,
.read_u64 = mem_cgroup_read_u64,
},
-#if defined(CONFIG_MEMCG_KMEM) && \
- (defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG))
+#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG)
{
.name = "kmem.slabinfo",
.seq_show = mem_cgroup_slab_show,

--
2.42.1
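
The guard change in patch 05 is plain boolean simplification: once one
term of "A && (B || C)" can never be defined, the condition reduces to
"A && C". A minimal userspace sketch follows; the DEMO_* macros are
hypothetical stand-ins for the Kconfig symbols and are not kernel names.

```c
#include <assert.h>

/* Hypothetical stand-ins for Kconfig symbols; not real kernel macros. */
#define DEMO_MEMCG_KMEM 1
#define DEMO_SLUB_DEBUG 1
/* DEMO_SLAB is intentionally never defined, mirroring the removed option. */

/* Old-style guard: the DEMO_SLAB term can no longer be true. */
#if defined(DEMO_MEMCG_KMEM) && (defined(DEMO_SLAB) || defined(DEMO_SLUB_DEBUG))
#define OLD_GUARD_ENABLED 1
#else
#define OLD_GUARD_ENABLED 0
#endif

/* New-style guard: the dead term is dropped. */
#if defined(DEMO_MEMCG_KMEM) && defined(DEMO_SLUB_DEBUG)
#define NEW_GUARD_ENABLED 1
#else
#define NEW_GUARD_ENABLED 0
#endif

/* Compile-time check that both forms select the same code. */
static_assert(OLD_GUARD_ENABLED == NEW_GUARD_ENABLED,
	      "guards must agree once DEMO_SLAB is gone");

/* Returns 1 when the guarded code would be built in this configuration. */
static int demo_guard_enabled(void)
{
	return NEW_GUARD_ENABLED;
}
```

Removing either DEMO_MEMCG_KMEM or DEMO_SLUB_DEBUG above turns both
guards off together, which is the property the patch relies on.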

2023-11-20 18:37:12

by Vlastimil Babka

Subject: [PATCH v2 19/21] mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers

slab_alloc() is a thin wrapper around slab_alloc_node() with only one
caller. Replace it with a direct call of slab_alloc_node().
__kmem_cache_alloc_lru() itself is a thin wrapper with two callers,
so replace it with direct calls of slab_alloc_node() and
trace_kmem_cache_alloc().

This also makes sure _RET_IP_ always has the expected value and does
not depend on inlining decisions.

Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d6bc15929d22..5683f1d02e4f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3821,33 +3821,26 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
return object;
}

-static __fastpath_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags, unsigned long addr, size_t orig_size)
-{
- return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
-}
-
-static __fastpath_inline
-void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags)
+void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
- void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
+ s->object_size);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

return ret;
}
-
-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
-{
- return __kmem_cache_alloc_lru(s, NULL, gfpflags);
-}
EXPORT_SYMBOL(kmem_cache_alloc);

void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
- return __kmem_cache_alloc_lru(s, lru, gfpflags);
+ void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
+ s->object_size);
+
+ trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
+
+ return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_lru);


--
2.42.1
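
The _RET_IP_ remark in patch 19 can be illustrated outside the kernel.
The sketch below is a userspace analogue under stated assumptions: all
demo_* names are hypothetical, malloc() stands in for the slab
allocator, and __builtin_return_address() is the GCC/Clang builtin that
the kernel's _RET_IP_ is built on. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace analogue of the kernel's _RET_IP_ (GCC/Clang builtin). */
#define DEMO_RET_IP ((unsigned long)__builtin_return_address(0))

static unsigned long demo_last_traced_ip; /* stand-in for a tracepoint */
static unsigned long demo_trace_count;

/* Stand-in for slab_alloc_node(): the caller address is passed in. */
static inline void *demo_alloc_node(size_t size, unsigned long addr)
{
	(void)addr; /* a real allocator could record this for debugging */
	return malloc(size);
}

/*
 * Entry point that calls the node-level helper directly, with no
 * intermediate wrapper. It is noinline so that DEMO_RET_IP is evaluated
 * in this function's own frame and names the code that called it.
 */
__attribute__((noinline)) void *demo_cache_alloc(size_t size)
{
	void *ret;

	assert(size > 0);
	ret = demo_alloc_node(size, DEMO_RET_IP);

	demo_last_traced_ip = DEMO_RET_IP;
	demo_trace_count++;
	return ret;
}
```

With a wrapper layer in between, the value seen by the helper would
depend on whether the compiler inlined the wrapper; evaluating the
macro in the outermost entry point avoids that ambiguity.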

2023-11-20 18:37:15

by Vlastimil Babka

Subject: [PATCH v2 14/21] mm/slab: move memcg related functions from slab.h to slub.c

We don't share those between SLAB and SLUB anymore, so most memcg
related functions can be moved to slub.c proper.

Reviewed-by: Kees Cook <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slab.h | 206 --------------------------------------------------------------
mm/slub.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 205 insertions(+), 206 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 65ebf86b3fe9..a81ef7c9282d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -486,12 +486,6 @@ void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s);
ssize_t slabinfo_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos);

-static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
-{
- return (s->flags & SLAB_RECLAIM_ACCOUNT) ?
- NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
-}
-
#ifdef CONFIG_SLUB_DEBUG
#ifdef CONFIG_SLUB_DEBUG_ON
DECLARE_STATIC_KEY_TRUE(slub_debug_enabled);
@@ -551,220 +545,20 @@ int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
gfp_t gfp, bool new_slab);
void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr);
-
-static inline void memcg_free_slab_cgroups(struct slab *slab)
-{
- kfree(slab_objcgs(slab));
- slab->memcg_data = 0;
-}
-
-static inline size_t obj_full_size(struct kmem_cache *s)
-{
- /*
- * For each accounted object there is an extra space which is used
- * to store obj_cgroup membership. Charge it too.
- */
- return s->size + sizeof(struct obj_cgroup *);
-}
-
-/*
- * Returns false if the allocation should fail.
- */
-static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- struct obj_cgroup **objcgp,
- size_t objects, gfp_t flags)
-{
- struct obj_cgroup *objcg;
-
- if (!memcg_kmem_online())
- return true;
-
- if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
- return true;
-
- /*
- * The obtained objcg pointer is safe to use within the current scope,
- * defined by current task or set_active_memcg() pair.
- * obj_cgroup_get() is used to get a permanent reference.
- */
- objcg = current_obj_cgroup();
- if (!objcg)
- return true;
-
- if (lru) {
- int ret;
- struct mem_cgroup *memcg;
-
- memcg = get_mem_cgroup_from_objcg(objcg);
- ret = memcg_list_lru_alloc(memcg, lru, flags);
- css_put(&memcg->css);
-
- if (ret)
- return false;
- }
-
- if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
- return false;
-
- *objcgp = objcg;
- return true;
-}
-
-static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
- struct obj_cgroup *objcg,
- gfp_t flags, size_t size,
- void **p)
-{
- struct slab *slab;
- unsigned long off;
- size_t i;
-
- if (!memcg_kmem_online() || !objcg)
- return;
-
- for (i = 0; i < size; i++) {
- if (likely(p[i])) {
- slab = virt_to_slab(p[i]);
-
- if (!slab_objcgs(slab) &&
- memcg_alloc_slab_cgroups(slab, s, flags,
- false)) {
- obj_cgroup_uncharge(objcg, obj_full_size(s));
- continue;
- }
-
- off = obj_to_index(s, slab, p[i]);
- obj_cgroup_get(objcg);
- slab_objcgs(slab)[off] = objcg;
- mod_objcg_state(objcg, slab_pgdat(slab),
- cache_vmstat_idx(s), obj_full_size(s));
- } else {
- obj_cgroup_uncharge(objcg, obj_full_size(s));
- }
- }
-}
-
-static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects)
-{
- struct obj_cgroup **objcgs;
- int i;
-
- if (!memcg_kmem_online())
- return;
-
- objcgs = slab_objcgs(slab);
- if (!objcgs)
- return;
-
- for (i = 0; i < objects; i++) {
- struct obj_cgroup *objcg;
- unsigned int off;
-
- off = obj_to_index(s, slab, p[i]);
- objcg = objcgs[off];
- if (!objcg)
- continue;
-
- objcgs[off] = NULL;
- obj_cgroup_uncharge(objcg, obj_full_size(s));
- mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
- -obj_full_size(s));
- obj_cgroup_put(objcg);
- }
-}
-
#else /* CONFIG_MEMCG_KMEM */
static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
{
return NULL;
}

-static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
-{
- return NULL;
-}
-
static inline int memcg_alloc_slab_cgroups(struct slab *slab,
struct kmem_cache *s, gfp_t gfp,
bool new_slab)
{
return 0;
}
-
-static inline void memcg_free_slab_cgroups(struct slab *slab)
-{
-}
-
-static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- struct obj_cgroup **objcgp,
- size_t objects, gfp_t flags)
-{
- return true;
-}
-
-static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
- struct obj_cgroup *objcg,
- gfp_t flags, size_t size,
- void **p)
-{
-}
-
-static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects)
-{
-}
#endif /* CONFIG_MEMCG_KMEM */

-static inline struct kmem_cache *virt_to_cache(const void *obj)
-{
- struct slab *slab;
-
- slab = virt_to_slab(obj);
- if (WARN_ONCE(!slab, "%s: Object is not a Slab page!\n",
- __func__))
- return NULL;
- return slab->slab_cache;
-}
-
-static __always_inline void account_slab(struct slab *slab, int order,
- struct kmem_cache *s, gfp_t gfp)
-{
- if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
- memcg_alloc_slab_cgroups(slab, s, gfp, true);
-
- mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
- PAGE_SIZE << order);
-}
-
-static __always_inline void unaccount_slab(struct slab *slab, int order,
- struct kmem_cache *s)
-{
- if (memcg_kmem_online())
- memcg_free_slab_cgroups(slab);
-
- mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
- -(PAGE_SIZE << order));
-}
-
-static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
-{
- struct kmem_cache *cachep;
-
- if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
- !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
- return s;
-
- cachep = virt_to_cache(x);
- if (WARN(cachep && cachep != s,
- "%s: Wrong slab cache. %s but object is from %s\n",
- __func__, s->name, cachep->name))
- print_tracking(cachep, x);
- return cachep;
-}
-
void free_large_kmalloc(struct folio *folio, void *object);

size_t __ksize(const void *objp);
diff --git a/mm/slub.c b/mm/slub.c
index 9eb6508152c2..844e0beb84ee 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1814,6 +1814,165 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
#endif
#endif /* CONFIG_SLUB_DEBUG */

+static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ?
+ NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
+}
+
+#ifdef CONFIG_MEMCG_KMEM
+static inline void memcg_free_slab_cgroups(struct slab *slab)
+{
+ kfree(slab_objcgs(slab));
+ slab->memcg_data = 0;
+}
+
+static inline size_t obj_full_size(struct kmem_cache *s)
+{
+ /*
+ * For each accounted object there is an extra space which is used
+ * to store obj_cgroup membership. Charge it too.
+ */
+ return s->size + sizeof(struct obj_cgroup *);
+}
+
+/*
+ * Returns false if the allocation should fail.
+ */
+static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
+{
+ struct obj_cgroup *objcg;
+
+ if (!memcg_kmem_online())
+ return true;
+
+ if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
+ return true;
+
+ /*
+ * The obtained objcg pointer is safe to use within the current scope,
+ * defined by current task or set_active_memcg() pair.
+ * obj_cgroup_get() is used to get a permanent reference.
+ */
+ objcg = current_obj_cgroup();
+ if (!objcg)
+ return true;
+
+ if (lru) {
+ int ret;
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ ret = memcg_list_lru_alloc(memcg, lru, flags);
+ css_put(&memcg->css);
+
+ if (ret)
+ return false;
+ }
+
+ if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
+ return false;
+
+ *objcgp = objcg;
+ return true;
+}
+
+static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup *objcg,
+ gfp_t flags, size_t size,
+ void **p)
+{
+ struct slab *slab;
+ unsigned long off;
+ size_t i;
+
+ if (!memcg_kmem_online() || !objcg)
+ return;
+
+ for (i = 0; i < size; i++) {
+ if (likely(p[i])) {
+ slab = virt_to_slab(p[i]);
+
+ if (!slab_objcgs(slab) &&
+ memcg_alloc_slab_cgroups(slab, s, flags, false)) {
+ obj_cgroup_uncharge(objcg, obj_full_size(s));
+ continue;
+ }
+
+ off = obj_to_index(s, slab, p[i]);
+ obj_cgroup_get(objcg);
+ slab_objcgs(slab)[off] = objcg;
+ mod_objcg_state(objcg, slab_pgdat(slab),
+ cache_vmstat_idx(s), obj_full_size(s));
+ } else {
+ obj_cgroup_uncharge(objcg, obj_full_size(s));
+ }
+ }
+}
+
+static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects)
+{
+ struct obj_cgroup **objcgs;
+ int i;
+
+ if (!memcg_kmem_online())
+ return;
+
+ objcgs = slab_objcgs(slab);
+ if (!objcgs)
+ return;
+
+ for (i = 0; i < objects; i++) {
+ struct obj_cgroup *objcg;
+ unsigned int off;
+
+ off = obj_to_index(s, slab, p[i]);
+ objcg = objcgs[off];
+ if (!objcg)
+ continue;
+
+ objcgs[off] = NULL;
+ obj_cgroup_uncharge(objcg, obj_full_size(s));
+ mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
+ -obj_full_size(s));
+ obj_cgroup_put(objcg);
+ }
+}
+#else /* CONFIG_MEMCG_KMEM */
+static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
+{
+ return NULL;
+}
+
+static inline void memcg_free_slab_cgroups(struct slab *slab)
+{
+}
+
+static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
+{
+ return true;
+}
+
+static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup *objcg,
+ gfp_t flags, size_t size,
+ void **p)
+{
+}
+
+static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects)
+{
+}
+#endif /* CONFIG_MEMCG_KMEM */
+
/*
* Hooks for other subsystems that check memory allocations. In a typical
* production configuration these hooks all should produce no code at all.
@@ -2048,6 +2207,26 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
}
#endif /* CONFIG_SLAB_FREELIST_RANDOM */

+static __always_inline void account_slab(struct slab *slab, int order,
+ struct kmem_cache *s, gfp_t gfp)
+{
+ if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
+ memcg_alloc_slab_cgroups(slab, s, gfp, true);
+
+ mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
+ PAGE_SIZE << order);
+}
+
+static __always_inline void unaccount_slab(struct slab *slab, int order,
+ struct kmem_cache *s)
+{
+ if (memcg_kmem_online())
+ memcg_free_slab_cgroups(slab);
+
+ mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
+ -(PAGE_SIZE << order));
+}
+
static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
{
struct slab *slab;
@@ -3965,6 +4144,32 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
}
#endif

+static inline struct kmem_cache *virt_to_cache(const void *obj)
+{
+ struct slab *slab;
+
+ slab = virt_to_slab(obj);
+ if (WARN_ONCE(!slab, "%s: Object is not a Slab page!\n", __func__))
+ return NULL;
+ return slab->slab_cache;
+}
+
+static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
+{
+ struct kmem_cache *cachep;
+
+ if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
+ !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
+ return s;
+
+ cachep = virt_to_cache(x);
+ if (WARN(cachep && cachep != s,
+ "%s: Wrong slab cache. %s but object is from %s\n",
+ __func__, s->name, cachep->name))
+ print_tracking(cachep, x);
+ return cachep;
+}
+
void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
{
slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);

--
2.42.1
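
The charge arithmetic used by the moved memcg hooks (obj_full_size()
and the "objects * obj_full_size(s)" charge in the pre-alloc hook) can
be sketched in userspace as follows. struct demo_cache and the demo_*
helpers are hypothetical, and sizeof(void *) is assumed to match the
size of a struct obj_cgroup pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of struct kmem_cache; only the field used here. */
struct demo_cache {
	size_t size; /* per-object size as laid out in the slab */
};

/* Mirrors obj_full_size(): object size plus one objcg pointer slot. */
static size_t demo_obj_full_size(const struct demo_cache *s)
{
	assert(s != NULL);
	return s->size + sizeof(void *);
}

/* Bytes charged up front for a bulk allocation of 'objects' objects. */
static size_t demo_charge_bytes(const struct demo_cache *s, size_t objects)
{
	return objects * demo_obj_full_size(s);
}
```

The same per-object amount is what the post-alloc and free hooks
uncharge, one object at a time, when an individual slot turns out to be
unused or is released.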

2023-11-20 18:37:20

by Vlastimil Babka

Subject: [PATCH v2 15/21] mm/slab: move struct kmem_cache_node from slab.h to slub.c

The declaration and associated helpers are not used anywhere else
anymore.

Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slab.h | 29 -----------------------------
mm/slub.c | 27 +++++++++++++++++++++++++++
2 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index a81ef7c9282d..5ae6a978e9c2 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -588,35 +588,6 @@ static inline size_t slab_ksize(const struct kmem_cache *s)
return s->size;
}

-
-/*
- * The slab lists for all objects.
- */
-struct kmem_cache_node {
- spinlock_t list_lock;
- unsigned long nr_partial;
- struct list_head partial;
-#ifdef CONFIG_SLUB_DEBUG
- atomic_long_t nr_slabs;
- atomic_long_t total_objects;
- struct list_head full;
-#endif
-};
-
-static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
-{
- return s->node[node];
-}
-
-/*
- * Iterator over all nodes. The body will be executed for each node that has
- * a kmem_cache_node structure allocated (which is true for all online nodes)
- */
-#define for_each_kmem_cache_node(__s, __node, __n) \
- for (__node = 0; __node < nr_node_ids; __node++) \
- if ((__n = get_node(__s, __node)))
-
-
#ifdef CONFIG_SLUB_DEBUG
void dump_unreclaimable_slab(void);
#else
diff --git a/mm/slub.c b/mm/slub.c
index 844e0beb84ee..cc801f8258fe 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -396,6 +396,33 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
#endif
}

+/*
+ * The slab lists for all objects.
+ */
+struct kmem_cache_node {
+ spinlock_t list_lock;
+ unsigned long nr_partial;
+ struct list_head partial;
+#ifdef CONFIG_SLUB_DEBUG
+ atomic_long_t nr_slabs;
+ atomic_long_t total_objects;
+ struct list_head full;
+#endif
+};
+
+static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
+{
+ return s->node[node];
+}
+
+/*
+ * Iterator over all nodes. The body will be executed for each node that has
+ * a kmem_cache_node structure allocated (which is true for all online nodes)
+ */
+#define for_each_kmem_cache_node(__s, __node, __n) \
+ for (__node = 0; __node < nr_node_ids; __node++) \
+ if ((__n = get_node(__s, __node)))
+
/*
* Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
* Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily

--
2.42.1
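
The for_each_kmem_cache_node() macro moved by patch 15 combines a for
loop with an if so that the loop body only runs for nodes that have a
structure allocated. A userspace sketch of the same idiom, assuming a
hypothetical fixed node count and demo_* types in place of the kernel
ones:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_NODES 4 /* hypothetical; the kernel uses nr_node_ids */

struct demo_node {
	unsigned long nr_partial;
};

struct demo_kcache {
	struct demo_node *node[DEMO_NR_NODES];
};

static inline struct demo_node *demo_get_node(struct demo_kcache *s, int node)
{
	return s->node[node];
}

/* Same shape as for_each_kmem_cache_node(): NULL nodes are skipped. */
#define demo_for_each_node(__s, __node, __n) \
	for (__node = 0; __node < DEMO_NR_NODES; __node++) \
		if ((__n = demo_get_node(__s, __node)))

/* Sum nr_partial over every node that is actually allocated. */
static unsigned long demo_total_partial(struct demo_kcache *s)
{
	struct demo_node *n;
	unsigned long total = 0;
	int node;

	assert(s != NULL);
	demo_for_each_node(s, node, n)
		total += n->nr_partial;

	return total;
}
```

Because the macro ends in an if, a statement following it becomes the
body of that if, so callers can use it like an ordinary loop header.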

2023-11-20 18:37:22

by Vlastimil Babka

Subject: [PATCH v2 13/21] mm/slab: move pre/post-alloc hooks from slab.h to slub.c

We don't share the hooks between two slab implementations anymore so
they can be moved away from the header. As part of the move, also move
should_failslab() from slab_common.c as the pre_alloc hook uses it.
This means slab.h can stop including fault-inject.h and kmemleak.h.
Fix up some files that were depending on the includes transitively.

Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/kasan/report.c | 1 +
mm/memcontrol.c | 1 +
mm/slab.h | 72 -------------------------------------------------
mm/slab_common.c | 8 +-----
mm/slub.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 84 insertions(+), 79 deletions(-)

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index e77facb62900..011f727bfaff 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -23,6 +23,7 @@
#include <linux/stacktrace.h>
#include <linux/string.h>
#include <linux/types.h>
+#include <linux/vmalloc.h>
#include <linux/kasan.h>
#include <linux/module.h>
#include <linux/sched/task_stack.h>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 947fb50eba31..8a0603517065 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -64,6 +64,7 @@
#include <linux/psi.h>
#include <linux/seq_buf.h>
#include <linux/sched/isolation.h>
+#include <linux/kmemleak.h>
#include "internal.h"
#include <net/sock.h>
#include <net/ip.h>
diff --git a/mm/slab.h b/mm/slab.h
index 1ac3a2f8d4c0..65ebf86b3fe9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -9,8 +9,6 @@
#include <linux/kobject.h>
#include <linux/sched/mm.h>
#include <linux/memcontrol.h>
-#include <linux/fault-inject.h>
-#include <linux/kmemleak.h>
#include <linux/kfence.h>
#include <linux/kasan.h>

@@ -796,76 +794,6 @@ static inline size_t slab_ksize(const struct kmem_cache *s)
return s->size;
}

-static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- struct obj_cgroup **objcgp,
- size_t size, gfp_t flags)
-{
- flags &= gfp_allowed_mask;
-
- might_alloc(flags);
-
- if (should_failslab(s, flags))
- return NULL;
-
- if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
- return NULL;
-
- return s;
-}
-
-static inline void slab_post_alloc_hook(struct kmem_cache *s,
- struct obj_cgroup *objcg, gfp_t flags,
- size_t size, void **p, bool init,
- unsigned int orig_size)
-{
- unsigned int zero_size = s->object_size;
- bool kasan_init = init;
- size_t i;
-
- flags &= gfp_allowed_mask;
-
- /*
- * For kmalloc object, the allocated memory size(object_size) is likely
- * larger than the requested size(orig_size). If redzone check is
- * enabled for the extra space, don't zero it, as it will be redzoned
- * soon. The redzone operation for this extra space could be seen as a
- * replacement of current poisoning under certain debug option, and
- * won't break other sanity checks.
- */
- if (kmem_cache_debug_flags(s, SLAB_STORE_USER | SLAB_RED_ZONE) &&
- (s->flags & SLAB_KMALLOC))
- zero_size = orig_size;
-
- /*
- * When slub_debug is enabled, avoid memory initialization integrated
- * into KASAN and instead zero out the memory via the memset below with
- * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
- * cause false-positive reports. This does not lead to a performance
- * penalty on production builds, as slub_debug is not intended to be
- * enabled there.
- */
- if (__slub_debug_enabled())
- kasan_init = false;
-
- /*
- * As memory initialization might be integrated into KASAN,
- * kasan_slab_alloc and initialization memset must be
- * kept together to avoid discrepancies in behavior.
- *
- * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
- */
- for (i = 0; i < size; i++) {
- p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
- if (p[i] && init && (!kasan_init || !kasan_has_integrated_init()))
- memset(p[i], 0, zero_size);
- kmemleak_alloc_recursive(p[i], s->object_size, 1,
- s->flags, flags);
- kmsan_slab_alloc(s, p[i], flags);
- }
-
- memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
-}

/*
* The slab lists for all objects.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 63b8411db7ce..bbc2e3f061f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -21,6 +21,7 @@
#include <linux/swiotlb.h>
#include <linux/proc_fs.h>
#include <linux/debugfs.h>
+#include <linux/kmemleak.h>
#include <linux/kasan.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
@@ -1470,10 +1471,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
EXPORT_TRACEPOINT_SYMBOL(kfree);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);

-int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
-{
- if (__should_failslab(s, gfpflags))
- return -ENOMEM;
- return 0;
-}
-ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
diff --git a/mm/slub.c b/mm/slub.c
index 979932d046fd..9eb6508152c2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -34,6 +34,7 @@
#include <linux/memory.h>
#include <linux/math64.h>
#include <linux/fault-inject.h>
+#include <linux/kmemleak.h>
#include <linux/stacktrace.h>
#include <linux/prefetch.h>
#include <linux/memcontrol.h>
@@ -3494,6 +3495,86 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
0, sizeof(void *));
}

+noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
+{
+ if (__should_failslab(s, gfpflags))
+ return -ENOMEM;
+ return 0;
+}
+ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
+
+static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
+ struct obj_cgroup **objcgp,
+ size_t size, gfp_t flags)
+{
+ flags &= gfp_allowed_mask;
+
+ might_alloc(flags);
+
+ if (should_failslab(s, flags))
+ return NULL;
+
+ if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
+ return NULL;
+
+ return s;
+}
+
+static inline void slab_post_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup *objcg, gfp_t flags,
+ size_t size, void **p, bool init,
+ unsigned int orig_size)
+{
+ unsigned int zero_size = s->object_size;
+ bool kasan_init = init;
+ size_t i;
+
+ flags &= gfp_allowed_mask;
+
+ /*
+ * For kmalloc object, the allocated memory size(object_size) is likely
+ * larger than the requested size(orig_size). If redzone check is
+ * enabled for the extra space, don't zero it, as it will be redzoned
+ * soon. The redzone operation for this extra space could be seen as a
+ * replacement of current poisoning under certain debug option, and
+ * won't break other sanity checks.
+ */
+ if (kmem_cache_debug_flags(s, SLAB_STORE_USER | SLAB_RED_ZONE) &&
+ (s->flags & SLAB_KMALLOC))
+ zero_size = orig_size;
+
+ /*
+ * When slub_debug is enabled, avoid memory initialization integrated
+ * into KASAN and instead zero out the memory via the memset below with
+ * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
+ * cause false-positive reports. This does not lead to a performance
+ * penalty on production builds, as slub_debug is not intended to be
+ * enabled there.
+ */
+ if (__slub_debug_enabled())
+ kasan_init = false;
+
+ /*
+ * As memory initialization might be integrated into KASAN,
+ * kasan_slab_alloc and initialization memset must be
+ * kept together to avoid discrepancies in behavior.
+ *
+ * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
+ */
+ for (i = 0; i < size; i++) {
+ p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
+ if (p[i] && init && (!kasan_init ||
+ !kasan_has_integrated_init()))
+ memset(p[i], 0, zero_size);
+ kmemleak_alloc_recursive(p[i], s->object_size, 1,
+ s->flags, flags);
+ kmsan_slab_alloc(s, p[i], flags);
+ }
+
+ memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
+}
+
/*
* Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
* have the fastpath folded into their functions. So no function call

--
2.42.1
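
The zero_size selection in slab_post_alloc_hook(), as described by the
comment in the moved code, can be sketched in userspace as below. The
DEMO_* flag values and the demo_zero_size() helper are hypothetical;
the debug_enabled parameter stands in for __slub_debug_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the kernel's SLAB_* values differ. */
#define DEMO_RED_ZONE   0x1u
#define DEMO_STORE_USER 0x2u
#define DEMO_KMALLOC    0x4u

/*
 * Pick how many bytes to zero for a freshly allocated object. For a
 * kmalloc cache with red zoning or user tracking active, only the
 * requested size is zeroed, since the rest will be redzoned.
 */
static size_t demo_zero_size(unsigned int cache_flags, bool debug_enabled,
			     size_t object_size, size_t orig_size)
{
	bool debug_tracking = debug_enabled &&
		(cache_flags & (DEMO_RED_ZONE | DEMO_STORE_USER)) != 0;

	assert(orig_size <= object_size);

	if (debug_tracking && (cache_flags & DEMO_KMALLOC))
		return orig_size;

	return object_size;
}
```

For example, a 40-byte request served from a 64-byte kmalloc cache with
red zoning active zeroes 40 bytes; without debugging it zeroes all 64.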

2023-11-20 18:37:22

by Vlastimil Babka

Subject: [PATCH v2 20/21] mm/slub: optimize alloc fastpath code layout

With allocation fastpaths no longer divided between two .c files, we
have better inlining. However, checking the disassembly of
kmem_cache_alloc() reveals we can do better: make the fastpaths
smaller and move the less common situations out of line or to separate
functions, to reduce instruction cache pressure.

- split memcg pre/post alloc hooks to inlined checks that use likely()
to assume there will be no objcg handling necessary, and non-inline
functions doing the actual handling

- add some more likely/unlikely() to pre/post alloc hooks to indicate
which scenarios should be out of line

- change gfp_allowed_mask handling in slab_post_alloc_hook() so the
code can be optimized away when kasan/kmsan/kmemleak is configured out

bloat-o-meter shows:
add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
Function                               old     new   delta
__memcg_slab_post_alloc_hook             -     461    +461
kmem_cache_alloc_bulk                  775     791     +16
__pfx_should_failslab.constprop          -      16     +16
__pfx___memcg_slab_post_alloc_hook       -      16     +16
should_failslab.constprop                -      12     +12
__pfx_memcg_slab_post_alloc_hook        16       -     -16
kmem_cache_alloc_lru                  1295    1023    -272
kmem_cache_alloc_node                 1118     817    -301
kmem_cache_alloc                      1076     772    -304
kmalloc_node_trace                    1149     838    -311
kmalloc_trace                         1102     789    -313
__kmalloc_node_track_caller           1393    1080    -313
__kmalloc_node                        1397    1082    -315
__kmalloc                             1374    1059    -315
memcg_slab_post_alloc_hook             464       -    -464

Note that gcc still decided to inline __memcg_slab_pre_alloc_hook(), but
the code is out of line. Forcing noinline did not improve the results.
As a result the fastpaths are shorter and overall code size is reduced.
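
The split described above, an inlined check hinted with likely() in
front of a non-inline function that does the real work, can be sketched
in userspace as follows. All demo_* names and the charge limit are
hypothetical; __builtin_expect() is the GCC/Clang builtin behind the
kernel's likely().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace version of the kernel's branch hint (GCC/Clang builtin). */
#define demo_likely(x) __builtin_expect(!!(x), 1)

static unsigned long demo_slow_calls; /* counts trips through the slow path */

/* Out-of-line slow path: only reached when accounting is requested. */
static __attribute__((noinline)) bool demo_slow_pre_alloc(unsigned int objects)
{
	assert(objects > 0);
	demo_slow_calls++;
	return objects <= 16; /* hypothetical charge limit */
}

/* Inlined check: the common no-accounting case returns immediately. */
static inline bool demo_pre_alloc_hook(bool accounting_on, bool wants_account,
				       unsigned int objects)
{
	if (!accounting_on)
		return true;

	if (demo_likely(!wants_account))
		return true;

	return demo_likely(demo_slow_pre_alloc(objects));
}
```

Only the two cheap tests end up in each caller; the accounting work
stays in one shared function, which is the effect the bloat-o-meter
numbers above reflect for the real hooks.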

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 89 ++++++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5683f1d02e4f..77d259f3d592 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1866,25 +1866,17 @@ static inline size_t obj_full_size(struct kmem_cache *s)
/*
* Returns false if the allocation should fail.
*/
-static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- struct obj_cgroup **objcgp,
- size_t objects, gfp_t flags)
+static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
{
- struct obj_cgroup *objcg;
-
- if (!memcg_kmem_online())
- return true;
-
- if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
- return true;
-
/*
* The obtained objcg pointer is safe to use within the current scope,
* defined by current task or set_active_memcg() pair.
* obj_cgroup_get() is used to get a permanent reference.
*/
- objcg = current_obj_cgroup();
+ struct obj_cgroup *objcg = current_obj_cgroup();
if (!objcg)
return true;

@@ -1907,17 +1899,34 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
return true;
}

-static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
- struct obj_cgroup *objcg,
- gfp_t flags, size_t size,
- void **p)
+/*
+ * Returns false if the allocation should fail.
+ */
+static __fastpath_inline
+bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
+ struct obj_cgroup **objcgp, size_t objects,
+ gfp_t flags)
+{
+ if (!memcg_kmem_online())
+ return true;
+
+ if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
+ return true;
+
+ return likely(__memcg_slab_pre_alloc_hook(s, lru, objcgp, objects,
+ flags));
+}
+
+static void __memcg_slab_post_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup *objcg,
+ gfp_t flags, size_t size,
+ void **p)
{
struct slab *slab;
unsigned long off;
size_t i;

- if (!memcg_kmem_online() || !objcg)
- return;
+ flags &= gfp_allowed_mask;

for (i = 0; i < size; i++) {
if (likely(p[i])) {
@@ -1940,6 +1949,16 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
}
}

+static __fastpath_inline
+void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
+ gfp_t flags, size_t size, void **p)
+{
+ if (likely(!memcg_kmem_online() || !objcg))
+ return;
+
+ return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
+}
+
static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects)
{
@@ -3709,34 +3728,34 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
}
ALLOW_ERROR_INJECTION(should_failslab, ERRNO);

-static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- struct obj_cgroup **objcgp,
- size_t size, gfp_t flags)
+static __fastpath_inline
+struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
+ struct obj_cgroup **objcgp,
+ size_t size, gfp_t flags)
{
flags &= gfp_allowed_mask;

might_alloc(flags);

- if (should_failslab(s, flags))
+ if (unlikely(should_failslab(s, flags)))
return NULL;

- if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
+ if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)))
return NULL;

return s;
}

-static inline void slab_post_alloc_hook(struct kmem_cache *s,
- struct obj_cgroup *objcg, gfp_t flags,
- size_t size, void **p, bool init,
- unsigned int orig_size)
+static __fastpath_inline
+void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
+ gfp_t flags, size_t size, void **p, bool init,
+ unsigned int orig_size)
{
unsigned int zero_size = s->object_size;
bool kasan_init = init;
size_t i;
-
- flags &= gfp_allowed_mask;
+ gfp_t init_flags = flags & gfp_allowed_mask;

/*
* For kmalloc object, the allocated memory size(object_size) is likely
@@ -3769,13 +3788,13 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
* As p[i] might get tagged, memset and kmemleak hook come after KASAN.
*/
for (i = 0; i < size; i++) {
- p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
+ p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
if (p[i] && init && (!kasan_init ||
!kasan_has_integrated_init()))
memset(p[i], 0, zero_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
- s->flags, flags);
- kmsan_slab_alloc(s, p[i], flags);
+ s->flags, init_flags);
+ kmsan_slab_alloc(s, p[i], init_flags);
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
@@ -3799,7 +3818,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
bool init = false;

s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
- if (!s)
+ if (unlikely(!s))
return NULL;

object = kfence_alloc(s, orig_size, gfpflags);

--
2.42.1

2023-11-20 18:38:16

by Vlastimil Babka

Subject: [PATCH v2 21/21] mm/slub: optimize free fast path code layout

Inspection of kmem_cache_free() disassembly showed we could make the
fast path smaller by providing a few more hints to the compiler, and
splitting the memcg_slab_free_hook() into an inline part that only
checks if there's work to do, and an out of line part doing the actual
uncharge.

bloat-o-meter results:
add/remove: 2/0 grow/shrink: 0/3 up/down: 286/-554 (-268)
Function old new delta
__memcg_slab_free_hook - 270 +270
__pfx___memcg_slab_free_hook - 16 +16
kfree 828 665 -163
kmem_cache_free 1116 948 -168
kmem_cache_free_bulk.part 1701 1478 -223

Checking kmem_cache_free() disassembly now shows the non-fastpath
cases are handled out of line, which should reduce instruction cache
usage.
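
A minimal user-space sketch of that free-side split, assuming a hypothetical
struct demo_slab with an optional per-object array (none of the demo_* names
exist in the kernel). The inline part does the lookup once and hands the
result to the out-of-line part, so the slow path does not have to repeat it.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's likely() hint (GCC/Clang builtin). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical slab with optional per-object accounting data. */
struct demo_slab {
	int *charges;	/* NULL when nothing in this slab is accounted */
};

/* Out-of-line part: receives the already looked-up array and clears it. */
static __attribute__((noinline))
void demo_free_hook_slow(int *charges, const size_t *idx, size_t objects)
{
	assert(charges != NULL);
	for (size_t i = 0; i < objects; i++)
		charges[idx[i]] = 0;
}

/* Inline part: a cheap check for whether there is any work to do. */
static inline void demo_free_hook(struct demo_slab *slab, const size_t *idx,
				  size_t objects)
{
	int *charges = slab->charges;

	if (likely(!charges))
		return;

	demo_free_hook_slow(charges, idx, objects);
}
```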

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 40 ++++++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 77d259f3d592..3f8b95757106 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1959,20 +1959,11 @@ void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
}

-static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects)
+static void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects,
+ struct obj_cgroup **objcgs)
{
- struct obj_cgroup **objcgs;
- int i;
-
- if (!memcg_kmem_online())
- return;
-
- objcgs = slab_objcgs(slab);
- if (!objcgs)
- return;
-
- for (i = 0; i < objects; i++) {
+ for (int i = 0; i < objects; i++) {
struct obj_cgroup *objcg;
unsigned int off;

@@ -1988,6 +1979,22 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
obj_cgroup_put(objcg);
}
}
+
+static __fastpath_inline
+void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
+ int objects)
+{
+ struct obj_cgroup **objcgs;
+
+ if (!memcg_kmem_online())
+ return;
+
+ objcgs = slab_objcgs(slab);
+ if (likely(!objcgs))
+ return;
+
+ __memcg_slab_free_hook(s, slab, p, objects, objcgs);
+}
#else /* CONFIG_MEMCG_KMEM */
static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
{
@@ -2047,7 +2054,7 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s,
* The initialization memset's clear the object and the metadata,
* but don't touch the SLAB redzone.
*/
- if (init) {
+ if (unlikely(init)) {
int rsize;

if (!kasan_has_integrated_init())
@@ -2083,7 +2090,8 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
next = get_freepointer(s, object);

/* If object's reuse doesn't have to be delayed */
- if (!slab_free_hook(s, object, slab_want_init_on_free(s))) {
+ if (likely(!slab_free_hook(s, object,
+ slab_want_init_on_free(s)))) {
/* Move object to the new freelist */
set_freepointer(s, object, *head);
*head = object;
@@ -4282,7 +4290,7 @@ static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab,
* With KASAN enabled slab_free_freelist_hook modifies the freelist
* to remove objects, whose reuse must be delayed.
*/
- if (slab_free_freelist_hook(s, &head, &tail, &cnt))
+ if (likely(slab_free_freelist_hook(s, &head, &tail, &cnt)))
do_slab_free(s, slab, head, tail, cnt, addr);
}


--
2.42.1

2023-11-20 18:38:17

by Vlastimil Babka

Subject: [PATCH v2 18/21] mm/slab: move kmalloc() functions from slab_common.c to slub.c

This will eliminate a call between compilation units through
__kmem_cache_alloc_node() and allow better inlining of the allocation
fast path.
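
The effect can be approximated in a single file as below. This is a sketch
under assumptions, not the kernel code: the demo_* functions and the byte
pool are hypothetical, and the noinline attribute (GCC/Clang) only mimics the
fact that, without LTO, a call into another compilation unit cannot be
inlined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump-allocated pool standing in for the real allocator. */
static unsigned char demo_pool[256];
static size_t demo_used;

/* Core fast path, visible only inside this "compilation unit". */
static inline void *demo_alloc_core(size_t size)
{
	void *p;

	if (size == 0 || size > sizeof(demo_pool) - demo_used)
		return NULL;
	p = &demo_pool[demo_used];
	demo_used += size;
	return p;
}

/* Before: an external-linkage wrapper that code in another file must call. */
__attribute__((noinline)) void *demo_alloc_exported(size_t size)
{
	return demo_alloc_core(size);
}

/* Before: the kmalloc-like caller reaches the core through the wrapper. */
void *demo_kmalloc_before(size_t size)
{
	return demo_alloc_exported(size);
}

/* After: the caller sits next to the core, so the call can be inlined. */
void *demo_kmalloc_after(size_t size)
{
	return demo_alloc_core(size);
}
```

Both variants return the same result; the difference is only whether the
compiler sees the core's body at the call site.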

Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slab.h | 3 --
mm/slab_common.c | 119 ----------------------------------------------------
mm/slub.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 118 insertions(+), 130 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 7d7cc7af614e..54deeb0428c6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -416,9 +416,6 @@ kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
return kmalloc_caches[kmalloc_type(flags, caller)][index];
}

-void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
- int node, size_t orig_size,
- unsigned long caller);
gfp_t kmalloc_fix_flags(gfp_t flags);

/* Functions provided by the slab allocators */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 31ade17a7ad9..238293b1dbe1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -936,50 +936,6 @@ void __init create_kmalloc_caches(slab_flags_t flags)
slab_state = UP;
}

-static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
-static __always_inline
-void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
- struct kmem_cache *s;
- void *ret;
-
- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
- ret = __kmalloc_large_node(size, flags, node);
- trace_kmalloc(caller, ret, size,
- PAGE_SIZE << get_order(size), flags, node);
- return ret;
- }
-
- if (unlikely(!size))
- return ZERO_SIZE_PTR;
-
- s = kmalloc_slab(size, flags, caller);
-
- ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
- ret = kasan_kmalloc(s, ret, size, flags);
- trace_kmalloc(caller, ret, size, s->size, flags, node);
- return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
- return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc(size_t size, gfp_t flags)
-{
- return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
- int node, unsigned long caller)
-{
- return __do_kmalloc_node(size, flags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
/**
* __ksize -- Report full size of underlying allocation
* @object: pointer to the object
@@ -1016,30 +972,6 @@ size_t __ksize(const void *object)
return slab_ksize(folio_slab(folio)->slab_cache);
}

-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
-{
- void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
- size, _RET_IP_);
-
- trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
-
- ret = kasan_kmalloc(s, ret, size, gfpflags);
- return ret;
-}
-EXPORT_SYMBOL(kmalloc_trace);
-
-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
- int node, size_t size)
-{
- void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
-
- trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
-
- ret = kasan_kmalloc(s, ret, size, gfpflags);
- return ret;
-}
-EXPORT_SYMBOL(kmalloc_node_trace);
-
gfp_t kmalloc_fix_flags(gfp_t flags)
{
gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;
@@ -1052,57 +984,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
return flags;
}

-/*
- * To avoid unnecessary overhead, we pass through large allocation requests
- * directly to the page allocator. We use __GFP_COMP, because we will need to
- * know the allocation order to free the pages properly in kfree.
- */
-
-static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
-{
- struct page *page;
- void *ptr = NULL;
- unsigned int order = get_order(size);
-
- if (unlikely(flags & GFP_SLAB_BUG_MASK))
- flags = kmalloc_fix_flags(flags);
-
- flags |= __GFP_COMP;
- page = alloc_pages_node(node, flags, order);
- if (page) {
- ptr = page_address(page);
- mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
- PAGE_SIZE << order);
- }
-
- ptr = kasan_kmalloc_large(ptr, size, flags);
- /* As ptr might get tagged, call kmemleak hook after KASAN. */
- kmemleak_alloc(ptr, size, 1, flags);
- kmsan_kmalloc_large(ptr, size, flags);
-
- return ptr;
-}
-
-void *kmalloc_large(size_t size, gfp_t flags)
-{
- void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
-
- trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
- flags, NUMA_NO_NODE);
- return ret;
-}
-EXPORT_SYMBOL(kmalloc_large);
-
-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
-{
- void *ret = __kmalloc_large_node(size, flags, node);
-
- trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
- flags, node);
- return ret;
-}
-EXPORT_SYMBOL(kmalloc_large_node);
-
#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Randomize a generic freelist */
static void freelist_randomize(unsigned int *list,
diff --git a/mm/slub.c b/mm/slub.c
index 2baa9e94d9df..d6bc15929d22 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3851,14 +3851,6 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
}
EXPORT_SYMBOL(kmem_cache_alloc_lru);

-void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
- int node, size_t orig_size,
- unsigned long caller)
-{
- return slab_alloc_node(s, NULL, gfpflags, node,
- caller, orig_size);
-}
-
/**
* kmem_cache_alloc_node - Allocate an object on the specified node
* @s: The cache to allocate from.
@@ -3882,6 +3874,124 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
}
EXPORT_SYMBOL(kmem_cache_alloc_node);

+/*
+ * To avoid unnecessary overhead, we pass through large allocation requests
+ * directly to the page allocator. We use __GFP_COMP, because we will need to
+ * know the allocation order to free the pages properly in kfree.
+ */
+static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+ struct page *page;
+ void *ptr = NULL;
+ unsigned int order = get_order(size);
+
+ if (unlikely(flags & GFP_SLAB_BUG_MASK))
+ flags = kmalloc_fix_flags(flags);
+
+ flags |= __GFP_COMP;
+ page = alloc_pages_node(node, flags, order);
+ if (page) {
+ ptr = page_address(page);
+ mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
+ PAGE_SIZE << order);
+ }
+
+ ptr = kasan_kmalloc_large(ptr, size, flags);
+ /* As ptr might get tagged, call kmemleak hook after KASAN. */
+ kmemleak_alloc(ptr, size, 1, flags);
+ kmsan_kmalloc_large(ptr, size, flags);
+
+ return ptr;
+}
+
+void *kmalloc_large(size_t size, gfp_t flags)
+{
+ void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
+
+ trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
+ flags, NUMA_NO_NODE);
+ return ret;
+}
+EXPORT_SYMBOL(kmalloc_large);
+
+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+ void *ret = __kmalloc_large_node(size, flags, node);
+
+ trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
+ flags, node);
+ return ret;
+}
+EXPORT_SYMBOL(kmalloc_large_node);
+
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
+ unsigned long caller)
+{
+ struct kmem_cache *s;
+ void *ret;
+
+ if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+ ret = __kmalloc_large_node(size, flags, node);
+ trace_kmalloc(caller, ret, size,
+ PAGE_SIZE << get_order(size), flags, node);
+ return ret;
+ }
+
+ if (unlikely(!size))
+ return ZERO_SIZE_PTR;
+
+ s = kmalloc_slab(size, flags, caller);
+
+ ret = slab_alloc_node(s, NULL, flags, node, caller, size);
+ ret = kasan_kmalloc(s, ret, size, flags);
+ trace_kmalloc(caller, ret, size, s->size, flags, node);
+ return ret;
+}
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+ return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+ return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
+ int node, unsigned long caller)
+{
+ return __do_kmalloc_node(size, flags, node, caller);
+}
+EXPORT_SYMBOL(__kmalloc_node_track_caller);
+
+void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+{
+ void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
+ _RET_IP_, size);
+
+ trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
+
+ ret = kasan_kmalloc(s, ret, size, gfpflags);
+ return ret;
+}
+EXPORT_SYMBOL(kmalloc_trace);
+
+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+ int node, size_t size)
+{
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
+
+ trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
+
+ ret = kasan_kmalloc(s, ret, size, gfpflags);
+ return ret;
+}
+EXPORT_SYMBOL(kmalloc_node_trace);
+
static noinline void free_to_partial_list(
struct kmem_cache *s, struct slab *slab,
void *head, void *tail, int bulk_cnt,

--
2.42.1

2023-11-20 18:38:20

by Vlastimil Babka

Subject: [PATCH v2 09/21] mm/slab: remove mm/slab.c and slab_def.h

Remove the SLAB implementation. Update CREDITS.
Also update and properly sort the SLOB entry there.

RIP SLAB allocator (1996 - 2024)

Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
CREDITS | 12 +-
include/linux/slab_def.h | 124 --
mm/slab.c | 4005 ----------------------------------------------
3 files changed, 8 insertions(+), 4133 deletions(-)

diff --git a/CREDITS b/CREDITS
index f33a33fd2371..943a73e96149 100644
--- a/CREDITS
+++ b/CREDITS
@@ -9,10 +9,6 @@
Linus
----------

-N: Matt Mackal
-E: [email protected]
-D: SLOB slab allocator
-
N: Matti Aarnio
E: [email protected]
D: Alpha systems hacking, IPv6 and other network related stuff
@@ -1572,6 +1568,10 @@ S: Ampferstr. 50 / 4
S: 6020 Innsbruck
S: Austria

+N: Mark Hemment
+E: [email protected]
+D: SLAB allocator implementation
+
N: Richard Henderson
E: [email protected]
E: [email protected]
@@ -2437,6 +2437,10 @@ D: work on suspend-to-ram/disk, killing duplicates from ioctl32,
D: Altera SoCFPGA and Nokia N900 support.
S: Czech Republic

+N: Olivia Mackal
+E: [email protected]
+D: SLOB slab allocator
+
N: Paul Mackerras
E: [email protected]
D: PPP driver
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
deleted file mode 100644
index a61e7d55d0d3..000000000000
--- a/include/linux/slab_def.h
+++ /dev/null
@@ -1,124 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_SLAB_DEF_H
-#define _LINUX_SLAB_DEF_H
-
-#include <linux/kfence.h>
-#include <linux/reciprocal_div.h>
-
-/*
- * Definitions unique to the original Linux SLAB allocator.
- */
-
-struct kmem_cache {
- struct array_cache __percpu *cpu_cache;
-
-/* 1) Cache tunables. Protected by slab_mutex */
- unsigned int batchcount;
- unsigned int limit;
- unsigned int shared;
-
- unsigned int size;
- struct reciprocal_value reciprocal_buffer_size;
-/* 2) touched by every alloc & free from the backend */
-
- slab_flags_t flags; /* constant flags */
- unsigned int num; /* # of objs per slab */
-
-/* 3) cache_grow/shrink */
- /* order of pgs per slab (2^n) */
- unsigned int gfporder;
-
- /* force GFP flags, e.g. GFP_DMA */
- gfp_t allocflags;
-
- size_t colour; /* cache colouring range */
- unsigned int colour_off; /* colour offset */
- unsigned int freelist_size;
-
- /* constructor func */
- void (*ctor)(void *obj);
-
-/* 4) cache creation/removal */
- const char *name;
- struct list_head list;
- int refcount;
- int object_size;
- int align;
-
-/* 5) statistics */
-#ifdef CONFIG_DEBUG_SLAB
- unsigned long num_active;
- unsigned long num_allocations;
- unsigned long high_mark;
- unsigned long grown;
- unsigned long reaped;
- unsigned long errors;
- unsigned long max_freeable;
- unsigned long node_allocs;
- unsigned long node_frees;
- unsigned long node_overflow;
- atomic_t allochit;
- atomic_t allocmiss;
- atomic_t freehit;
- atomic_t freemiss;
-
- /*
- * If debugging is enabled, then the allocator can add additional
- * fields and/or padding to every object. 'size' contains the total
- * object size including these internal fields, while 'obj_offset'
- * and 'object_size' contain the offset to the user object and its
- * size.
- */
- int obj_offset;
-#endif /* CONFIG_DEBUG_SLAB */
-
-#ifdef CONFIG_KASAN_GENERIC
- struct kasan_cache kasan_info;
-#endif
-
-#ifdef CONFIG_SLAB_FREELIST_RANDOM
- unsigned int *random_seq;
-#endif
-
-#ifdef CONFIG_HARDENED_USERCOPY
- unsigned int useroffset; /* Usercopy region offset */
- unsigned int usersize; /* Usercopy region size */
-#endif
-
- struct kmem_cache_node *node[MAX_NUMNODES];
-};
-
-static inline void *nearest_obj(struct kmem_cache *cache, const struct slab *slab,
- void *x)
-{
- void *object = x - (x - slab->s_mem) % cache->size;
- void *last_object = slab->s_mem + (cache->num - 1) * cache->size;
-
- if (unlikely(object > last_object))
- return last_object;
- else
- return object;
-}
-
-/*
- * We want to avoid an expensive divide : (offset / cache->size)
- * Using the fact that size is a constant for a particular cache,
- * we can replace (offset / cache->size) by
- * reciprocal_divide(offset, cache->reciprocal_buffer_size)
- */
-static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
-{
- u32 offset = (obj - slab->s_mem);
- return reciprocal_divide(offset, cache->reciprocal_buffer_size);
-}
-
-static inline int objs_per_slab(const struct kmem_cache *cache,
- const struct slab *slab)
-{
- if (is_kfence_address(slab_address(slab)))
- return 1;
- return cache->num;
-}
-
-#endif /* _LINUX_SLAB_DEF_H */
diff --git a/mm/slab.c b/mm/slab.c
deleted file mode 100644
index 37efe3241f9c..000000000000
--- a/mm/slab.c
+++ /dev/null
@@ -1,4005 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/mm/slab.c
- * Written by Mark Hemment, 1996/97.
- * ([email protected])
- *
- * kmem_cache_destroy() + some cleanup - 1999 Andrea Arcangeli
- *
- * Major cleanup, different bufctl logic, per-cpu arrays
- * (c) 2000 Manfred Spraul
- *
- * Cleanup, make the head arrays unconditional, preparation for NUMA
- * (c) 2002 Manfred Spraul
- *
- * An implementation of the Slab Allocator as described in outline in;
- * UNIX Internals: The New Frontiers by Uresh Vahalia
- * Pub: Prentice Hall ISBN 0-13-101908-2
- * or with a little more detail in;
- * The Slab Allocator: An Object-Caching Kernel Memory Allocator
- * Jeff Bonwick (Sun Microsystems).
- * Presented at: USENIX Summer 1994 Technical Conference
- *
- * The memory is organized in caches, one cache for each object type.
- * (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct)
- * Each cache consists out of many slabs (they are small (usually one
- * page long) and always contiguous), and each slab contains multiple
- * initialized objects.
- *
- * This means, that your constructor is used only for newly allocated
- * slabs and you must pass objects with the same initializations to
- * kmem_cache_free.
- *
- * Each cache can only support one memory type (GFP_DMA, GFP_HIGHMEM,
- * normal). If you need a special memory type, then must create a new
- * cache for that memory type.
- *
- * In order to reduce fragmentation, the slabs are sorted in 3 groups:
- * full slabs with 0 free objects
- * partial slabs
- * empty slabs with no allocated objects
- *
- * If partial slabs exist, then new allocations come from these slabs,
- * otherwise from empty slabs or new slabs are allocated.
- *
- * kmem_cache_destroy() CAN CRASH if you try to allocate from the cache
- * during kmem_cache_destroy(). The caller must prevent concurrent allocs.
- *
- * Each cache has a short per-cpu head array, most allocs
- * and frees go into that array, and if that array overflows, then 1/2
- * of the entries in the array are given back into the global cache.
- * The head array is strictly LIFO and should improve the cache hit rates.
- * On SMP, it additionally reduces the spinlock operations.
- *
- * The c_cpuarray may not be read with enabled local interrupts -
- * it's changed with a smp_call_function().
- *
- * SMP synchronization:
- * constructors and destructors are called without any locking.
- * Several members in struct kmem_cache and struct slab never change, they
- * are accessed without any locking.
- * The per-cpu arrays are never accessed from the wrong cpu, no locking,
- * and local interrupts are disabled so slab code is preempt-safe.
- * The non-constant members are protected with a per-cache irq spinlock.
- *
- * Many thanks to Mark Hemment, who wrote another per-cpu slab patch
- * in 2000 - many ideas in the current implementation are derived from
- * his patch.
- *
- * Further notes from the original documentation:
- *
- * 11 April '97. Started multi-threading - markhe
- * The global cache-chain is protected by the mutex 'slab_mutex'.
- * The sem is only needed when accessing/extending the cache-chain, which
- * can never happen inside an interrupt (kmem_cache_create(),
- * kmem_cache_shrink() and kmem_cache_reap()).
- *
- * At present, each engine can be growing a cache. This should be blocked.
- *
- * 15 March 2005. NUMA slab allocator.
- * Shai Fultheim <[email protected]>.
- * Shobhit Dayal <[email protected]>
- * Alok N Kataria <[email protected]>
- * Christoph Lameter <[email protected]>
- *
- * Modified the slab allocator to be node aware on NUMA systems.
- * Each node has its own list of partial, free and full slabs.
- * All object allocations for a node occur from node specific slab lists.
- */
-
-#include <linux/slab.h>
-#include <linux/mm.h>
-#include <linux/poison.h>
-#include <linux/swap.h>
-#include <linux/cache.h>
-#include <linux/interrupt.h>
-#include <linux/init.h>
-#include <linux/compiler.h>
-#include <linux/cpuset.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/notifier.h>
-#include <linux/kallsyms.h>
-#include <linux/kfence.h>
-#include <linux/cpu.h>
-#include <linux/sysctl.h>
-#include <linux/module.h>
-#include <linux/rcupdate.h>
-#include <linux/string.h>
-#include <linux/uaccess.h>
-#include <linux/nodemask.h>
-#include <linux/kmemleak.h>
-#include <linux/mempolicy.h>
-#include <linux/mutex.h>
-#include <linux/fault-inject.h>
-#include <linux/rtmutex.h>
-#include <linux/reciprocal_div.h>
-#include <linux/debugobjects.h>
-#include <linux/memory.h>
-#include <linux/prefetch.h>
-#include <linux/sched/task_stack.h>
-
-#include <net/sock.h>
-
-#include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
-#include <asm/page.h>
-
-#include <trace/events/kmem.h>
-
-#include "internal.h"
-
-#include "slab.h"
-
-/*
- * DEBUG - 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & SLAB_POISON.
- * 0 for faster, smaller code (especially in the critical paths).
- *
- * STATS - 1 to collect stats for /proc/slabinfo.
- * 0 for faster, smaller code (especially in the critical paths).
- *
- * FORCED_DEBUG - 1 enables SLAB_RED_ZONE and SLAB_POISON (if possible)
- */
-
-#ifdef CONFIG_DEBUG_SLAB
-#define DEBUG 1
-#define STATS 1
-#define FORCED_DEBUG 1
-#else
-#define DEBUG 0
-#define STATS 0
-#define FORCED_DEBUG 0
-#endif
-
-/* Shouldn't this be in a header file somewhere? */
-#define BYTES_PER_WORD sizeof(void *)
-#define REDZONE_ALIGN max(BYTES_PER_WORD, __alignof__(unsigned long long))
-
-#ifndef ARCH_KMALLOC_FLAGS
-#define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
-#endif
-
-#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
- <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
-
-#if FREELIST_BYTE_INDEX
-typedef unsigned char freelist_idx_t;
-#else
-typedef unsigned short freelist_idx_t;
-#endif
-
-#define SLAB_OBJ_MAX_NUM ((1 << sizeof(freelist_idx_t) * BITS_PER_BYTE) - 1)
-
-/*
- * struct array_cache
- *
- * Purpose:
- * - LIFO ordering, to hand out cache-warm objects from _alloc
- * - reduce the number of linked list operations
- * - reduce spinlock operations
- *
- * The limit is stored in the per-cpu structure to reduce the data cache
- * footprint.
- *
- */
-struct array_cache {
- unsigned int avail;
- unsigned int limit;
- unsigned int batchcount;
- unsigned int touched;
- void *entry[]; /*
- * Must have this definition in here for the proper
- * alignment of array_cache. Also simplifies accessing
- * the entries.
- */
-};
-
-struct alien_cache {
- spinlock_t lock;
- struct array_cache ac;
-};
-
-/*
- * Need this for bootstrapping a per node allocator.
- */
-#define NUM_INIT_LISTS (2 * MAX_NUMNODES)
-static struct kmem_cache_node __initdata init_kmem_cache_node[NUM_INIT_LISTS];
-#define CACHE_CACHE 0
-#define SIZE_NODE (MAX_NUMNODES)
-
-static int drain_freelist(struct kmem_cache *cache,
- struct kmem_cache_node *n, int tofree);
-static void free_block(struct kmem_cache *cachep, void **objpp, int len,
- int node, struct list_head *list);
-static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list);
-static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp);
-static void cache_reap(struct work_struct *unused);
-
-static inline void fixup_objfreelist_debug(struct kmem_cache *cachep,
- void **list);
-static inline void fixup_slab_list(struct kmem_cache *cachep,
- struct kmem_cache_node *n, struct slab *slab,
- void **list);
-
-#define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node))
-
-static void kmem_cache_node_init(struct kmem_cache_node *parent)
-{
- INIT_LIST_HEAD(&parent->slabs_full);
- INIT_LIST_HEAD(&parent->slabs_partial);
- INIT_LIST_HEAD(&parent->slabs_free);
- parent->total_slabs = 0;
- parent->free_slabs = 0;
- parent->shared = NULL;
- parent->alien = NULL;
- parent->colour_next = 0;
- raw_spin_lock_init(&parent->list_lock);
- parent->free_objects = 0;
- parent->free_touched = 0;
-}
-
-#define MAKE_LIST(cachep, listp, slab, nodeid) \
- do { \
- INIT_LIST_HEAD(listp); \
- list_splice(&get_node(cachep, nodeid)->slab, listp); \
- } while (0)
-
-#define MAKE_ALL_LISTS(cachep, ptr, nodeid) \
- do { \
- MAKE_LIST((cachep), (&(ptr)->slabs_full), slabs_full, nodeid); \
- MAKE_LIST((cachep), (&(ptr)->slabs_partial), slabs_partial, nodeid); \
- MAKE_LIST((cachep), (&(ptr)->slabs_free), slabs_free, nodeid); \
- } while (0)
-
-#define CFLGS_OBJFREELIST_SLAB ((slab_flags_t __force)0x40000000U)
-#define CFLGS_OFF_SLAB ((slab_flags_t __force)0x80000000U)
-#define OBJFREELIST_SLAB(x) ((x)->flags & CFLGS_OBJFREELIST_SLAB)
-#define OFF_SLAB(x) ((x)->flags & CFLGS_OFF_SLAB)
-
-#define BATCHREFILL_LIMIT 16
-/*
- * Optimization question: fewer reaps means less probability for unnecessary
- * cpucache drain/refill cycles.
- *
- * OTOH the cpuarrays can contain lots of objects,
- * which could lock up otherwise freeable slabs.
- */
-#define REAPTIMEOUT_AC (2*HZ)
-#define REAPTIMEOUT_NODE (4*HZ)
-
-#if STATS
-#define STATS_INC_ACTIVE(x) ((x)->num_active++)
-#define STATS_DEC_ACTIVE(x) ((x)->num_active--)
-#define STATS_INC_ALLOCED(x) ((x)->num_allocations++)
-#define STATS_INC_GROWN(x) ((x)->grown++)
-#define STATS_ADD_REAPED(x, y) ((x)->reaped += (y))
-#define STATS_SET_HIGH(x) \
- do { \
- if ((x)->num_active > (x)->high_mark) \
- (x)->high_mark = (x)->num_active; \
- } while (0)
-#define STATS_INC_ERR(x) ((x)->errors++)
-#define STATS_INC_NODEALLOCS(x) ((x)->node_allocs++)
-#define STATS_INC_NODEFREES(x) ((x)->node_frees++)
-#define STATS_INC_ACOVERFLOW(x) ((x)->node_overflow++)
-#define STATS_SET_FREEABLE(x, i) \
- do { \
- if ((x)->max_freeable < i) \
- (x)->max_freeable = i; \
- } while (0)
-#define STATS_INC_ALLOCHIT(x) atomic_inc(&(x)->allochit)
-#define STATS_INC_ALLOCMISS(x) atomic_inc(&(x)->allocmiss)
-#define STATS_INC_FREEHIT(x) atomic_inc(&(x)->freehit)
-#define STATS_INC_FREEMISS(x) atomic_inc(&(x)->freemiss)
-#else
-#define STATS_INC_ACTIVE(x) do { } while (0)
-#define STATS_DEC_ACTIVE(x) do { } while (0)
-#define STATS_INC_ALLOCED(x) do { } while (0)
-#define STATS_INC_GROWN(x) do { } while (0)
-#define STATS_ADD_REAPED(x, y) do { (void)(y); } while (0)
-#define STATS_SET_HIGH(x) do { } while (0)
-#define STATS_INC_ERR(x) do { } while (0)
-#define STATS_INC_NODEALLOCS(x) do { } while (0)
-#define STATS_INC_NODEFREES(x) do { } while (0)
-#define STATS_INC_ACOVERFLOW(x) do { } while (0)
-#define STATS_SET_FREEABLE(x, i) do { } while (0)
-#define STATS_INC_ALLOCHIT(x) do { } while (0)
-#define STATS_INC_ALLOCMISS(x) do { } while (0)
-#define STATS_INC_FREEHIT(x) do { } while (0)
-#define STATS_INC_FREEMISS(x) do { } while (0)
-#endif
-
-#if DEBUG
-
-/*
- * memory layout of objects:
- * 0 : objp
- * 0 .. cachep->obj_offset - BYTES_PER_WORD - 1: padding. This ensures that
- * the end of an object is aligned with the end of the real
- * allocation. Catches writes behind the end of the allocation.
- * cachep->obj_offset - BYTES_PER_WORD .. cachep->obj_offset - 1:
- * redzone word.
- * cachep->obj_offset: The real object.
- * cachep->size - 2* BYTES_PER_WORD: redzone word [BYTES_PER_WORD long]
- * cachep->size - 1* BYTES_PER_WORD: last caller address
- * [BYTES_PER_WORD long]
- */
-static int obj_offset(struct kmem_cache *cachep)
-{
- return cachep->obj_offset;
-}
-
-static unsigned long long *dbg_redzone1(struct kmem_cache *cachep, void *objp)
-{
- BUG_ON(!(cachep->flags & SLAB_RED_ZONE));
- return (unsigned long long *) (objp + obj_offset(cachep) -
- sizeof(unsigned long long));
-}
-
-static unsigned long long *dbg_redzone2(struct kmem_cache *cachep, void *objp)
-{
- BUG_ON(!(cachep->flags & SLAB_RED_ZONE));
- if (cachep->flags & SLAB_STORE_USER)
- return (unsigned long long *)(objp + cachep->size -
- sizeof(unsigned long long) -
- REDZONE_ALIGN);
- return (unsigned long long *) (objp + cachep->size -
- sizeof(unsigned long long));
-}
-
-static void **dbg_userword(struct kmem_cache *cachep, void *objp)
-{
- BUG_ON(!(cachep->flags & SLAB_STORE_USER));
- return (void **)(objp + cachep->size - BYTES_PER_WORD);
-}
-
-#else
-
-#define obj_offset(x) 0
-#define dbg_redzone1(cachep, objp) ({BUG(); (unsigned long long *)NULL;})
-#define dbg_redzone2(cachep, objp) ({BUG(); (unsigned long long *)NULL;})
-#define dbg_userword(cachep, objp) ({BUG(); (void **)NULL;})
-
-#endif
-
-/*
- * Do not go above this order unless 0 objects fit into the slab or
- * overridden on the command line.
- */
-#define SLAB_MAX_ORDER_HI 1
-#define SLAB_MAX_ORDER_LO 0
-static int slab_max_order = SLAB_MAX_ORDER_LO;
-static bool slab_max_order_set __initdata;
-
-static inline void *index_to_obj(struct kmem_cache *cache,
- const struct slab *slab, unsigned int idx)
-{
- return slab->s_mem + cache->size * idx;
-}
-
-#define BOOT_CPUCACHE_ENTRIES 1
-/* internal cache of cache description objs */
-static struct kmem_cache kmem_cache_boot = {
- .batchcount = 1,
- .limit = BOOT_CPUCACHE_ENTRIES,
- .shared = 1,
- .size = sizeof(struct kmem_cache),
- .name = "kmem_cache",
-};
-
-static DEFINE_PER_CPU(struct delayed_work, slab_reap_work);
-
-static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
-{
- return this_cpu_ptr(cachep->cpu_cache);
-}
-
-/*
- * Calculate the number of objects and left-over bytes for a given buffer size.
- */
-static unsigned int cache_estimate(unsigned long gfporder, size_t buffer_size,
- slab_flags_t flags, size_t *left_over)
-{
- unsigned int num;
- size_t slab_size = PAGE_SIZE << gfporder;
-
- /*
- * The slab management structure can be either off the slab or
- * on it. For the latter case, the memory allocated for a
- * slab is used for:
- *
- * - @buffer_size bytes for each object
- * - One freelist_idx_t for each object
- *
- * We don't need to consider alignment of freelist because
- * freelist will be at the end of slab page. The objects will be
- * at the correct alignment.
- *
- * If the slab management structure is off the slab, then the
- * alignment will already be calculated into the size. Because
- * the slabs are all pages aligned, the objects will be at the
- * correct alignment when allocated.
- */
- if (flags & (CFLGS_OBJFREELIST_SLAB | CFLGS_OFF_SLAB)) {
- num = slab_size / buffer_size;
- *left_over = slab_size % buffer_size;
- } else {
- num = slab_size / (buffer_size + sizeof(freelist_idx_t));
- *left_over = slab_size %
- (buffer_size + sizeof(freelist_idx_t));
- }
-
- return num;
-}
-
-#if DEBUG
-#define slab_error(cachep, msg) __slab_error(__func__, cachep, msg)
-
-static void __slab_error(const char *function, struct kmem_cache *cachep,
- char *msg)
-{
- pr_err("slab error in %s(): cache `%s': %s\n",
- function, cachep->name, msg);
- dump_stack();
- add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
-}
-#endif
-
-/*
- * By default on NUMA we use alien caches to stage the freeing of
- * objects allocated from other nodes. This causes massive memory
- * inefficiencies when using fake NUMA setup to split memory into a
- * large number of small nodes, so it can be disabled on the command
- * line
- */
-
-static int use_alien_caches __read_mostly = 1;
-static int __init noaliencache_setup(char *s)
-{
- use_alien_caches = 0;
- return 1;
-}
-__setup("noaliencache", noaliencache_setup);
-
-static int __init slab_max_order_setup(char *str)
-{
- get_option(&str, &slab_max_order);
- slab_max_order = slab_max_order < 0 ? 0 :
- min(slab_max_order, MAX_ORDER);
- slab_max_order_set = true;
-
- return 1;
-}
-__setup("slab_max_order=", slab_max_order_setup);
-
-#ifdef CONFIG_NUMA
-/*
- * Special reaping functions for NUMA systems called from cache_reap().
- * These take care of doing round robin flushing of alien caches (containing
- * objects freed on different nodes from which they were allocated) and the
- * flushing of remote pcps by calling drain_node_pages.
- */
-static DEFINE_PER_CPU(unsigned long, slab_reap_node);
-
-static void init_reap_node(int cpu)
-{
- per_cpu(slab_reap_node, cpu) = next_node_in(cpu_to_mem(cpu),
- node_online_map);
-}
-
-static void next_reap_node(void)
-{
- int node = __this_cpu_read(slab_reap_node);
-
- node = next_node_in(node, node_online_map);
- __this_cpu_write(slab_reap_node, node);
-}
-
-#else
-#define init_reap_node(cpu) do { } while (0)
-#define next_reap_node(void) do { } while (0)
-#endif
-
-/*
- * Initiate the reap timer running on the target CPU. We run at around 1 to 2Hz
- * via the workqueue/eventd.
- * Add the CPU number into the expiration time to minimize the possibility of
- * the CPUs getting into lockstep and contending for the global cache chain
- * lock.
- */
-static void start_cpu_timer(int cpu)
-{
- struct delayed_work *reap_work = &per_cpu(slab_reap_work, cpu);
-
- if (reap_work->work.func == NULL) {
- init_reap_node(cpu);
- INIT_DEFERRABLE_WORK(reap_work, cache_reap);
- schedule_delayed_work_on(cpu, reap_work,
- __round_jiffies_relative(HZ, cpu));
- }
-}
-
-static void init_arraycache(struct array_cache *ac, int limit, int batch)
-{
- if (ac) {
- ac->avail = 0;
- ac->limit = limit;
- ac->batchcount = batch;
- ac->touched = 0;
- }
-}
-
-static struct array_cache *alloc_arraycache(int node, int entries,
- int batchcount, gfp_t gfp)
-{
- size_t memsize = sizeof(void *) * entries + sizeof(struct array_cache);
- struct array_cache *ac = NULL;
-
- ac = kmalloc_node(memsize, gfp, node);
- /*
- * The array_cache structures contain pointers to free object.
- * However, when such objects are allocated or transferred to another
- * cache the pointers are not cleared and they could be counted as
- * valid references during a kmemleak scan. Therefore, kmemleak must
- * not scan such objects.
- */
- kmemleak_no_scan(ac);
- init_arraycache(ac, entries, batchcount);
- return ac;
-}
-
-static noinline void cache_free_pfmemalloc(struct kmem_cache *cachep,
- struct slab *slab, void *objp)
-{
- struct kmem_cache_node *n;
- int slab_node;
- LIST_HEAD(list);
-
- slab_node = slab_nid(slab);
- n = get_node(cachep, slab_node);
-
- raw_spin_lock(&n->list_lock);
- free_block(cachep, &objp, 1, slab_node, &list);
- raw_spin_unlock(&n->list_lock);
-
- slabs_destroy(cachep, &list);
-}
-
-/*
- * Transfer objects in one arraycache to another.
- * Locking must be handled by the caller.
- *
- * Return the number of entries transferred.
- */
-static int transfer_objects(struct array_cache *to,
- struct array_cache *from, unsigned int max)
-{
- /* Figure out how many entries to transfer */
- int nr = min3(from->avail, max, to->limit - to->avail);
-
- if (!nr)
- return 0;
-
- memcpy(to->entry + to->avail, from->entry + from->avail - nr,
- sizeof(void *) *nr);
-
- from->avail -= nr;
- to->avail += nr;
- return nr;
-}
-
-/* &alien->lock must be held by alien callers. */
-static __always_inline void __free_one(struct array_cache *ac, void *objp)
-{
- /* Avoid trivial double-free. */
- if (IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
- WARN_ON_ONCE(ac->avail > 0 && ac->entry[ac->avail - 1] == objp))
- return;
- ac->entry[ac->avail++] = objp;
-}
-
-#ifndef CONFIG_NUMA
-
-#define drain_alien_cache(cachep, alien) do { } while (0)
-#define reap_alien(cachep, n) do { } while (0)
-
-static inline struct alien_cache **alloc_alien_cache(int node,
- int limit, gfp_t gfp)
-{
- return NULL;
-}
-
-static inline void free_alien_cache(struct alien_cache **ac_ptr)
-{
-}
-
-static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
-{
- return 0;
-}
-
-static inline gfp_t gfp_exact_node(gfp_t flags)
-{
- return flags & ~__GFP_NOFAIL;
-}
-
-#else /* CONFIG_NUMA */
-
-static struct alien_cache *__alloc_alien_cache(int node, int entries,
- int batch, gfp_t gfp)
-{
- size_t memsize = sizeof(void *) * entries + sizeof(struct alien_cache);
- struct alien_cache *alc = NULL;
-
- alc = kmalloc_node(memsize, gfp, node);
- if (alc) {
- kmemleak_no_scan(alc);
- init_arraycache(&alc->ac, entries, batch);
- spin_lock_init(&alc->lock);
- }
- return alc;
-}
-
-static struct alien_cache **alloc_alien_cache(int node, int limit, gfp_t gfp)
-{
- struct alien_cache **alc_ptr;
- int i;
-
- if (limit > 1)
- limit = 12;
- alc_ptr = kcalloc_node(nr_node_ids, sizeof(void *), gfp, node);
- if (!alc_ptr)
- return NULL;
-
- for_each_node(i) {
- if (i == node || !node_online(i))
- continue;
- alc_ptr[i] = __alloc_alien_cache(node, limit, 0xbaadf00d, gfp);
- if (!alc_ptr[i]) {
- for (i--; i >= 0; i--)
- kfree(alc_ptr[i]);
- kfree(alc_ptr);
- return NULL;
- }
- }
- return alc_ptr;
-}
-
-static void free_alien_cache(struct alien_cache **alc_ptr)
-{
- int i;
-
- if (!alc_ptr)
- return;
- for_each_node(i)
- kfree(alc_ptr[i]);
- kfree(alc_ptr);
-}
-
-static void __drain_alien_cache(struct kmem_cache *cachep,
- struct array_cache *ac, int node,
- struct list_head *list)
-{
- struct kmem_cache_node *n = get_node(cachep, node);
-
- if (ac->avail) {
- raw_spin_lock(&n->list_lock);
- /*
- * Stuff objects into the remote nodes shared array first.
- * That way we could avoid the overhead of putting the objects
- * into the free lists and getting them back later.
- */
- if (n->shared)
- transfer_objects(n->shared, ac, ac->limit);
-
- free_block(cachep, ac->entry, ac->avail, node, list);
- ac->avail = 0;
- raw_spin_unlock(&n->list_lock);
- }
-}
-
-/*
- * Called from cache_reap() to regularly drain alien caches round robin.
- */
-static void reap_alien(struct kmem_cache *cachep, struct kmem_cache_node *n)
-{
- int node = __this_cpu_read(slab_reap_node);
-
- if (n->alien) {
- struct alien_cache *alc = n->alien[node];
- struct array_cache *ac;
-
- if (alc) {
- ac = &alc->ac;
- if (ac->avail && spin_trylock_irq(&alc->lock)) {
- LIST_HEAD(list);
-
- __drain_alien_cache(cachep, ac, node, &list);
- spin_unlock_irq(&alc->lock);
- slabs_destroy(cachep, &list);
- }
- }
- }
-}
-
-static void drain_alien_cache(struct kmem_cache *cachep,
- struct alien_cache **alien)
-{
- int i = 0;
- struct alien_cache *alc;
- struct array_cache *ac;
- unsigned long flags;
-
- for_each_online_node(i) {
- alc = alien[i];
- if (alc) {
- LIST_HEAD(list);
-
- ac = &alc->ac;
- spin_lock_irqsave(&alc->lock, flags);
- __drain_alien_cache(cachep, ac, i, &list);
- spin_unlock_irqrestore(&alc->lock, flags);
- slabs_destroy(cachep, &list);
- }
- }
-}
-
-static int __cache_free_alien(struct kmem_cache *cachep, void *objp,
- int node, int slab_node)
-{
- struct kmem_cache_node *n;
- struct alien_cache *alien = NULL;
- struct array_cache *ac;
- LIST_HEAD(list);
-
- n = get_node(cachep, node);
- STATS_INC_NODEFREES(cachep);
- if (n->alien && n->alien[slab_node]) {
- alien = n->alien[slab_node];
- ac = &alien->ac;
- spin_lock(&alien->lock);
- if (unlikely(ac->avail == ac->limit)) {
- STATS_INC_ACOVERFLOW(cachep);
- __drain_alien_cache(cachep, ac, slab_node, &list);
- }
- __free_one(ac, objp);
- spin_unlock(&alien->lock);
- slabs_destroy(cachep, &list);
- } else {
- n = get_node(cachep, slab_node);
- raw_spin_lock(&n->list_lock);
- free_block(cachep, &objp, 1, slab_node, &list);
- raw_spin_unlock(&n->list_lock);
- slabs_destroy(cachep, &list);
- }
- return 1;
-}
-
-static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
-{
- int slab_node = slab_nid(virt_to_slab(objp));
- int node = numa_mem_id();
- /*
- * Make sure we are not freeing an object from another node to the array
- * cache on this cpu.
- */
- if (likely(node == slab_node))
- return 0;
-
- return __cache_free_alien(cachep, objp, node, slab_node);
-}
-
-/*
- * Construct gfp mask to allocate from a specific node but do not reclaim or
- * warn about failures.
- */
-static inline gfp_t gfp_exact_node(gfp_t flags)
-{
- return (flags | __GFP_THISNODE | __GFP_NOWARN) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
-}
-#endif
-
-static int init_cache_node(struct kmem_cache *cachep, int node, gfp_t gfp)
-{
- struct kmem_cache_node *n;
-
- /*
- * Set up the kmem_cache_node for cpu before we can
- * begin anything. Make sure some other cpu on this
- * node has not already allocated this
- */
- n = get_node(cachep, node);
- if (n) {
- raw_spin_lock_irq(&n->list_lock);
- n->free_limit = (1 + nr_cpus_node(node)) * cachep->batchcount +
- cachep->num;
- raw_spin_unlock_irq(&n->list_lock);
-
- return 0;
- }
-
- n = kmalloc_node(sizeof(struct kmem_cache_node), gfp, node);
- if (!n)
- return -ENOMEM;
-
- kmem_cache_node_init(n);
- n->next_reap = jiffies + REAPTIMEOUT_NODE +
- ((unsigned long)cachep) % REAPTIMEOUT_NODE;
-
- n->free_limit =
- (1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num;
-
- /*
- * The kmem_cache_nodes don't come and go as CPUs
- * come and go. slab_mutex provides sufficient
- * protection here.
- */
- cachep->node[node] = n;
-
- return 0;
-}
-
-#if defined(CONFIG_NUMA) || defined(CONFIG_SMP)
-/*
- * Allocates and initializes node for a node on each slab cache, used for
- * either memory or cpu hotplug. If memory is being hot-added, the kmem_cache_node
- * will be allocated off-node since memory is not yet online for the new node.
- * When hotplugging memory or a cpu, existing nodes are not replaced if
- * already in use.
- *
- * Must hold slab_mutex.
- */
-static int init_cache_node_node(int node)
-{
- int ret;
- struct kmem_cache *cachep;
-
- list_for_each_entry(cachep, &slab_caches, list) {
- ret = init_cache_node(cachep, node, GFP_KERNEL);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-#endif
-
-static int setup_kmem_cache_node(struct kmem_cache *cachep,
- int node, gfp_t gfp, bool force_change)
-{
- int ret = -ENOMEM;
- struct kmem_cache_node *n;
- struct array_cache *old_shared = NULL;
- struct array_cache *new_shared = NULL;
- struct alien_cache **new_alien = NULL;
- LIST_HEAD(list);
-
- if (use_alien_caches) {
- new_alien = alloc_alien_cache(node, cachep->limit, gfp);
- if (!new_alien)
- goto fail;
- }
-
- if (cachep->shared) {
- new_shared = alloc_arraycache(node,
- cachep->shared * cachep->batchcount, 0xbaadf00d, gfp);
- if (!new_shared)
- goto fail;
- }
-
- ret = init_cache_node(cachep, node, gfp);
- if (ret)
- goto fail;
-
- n = get_node(cachep, node);
- raw_spin_lock_irq(&n->list_lock);
- if (n->shared && force_change) {
- free_block(cachep, n->shared->entry,
- n->shared->avail, node, &list);
- n->shared->avail = 0;
- }
-
- if (!n->shared || force_change) {
- old_shared = n->shared;
- n->shared = new_shared;
- new_shared = NULL;
- }
-
- if (!n->alien) {
- n->alien = new_alien;
- new_alien = NULL;
- }
-
- raw_spin_unlock_irq(&n->list_lock);
- slabs_destroy(cachep, &list);
-
- /*
- * To protect lockless access to n->shared during irq disabled context.
- * If n->shared isn't NULL in irq disabled context, accessing to it is
- * guaranteed to be valid until irq is re-enabled, because it will be
- * freed after synchronize_rcu().
- */
- if (old_shared && force_change)
- synchronize_rcu();
-
-fail:
- kfree(old_shared);
- kfree(new_shared);
- free_alien_cache(new_alien);
-
- return ret;
-}
-
-#ifdef CONFIG_SMP
-
-static void cpuup_canceled(long cpu)
-{
- struct kmem_cache *cachep;
- struct kmem_cache_node *n = NULL;
- int node = cpu_to_mem(cpu);
- const struct cpumask *mask = cpumask_of_node(node);
-
- list_for_each_entry(cachep, &slab_caches, list) {
- struct array_cache *nc;
- struct array_cache *shared;
- struct alien_cache **alien;
- LIST_HEAD(list);
-
- n = get_node(cachep, node);
- if (!n)
- continue;
-
- raw_spin_lock_irq(&n->list_lock);
-
- /* Free limit for this kmem_cache_node */
- n->free_limit -= cachep->batchcount;
-
- /* cpu is dead; no one can alloc from it. */
- nc = per_cpu_ptr(cachep->cpu_cache, cpu);
- free_block(cachep, nc->entry, nc->avail, node, &list);
- nc->avail = 0;
-
- if (!cpumask_empty(mask)) {
- raw_spin_unlock_irq(&n->list_lock);
- goto free_slab;
- }
-
- shared = n->shared;
- if (shared) {
- free_block(cachep, shared->entry,
- shared->avail, node, &list);
- n->shared = NULL;
- }
-
- alien = n->alien;
- n->alien = NULL;
-
- raw_spin_unlock_irq(&n->list_lock);
-
- kfree(shared);
- if (alien) {
- drain_alien_cache(cachep, alien);
- free_alien_cache(alien);
- }
-
-free_slab:
- slabs_destroy(cachep, &list);
- }
- /*
- * In the previous loop, all the objects were freed to
- * the respective cache's slabs, now we can go ahead and
- * shrink each nodelist to its limit.
- */
- list_for_each_entry(cachep, &slab_caches, list) {
- n = get_node(cachep, node);
- if (!n)
- continue;
- drain_freelist(cachep, n, INT_MAX);
- }
-}
-
-static int cpuup_prepare(long cpu)
-{
- struct kmem_cache *cachep;
- int node = cpu_to_mem(cpu);
- int err;
-
- /*
- * We need to do this right in the beginning since
- * alloc_arraycache's are going to use this list.
- * kmalloc_node allows us to add the slab to the right
- * kmem_cache_node and not this cpu's kmem_cache_node
- */
- err = init_cache_node_node(node);
- if (err < 0)
- goto bad;
-
- /*
- * Now we can go ahead with allocating the shared arrays and
- * array caches
- */
- list_for_each_entry(cachep, &slab_caches, list) {
- err = setup_kmem_cache_node(cachep, node, GFP_KERNEL, false);
- if (err)
- goto bad;
- }
-
- return 0;
-bad:
- cpuup_canceled(cpu);
- return -ENOMEM;
-}
-
-int slab_prepare_cpu(unsigned int cpu)
-{
- int err;
-
- mutex_lock(&slab_mutex);
- err = cpuup_prepare(cpu);
- mutex_unlock(&slab_mutex);
- return err;
-}
-
-/*
- * This is called for a failed online attempt and for a successful
- * offline.
- *
- * Even if all the cpus of a node are down, we don't free the
- * kmem_cache_node of any cache. This is to avoid a race between cpu_down, and
- * a kmalloc allocation from another cpu for memory from the node of
- * the cpu going down. The kmem_cache_node structure is usually allocated from
- * kmem_cache_create() and gets destroyed at kmem_cache_destroy().
- */
-int slab_dead_cpu(unsigned int cpu)
-{
- mutex_lock(&slab_mutex);
- cpuup_canceled(cpu);
- mutex_unlock(&slab_mutex);
- return 0;
-}
-#endif
-
-static int slab_online_cpu(unsigned int cpu)
-{
- start_cpu_timer(cpu);
- return 0;
-}
-
-static int slab_offline_cpu(unsigned int cpu)
-{
- /*
- * Shutdown cache reaper. Note that the slab_mutex is held so
- * that if cache_reap() is invoked it cannot do anything
- * expensive but will only modify reap_work and reschedule the
- * timer.
- */
- cancel_delayed_work_sync(&per_cpu(slab_reap_work, cpu));
- /* Now the cache_reaper is guaranteed to be not running. */
- per_cpu(slab_reap_work, cpu).work.func = NULL;
- return 0;
-}
-
-#if defined(CONFIG_NUMA)
-/*
- * Drains freelist for a node on each slab cache, used for memory hot-remove.
- * Returns -EBUSY if all objects cannot be drained so that the node is not
- * removed.
- *
- * Must hold slab_mutex.
- */
-static int __meminit drain_cache_node_node(int node)
-{
- struct kmem_cache *cachep;
- int ret = 0;
-
- list_for_each_entry(cachep, &slab_caches, list) {
- struct kmem_cache_node *n;
-
- n = get_node(cachep, node);
- if (!n)
- continue;
-
- drain_freelist(cachep, n, INT_MAX);
-
- if (!list_empty(&n->slabs_full) ||
- !list_empty(&n->slabs_partial)) {
- ret = -EBUSY;
- break;
- }
- }
- return ret;
-}
-
-static int __meminit slab_memory_callback(struct notifier_block *self,
- unsigned long action, void *arg)
-{
- struct memory_notify *mnb = arg;
- int ret = 0;
- int nid;
-
- nid = mnb->status_change_nid;
- if (nid < 0)
- goto out;
-
- switch (action) {
- case MEM_GOING_ONLINE:
- mutex_lock(&slab_mutex);
- ret = init_cache_node_node(nid);
- mutex_unlock(&slab_mutex);
- break;
- case MEM_GOING_OFFLINE:
- mutex_lock(&slab_mutex);
- ret = drain_cache_node_node(nid);
- mutex_unlock(&slab_mutex);
- break;
- case MEM_ONLINE:
- case MEM_OFFLINE:
- case MEM_CANCEL_ONLINE:
- case MEM_CANCEL_OFFLINE:
- break;
- }
-out:
- return notifier_from_errno(ret);
-}
-#endif /* CONFIG_NUMA */
-
-/*
- * swap the static kmem_cache_node with kmalloced memory
- */
-static void __init init_list(struct kmem_cache *cachep, struct kmem_cache_node *list,
- int nodeid)
-{
- struct kmem_cache_node *ptr;
-
- ptr = kmalloc_node(sizeof(struct kmem_cache_node), GFP_NOWAIT, nodeid);
- BUG_ON(!ptr);
-
- memcpy(ptr, list, sizeof(struct kmem_cache_node));
- /*
- * Do not assume that spinlocks can be initialized via memcpy:
- */
- raw_spin_lock_init(&ptr->list_lock);
-
- MAKE_ALL_LISTS(cachep, ptr, nodeid);
- cachep->node[nodeid] = ptr;
-}
-
-/*
- * For setting up all the kmem_cache_node for cache whose buffer_size is same as
- * size of kmem_cache_node.
- */
-static void __init set_up_node(struct kmem_cache *cachep, int index)
-{
- int node;
-
- for_each_online_node(node) {
- cachep->node[node] = &init_kmem_cache_node[index + node];
- cachep->node[node]->next_reap = jiffies +
- REAPTIMEOUT_NODE +
- ((unsigned long)cachep) % REAPTIMEOUT_NODE;
- }
-}
-
-/*
- * Initialisation. Called after the page allocator have been initialised and
- * before smp_init().
- */
-void __init kmem_cache_init(void)
-{
- int i;
-
- kmem_cache = &kmem_cache_boot;
-
- if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1)
- use_alien_caches = 0;
-
- for (i = 0; i < NUM_INIT_LISTS; i++)
- kmem_cache_node_init(&init_kmem_cache_node[i]);
-
- /*
- * Fragmentation resistance on low memory - only use bigger
- * page orders on machines with more than 32MB of memory if
- * not overridden on the command line.
- */
- if (!slab_max_order_set && totalram_pages() > (32 << 20) >> PAGE_SHIFT)
- slab_max_order = SLAB_MAX_ORDER_HI;
-
- /* Bootstrap is tricky, because several objects are allocated
- * from caches that do not exist yet:
- * 1) initialize the kmem_cache cache: it contains the struct
- * kmem_cache structures of all caches, except kmem_cache itself:
- * kmem_cache is statically allocated.
- * Initially an __init data area is used for the head array and the
- * kmem_cache_node structures, it's replaced with a kmalloc allocated
- * array at the end of the bootstrap.
- * 2) Create the first kmalloc cache.
- * The struct kmem_cache for the new cache is allocated normally.
- * An __init data area is used for the head array.
- * 3) Create the remaining kmalloc caches, with minimally sized
- * head arrays.
- * 4) Replace the __init data head arrays for kmem_cache and the first
- * kmalloc cache with kmalloc allocated arrays.
- * 5) Replace the __init data for kmem_cache_node for kmem_cache and
- * the other cache's with kmalloc allocated memory.
- * 6) Resize the head arrays of the kmalloc caches to their final sizes.
- */
-
- /* 1) create the kmem_cache */
-
- /*
- * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
- */
- create_boot_cache(kmem_cache, "kmem_cache",
- offsetof(struct kmem_cache, node) +
- nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN, 0, 0);
- list_add(&kmem_cache->list, &slab_caches);
- slab_state = PARTIAL;
-
- /*
- * Initialize the caches that provide memory for the kmem_cache_node
- * structures first. Without this, further allocations will bug.
- */
- new_kmalloc_cache(INDEX_NODE, KMALLOC_NORMAL, ARCH_KMALLOC_FLAGS);
- slab_state = PARTIAL_NODE;
- setup_kmalloc_cache_index_table();
-
- /* 5) Replace the bootstrap kmem_cache_node */
- {
- int nid;
-
- for_each_online_node(nid) {
- init_list(kmem_cache, &init_kmem_cache_node[CACHE_CACHE + nid], nid);
-
- init_list(kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE],
- &init_kmem_cache_node[SIZE_NODE + nid], nid);
- }
- }
-
- create_kmalloc_caches(ARCH_KMALLOC_FLAGS);
-}
-
-void __init kmem_cache_init_late(void)
-{
- struct kmem_cache *cachep;
-
- /* 6) resize the head arrays to their final sizes */
- mutex_lock(&slab_mutex);
- list_for_each_entry(cachep, &slab_caches, list)
- if (enable_cpucache(cachep, GFP_NOWAIT))
- BUG();
- mutex_unlock(&slab_mutex);
-
- /* Done! */
- slab_state = FULL;
-
-#ifdef CONFIG_NUMA
- /*
- * Register a memory hotplug callback that initializes and frees
- * node.
- */
- hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
-#endif
-
- /*
- * The reap timers are started later, with a module init call: That part
- * of the kernel is not yet operational.
- */
-}
-
-static int __init cpucache_init(void)
-{
- int ret;
-
- /*
- * Register the timers that return unneeded pages to the page allocator
- */
- ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "SLAB online",
- slab_online_cpu, slab_offline_cpu);
- WARN_ON(ret < 0);
-
- return 0;
-}
-__initcall(cpucache_init);
-
-static noinline void
-slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
-{
-#if DEBUG
- struct kmem_cache_node *n;
- unsigned long flags;
- int node;
- static DEFINE_RATELIMIT_STATE(slab_oom_rs, DEFAULT_RATELIMIT_INTERVAL,
- DEFAULT_RATELIMIT_BURST);
-
- if ((gfpflags & __GFP_NOWARN) || !__ratelimit(&slab_oom_rs))
- return;
-
- pr_warn("SLAB: Unable to allocate memory on node %d, gfp=%#x(%pGg)\n",
- nodeid, gfpflags, &gfpflags);
- pr_warn(" cache: %s, object size: %d, order: %d\n",
- cachep->name, cachep->size, cachep->gfporder);
-
- for_each_kmem_cache_node(cachep, node, n) {
- unsigned long total_slabs, free_slabs, free_objs;
-
- raw_spin_lock_irqsave(&n->list_lock, flags);
- total_slabs = n->total_slabs;
- free_slabs = n->free_slabs;
- free_objs = n->free_objects;
- raw_spin_unlock_irqrestore(&n->list_lock, flags);
-
- pr_warn(" node %d: slabs: %ld/%ld, objs: %ld/%ld\n",
- node, total_slabs - free_slabs, total_slabs,
- (total_slabs * cachep->num) - free_objs,
- total_slabs * cachep->num);
- }
-#endif
-}
-
-/*
- * Interface to system's page allocator. No need to hold the
- * kmem_cache_node ->list_lock.
- *
- * If we requested dmaable memory, we will get it. Even if we
- * did not request dmaable memory, we might get it, but that
- * would be relatively rare and ignorable.
- */
-static struct slab *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
- int nodeid)
-{
- struct folio *folio;
- struct slab *slab;
-
- flags |= cachep->allocflags;
-
- folio = (struct folio *) __alloc_pages_node(nodeid, flags, cachep->gfporder);
- if (!folio) {
- slab_out_of_memory(cachep, flags, nodeid);
- return NULL;
- }
-
- slab = folio_slab(folio);
-
- account_slab(slab, cachep->gfporder, cachep, flags);
- __folio_set_slab(folio);
- /* Make the flag visible before any changes to folio->mapping */
- smp_wmb();
- /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
- if (sk_memalloc_socks() && folio_is_pfmemalloc(folio))
- slab_set_pfmemalloc(slab);
-
- return slab;
-}
-
-/*
- * Interface to system's page release.
- */
-static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
-{
- int order = cachep->gfporder;
- struct folio *folio = slab_folio(slab);
-
- BUG_ON(!folio_test_slab(folio));
- __slab_clear_pfmemalloc(slab);
- page_mapcount_reset(&folio->page);
- folio->mapping = NULL;
- /* Make the mapping reset visible before clearing the flag */
- smp_wmb();
- __folio_clear_slab(folio);
-
- mm_account_reclaimed_pages(1 << order);
- unaccount_slab(slab, order, cachep);
- __free_pages(&folio->page, order);
-}
-
-static void kmem_rcu_free(struct rcu_head *head)
-{
- struct kmem_cache *cachep;
- struct slab *slab;
-
- slab = container_of(head, struct slab, rcu_head);
- cachep = slab->slab_cache;
-
- kmem_freepages(cachep, slab);
-}
-
-#if DEBUG
-static inline bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
-{
- return debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
- ((cachep->size % PAGE_SIZE) == 0);
-}
-
-#ifdef CONFIG_DEBUG_PAGEALLOC
-static void slab_kernel_map(struct kmem_cache *cachep, void *objp, int map)
-{
- if (!is_debug_pagealloc_cache(cachep))
- return;
-
- __kernel_map_pages(virt_to_page(objp), cachep->size / PAGE_SIZE, map);
-}
-
-#else
-static inline void slab_kernel_map(struct kmem_cache *cachep, void *objp,
- int map) {}
-
-#endif
-
-static void poison_obj(struct kmem_cache *cachep, void *addr, unsigned char val)
-{
- int size = cachep->object_size;
- addr = &((char *)addr)[obj_offset(cachep)];
-
- memset(addr, val, size);
- *(unsigned char *)(addr + size - 1) = POISON_END;
-}
-
-static void dump_line(char *data, int offset, int limit)
-{
- int i;
- unsigned char error = 0;
- int bad_count = 0;
-
- pr_err("%03x: ", offset);
- for (i = 0; i < limit; i++) {
- if (data[offset + i] != POISON_FREE) {
- error = data[offset + i];
- bad_count++;
- }
- }
- print_hex_dump(KERN_CONT, "", 0, 16, 1,
- &data[offset], limit, 1);
-
- if (bad_count == 1) {
- error ^= POISON_FREE;
- if (!(error & (error - 1))) {
- pr_err("Single bit error detected. Probably bad RAM.\n");
-#ifdef CONFIG_X86
- pr_err("Run memtest86+ or a similar memory test tool.\n");
-#else
- pr_err("Run a memory test tool.\n");
-#endif
- }
- }
-}
-#endif
-
-#if DEBUG
-
-static void print_objinfo(struct kmem_cache *cachep, void *objp, int lines)
-{
- int i, size;
- char *realobj;
-
- if (cachep->flags & SLAB_RED_ZONE) {
- pr_err("Redzone: 0x%llx/0x%llx\n",
- *dbg_redzone1(cachep, objp),
- *dbg_redzone2(cachep, objp));
- }
-
- if (cachep->flags & SLAB_STORE_USER)
- pr_err("Last user: (%pSR)\n", *dbg_userword(cachep, objp));
- realobj = (char *)objp + obj_offset(cachep);
- size = cachep->object_size;
- for (i = 0; i < size && lines; i += 16, lines--) {
- int limit;
- limit = 16;
- if (i + limit > size)
- limit = size - i;
- dump_line(realobj, i, limit);
- }
-}
-
-static void check_poison_obj(struct kmem_cache *cachep, void *objp)
-{
- char *realobj;
- int size, i;
- int lines = 0;
-
- if (is_debug_pagealloc_cache(cachep))
- return;
-
- realobj = (char *)objp + obj_offset(cachep);
- size = cachep->object_size;
-
- for (i = 0; i < size; i++) {
- char exp = POISON_FREE;
- if (i == size - 1)
- exp = POISON_END;
- if (realobj[i] != exp) {
- int limit;
- /* Mismatch ! */
- /* Print header */
- if (lines == 0) {
- pr_err("Slab corruption (%s): %s start=%px, len=%d\n",
- print_tainted(), cachep->name,
- realobj, size);
- print_objinfo(cachep, objp, 0);
- }
- /* Hexdump the affected line */
- i = (i / 16) * 16;
- limit = 16;
- if (i + limit > size)
- limit = size - i;
- dump_line(realobj, i, limit);
- i += 16;
- lines++;
- /* Limit to 5 lines */
- if (lines > 5)
- break;
- }
- }
- if (lines != 0) {
- /* Print some data about the neighboring objects, if they
- * exist:
- */
- struct slab *slab = virt_to_slab(objp);
- unsigned int objnr;
-
- objnr = obj_to_index(cachep, slab, objp);
- if (objnr) {
- objp = index_to_obj(cachep, slab, objnr - 1);
- realobj = (char *)objp + obj_offset(cachep);
- pr_err("Prev obj: start=%px, len=%d\n", realobj, size);
- print_objinfo(cachep, objp, 2);
- }
- if (objnr + 1 < cachep->num) {
- objp = index_to_obj(cachep, slab, objnr + 1);
- realobj = (char *)objp + obj_offset(cachep);
- pr_err("Next obj: start=%px, len=%d\n", realobj, size);
- print_objinfo(cachep, objp, 2);
- }
- }
-}
-#endif
-
-#if DEBUG
-static void slab_destroy_debugcheck(struct kmem_cache *cachep,
- struct slab *slab)
-{
- int i;
-
- if (OBJFREELIST_SLAB(cachep) && cachep->flags & SLAB_POISON) {
- poison_obj(cachep, slab->freelist - obj_offset(cachep),
- POISON_FREE);
- }
-
- for (i = 0; i < cachep->num; i++) {
- void *objp = index_to_obj(cachep, slab, i);
-
- if (cachep->flags & SLAB_POISON) {
- check_poison_obj(cachep, objp);
- slab_kernel_map(cachep, objp, 1);
- }
- if (cachep->flags & SLAB_RED_ZONE) {
- if (*dbg_redzone1(cachep, objp) != RED_INACTIVE)
- slab_error(cachep, "start of a freed object was overwritten");
- if (*dbg_redzone2(cachep, objp) != RED_INACTIVE)
- slab_error(cachep, "end of a freed object was overwritten");
- }
- }
-}
-#else
-static void slab_destroy_debugcheck(struct kmem_cache *cachep,
- struct slab *slab)
-{
-}
-#endif
-
-/**
- * slab_destroy - destroy and release all objects in a slab
- * @cachep: cache pointer being destroyed
- * @slab: slab being destroyed
- *
- * Destroy all the objs in a slab, and release the mem back to the system.
- * Before calling the slab must have been unlinked from the cache. The
- * kmem_cache_node ->list_lock is not held/needed.
- */
-static void slab_destroy(struct kmem_cache *cachep, struct slab *slab)
-{
- void *freelist;
-
- freelist = slab->freelist;
- slab_destroy_debugcheck(cachep, slab);
- if (unlikely(cachep->flags & SLAB_TYPESAFE_BY_RCU))
- call_rcu(&slab->rcu_head, kmem_rcu_free);
- else
- kmem_freepages(cachep, slab);
-
- /*
- * From now on, we don't use freelist
- * although actual page can be freed in rcu context
- */
- if (OFF_SLAB(cachep))
- kfree(freelist);
-}
-
-/*
- * Update the size of the caches before calling slabs_destroy as it may
- * recursively call kfree.
- */
-static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
-{
- struct slab *slab, *n;
-
- list_for_each_entry_safe(slab, n, list, slab_list) {
- list_del(&slab->slab_list);
- slab_destroy(cachep, slab);
- }
-}
-
-/**
- * calculate_slab_order - calculate size (page order) of slabs
- * @cachep: pointer to the cache that is being created
- * @size: size of objects to be created in this cache.
- * @flags: slab allocation flags
- *
- * Also calculates the number of objects per slab.
- *
- * This could be made much more intelligent. For now, try to avoid using
- * high order pages for slabs. When the gfp() functions are more friendly
- * towards high-order requests, this should be changed.
- *
- * Return: number of left-over bytes in a slab
- */
-static size_t calculate_slab_order(struct kmem_cache *cachep,
- size_t size, slab_flags_t flags)
-{
- size_t left_over = 0;
- int gfporder;
-
- for (gfporder = 0; gfporder <= KMALLOC_MAX_ORDER; gfporder++) {
- unsigned int num;
- size_t remainder;
-
- num = cache_estimate(gfporder, size, flags, &remainder);
- if (!num)
- continue;
-
- /* Can't handle number of objects more than SLAB_OBJ_MAX_NUM */
- if (num > SLAB_OBJ_MAX_NUM)
- break;
-
- if (flags & CFLGS_OFF_SLAB) {
- struct kmem_cache *freelist_cache;
- size_t freelist_size;
- size_t freelist_cache_size;
-
- freelist_size = num * sizeof(freelist_idx_t);
- if (freelist_size > KMALLOC_MAX_CACHE_SIZE) {
- freelist_cache_size = PAGE_SIZE << get_order(freelist_size);
- } else {
- freelist_cache = kmalloc_slab(freelist_size, 0u, _RET_IP_);
- if (!freelist_cache)
- continue;
- freelist_cache_size = freelist_cache->size;
-
- /*
- * Needed to avoid possible looping condition
- * in cache_grow_begin()
- */
- if (OFF_SLAB(freelist_cache))
- continue;
- }
-
- /* check if off slab has enough benefit */
- if (freelist_cache_size > cachep->size / 2)
- continue;
- }
-
- /* Found something acceptable - save it away */
- cachep->num = num;
- cachep->gfporder = gfporder;
- left_over = remainder;
-
- /*
- * A VFS-reclaimable slab tends to have most allocations
- * as GFP_NOFS and we really don't want to have to be allocating
- * higher-order pages when we are unable to shrink dcache.
- */
- if (flags & SLAB_RECLAIM_ACCOUNT)
- break;
-
- /*
- * Large number of objects is good, but very large slabs are
- * currently bad for the gfp()s.
- */
- if (gfporder >= slab_max_order)
- break;
-
- /*
- * Acceptable internal fragmentation?
- */
- if (left_over * 8 <= (PAGE_SIZE << gfporder))
- break;
- }
- return left_over;
-}
-
-static struct array_cache __percpu *alloc_kmem_cache_cpus(
- struct kmem_cache *cachep, int entries, int batchcount)
-{
- int cpu;
- size_t size;
- struct array_cache __percpu *cpu_cache;
-
- size = sizeof(void *) * entries + sizeof(struct array_cache);
- cpu_cache = __alloc_percpu(size, sizeof(void *));
-
- if (!cpu_cache)
- return NULL;
-
- for_each_possible_cpu(cpu) {
- init_arraycache(per_cpu_ptr(cpu_cache, cpu),
- entries, batchcount);
- }
-
- return cpu_cache;
-}
-
-static int __ref setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
-{
- if (slab_state >= FULL)
- return enable_cpucache(cachep, gfp);
-
- cachep->cpu_cache = alloc_kmem_cache_cpus(cachep, 1, 1);
- if (!cachep->cpu_cache)
- return 1;
-
- if (slab_state == DOWN) {
- /* Creation of first cache (kmem_cache). */
- set_up_node(kmem_cache, CACHE_CACHE);
- } else if (slab_state == PARTIAL) {
- /* For kmem_cache_node */
- set_up_node(cachep, SIZE_NODE);
- } else {
- int node;
-
- for_each_online_node(node) {
- cachep->node[node] = kmalloc_node(
- sizeof(struct kmem_cache_node), gfp, node);
- BUG_ON(!cachep->node[node]);
- kmem_cache_node_init(cachep->node[node]);
- }
- }
-
- cachep->node[numa_mem_id()]->next_reap =
- jiffies + REAPTIMEOUT_NODE +
- ((unsigned long)cachep) % REAPTIMEOUT_NODE;
-
- cpu_cache_get(cachep)->avail = 0;
- cpu_cache_get(cachep)->limit = BOOT_CPUCACHE_ENTRIES;
- cpu_cache_get(cachep)->batchcount = 1;
- cpu_cache_get(cachep)->touched = 0;
- cachep->batchcount = 1;
- cachep->limit = BOOT_CPUCACHE_ENTRIES;
- return 0;
-}
-
-slab_flags_t kmem_cache_flags(unsigned int object_size,
- slab_flags_t flags, const char *name)
-{
- return flags;
-}
-
-struct kmem_cache *
-__kmem_cache_alias(const char *name, unsigned int size, unsigned int align,
- slab_flags_t flags, void (*ctor)(void *))
-{
- struct kmem_cache *cachep;
-
- cachep = find_mergeable(size, align, flags, name, ctor);
- if (cachep) {
- cachep->refcount++;
-
- /*
- * Adjust the object sizes so that we clear
- * the complete object on kzalloc.
- */
- cachep->object_size = max_t(int, cachep->object_size, size);
- }
- return cachep;
-}
-
-static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
- size_t size, slab_flags_t flags)
-{
- size_t left;
-
- cachep->num = 0;
-
- /*
- * If slab auto-initialization on free is enabled, store the freelist
- * off-slab, so that its contents don't end up in one of the allocated
- * objects.
- */
- if (unlikely(slab_want_init_on_free(cachep)))
- return false;
-
- if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
- return false;
-
- left = calculate_slab_order(cachep, size,
- flags | CFLGS_OBJFREELIST_SLAB);
- if (!cachep->num)
- return false;
-
- if (cachep->num * sizeof(freelist_idx_t) > cachep->object_size)
- return false;
-
- cachep->colour = left / cachep->colour_off;
-
- return true;
-}
-
-static bool set_off_slab_cache(struct kmem_cache *cachep,
- size_t size, slab_flags_t flags)
-{
- size_t left;
-
- cachep->num = 0;
-
- /*
- * Always use on-slab management when SLAB_NOLEAKTRACE
- * to avoid recursive calls into kmemleak.
- */
- if (flags & SLAB_NOLEAKTRACE)
- return false;
-
- /*
- * Size is large, assume best to place the slab management obj
- * off-slab (should allow better packing of objs).
- */
- left = calculate_slab_order(cachep, size, flags | CFLGS_OFF_SLAB);
- if (!cachep->num)
- return false;
-
- /*
- * If the slab has been placed off-slab, and we have enough space then
- * move it on-slab. This is at the expense of any extra colouring.
- */
- if (left >= cachep->num * sizeof(freelist_idx_t))
- return false;
-
- cachep->colour = left / cachep->colour_off;
-
- return true;
-}
-
-static bool set_on_slab_cache(struct kmem_cache *cachep,
- size_t size, slab_flags_t flags)
-{
- size_t left;
-
- cachep->num = 0;
-
- left = calculate_slab_order(cachep, size, flags);
- if (!cachep->num)
- return false;
-
- cachep->colour = left / cachep->colour_off;
-
- return true;
-}
-
-/*
- * __kmem_cache_create - Create a cache.
- * @cachep: cache management descriptor
- * @flags: SLAB flags
- *
- * Returns zero on success, nonzero on failure.
- *
- * The flags are
- *
- * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
- * to catch references to uninitialised memory.
- *
- * %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check
- * for buffer overruns.
- *
- * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
- * cacheline. This can be beneficial if you're counting cycles as closely
- * as davem.
- */
-int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
-{
- size_t ralign = BYTES_PER_WORD;
- gfp_t gfp;
- int err;
- unsigned int size = cachep->size;
-
-#if DEBUG
-#if FORCED_DEBUG
- /*
- * Enable redzoning and last user accounting, except for caches with
- * large objects, if the increased size would increase the object size
- * above the next power of two: caches with object sizes just above a
- * power of two have a significant amount of internal fragmentation.
- */
- if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
- 2 * sizeof(unsigned long long)))
- flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
- if (!(flags & SLAB_TYPESAFE_BY_RCU))
- flags |= SLAB_POISON;
-#endif
-#endif
-
- /*
- * Check that size is in terms of words. This is needed to avoid
- * unaligned accesses for some archs when redzoning is used, and makes
- * sure any on-slab bufctl's are also correctly aligned.
- */
- size = ALIGN(size, BYTES_PER_WORD);
-
- if (flags & SLAB_RED_ZONE) {
- ralign = REDZONE_ALIGN;
- /* If redzoning, ensure that the second redzone is suitably
- * aligned, by adjusting the object size accordingly. */
- size = ALIGN(size, REDZONE_ALIGN);
- }
-
- /* 3) caller mandated alignment */
- if (ralign < cachep->align) {
- ralign = cachep->align;
- }
- /* disable debug if necessary */
- if (ralign > __alignof__(unsigned long long))
- flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
- /*
- * 4) Store it.
- */
- cachep->align = ralign;
- cachep->colour_off = cache_line_size();
- /* Offset must be a multiple of the alignment. */
- if (cachep->colour_off < cachep->align)
- cachep->colour_off = cachep->align;
-
- if (slab_is_available())
- gfp = GFP_KERNEL;
- else
- gfp = GFP_NOWAIT;
-
-#if DEBUG
-
- /*
- * Both debugging options require word-alignment which is calculated
- * into align above.
- */
- if (flags & SLAB_RED_ZONE) {
- /* add space for red zone words */
- cachep->obj_offset += sizeof(unsigned long long);
- size += 2 * sizeof(unsigned long long);
- }
- if (flags & SLAB_STORE_USER) {
- /* user store requires one word storage behind the end of
- * the real object. But if the second red zone needs to be
- * aligned to 64 bits, we must allow that much space.
- */
- if (flags & SLAB_RED_ZONE)
- size += REDZONE_ALIGN;
- else
- size += BYTES_PER_WORD;
- }
-#endif
-
- kasan_cache_create(cachep, &size, &flags);
-
- size = ALIGN(size, cachep->align);
- /*
- * We should restrict the number of objects in a slab to implement
- * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
- */
- if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
- size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
-
-#if DEBUG
- /*
- * To activate debug pagealloc, off-slab management is necessary
- * requirement. In early phase of initialization, small sized slab
- * doesn't get initialized so it would not be possible. So, we need
- * to check size >= 256. It guarantees that all necessary small
- * sized slab is initialized in current slab initialization sequence.
- */
- if (debug_pagealloc_enabled_static() && (flags & SLAB_POISON) &&
- size >= 256 && cachep->object_size > cache_line_size()) {
- if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
- size_t tmp_size = ALIGN(size, PAGE_SIZE);
-
- if (set_off_slab_cache(cachep, tmp_size, flags)) {
- flags |= CFLGS_OFF_SLAB;
- cachep->obj_offset += tmp_size - size;
- size = tmp_size;
- goto done;
- }
- }
- }
-#endif
-
- if (set_objfreelist_slab_cache(cachep, size, flags)) {
- flags |= CFLGS_OBJFREELIST_SLAB;
- goto done;
- }
-
- if (set_off_slab_cache(cachep, size, flags)) {
- flags |= CFLGS_OFF_SLAB;
- goto done;
- }
-
- if (set_on_slab_cache(cachep, size, flags))
- goto done;
-
- return -E2BIG;
-
-done:
- cachep->freelist_size = cachep->num * sizeof(freelist_idx_t);
- cachep->flags = flags;
- cachep->allocflags = __GFP_COMP;
- if (flags & SLAB_CACHE_DMA)
- cachep->allocflags |= GFP_DMA;
- if (flags & SLAB_CACHE_DMA32)
- cachep->allocflags |= GFP_DMA32;
- if (flags & SLAB_RECLAIM_ACCOUNT)
- cachep->allocflags |= __GFP_RECLAIMABLE;
- cachep->size = size;
- cachep->reciprocal_buffer_size = reciprocal_value(size);
-
-#if DEBUG
- /*
- * If we're going to use the generic kernel_map_pages()
- * poisoning, then it's going to smash the contents of
- * the redzone and userword anyhow, so switch them off.
- */
- if (IS_ENABLED(CONFIG_PAGE_POISONING) &&
- (cachep->flags & SLAB_POISON) &&
- is_debug_pagealloc_cache(cachep))
- cachep->flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
-#endif
-
- err = setup_cpu_cache(cachep, gfp);
- if (err) {
- __kmem_cache_release(cachep);
- return err;
- }
-
- return 0;
-}
-
-#if DEBUG
-static void check_irq_off(void)
-{
- BUG_ON(!irqs_disabled());
-}
-
-static void check_irq_on(void)
-{
- BUG_ON(irqs_disabled());
-}
-
-static void check_mutex_acquired(void)
-{
- BUG_ON(!mutex_is_locked(&slab_mutex));
-}
-
-static void check_spinlock_acquired(struct kmem_cache *cachep)
-{
-#ifdef CONFIG_SMP
- check_irq_off();
- assert_raw_spin_locked(&get_node(cachep, numa_mem_id())->list_lock);
-#endif
-}
-
-static void check_spinlock_acquired_node(struct kmem_cache *cachep, int node)
-{
-#ifdef CONFIG_SMP
- check_irq_off();
- assert_raw_spin_locked(&get_node(cachep, node)->list_lock);
-#endif
-}
-
-#else
-#define check_irq_off() do { } while(0)
-#define check_irq_on() do { } while(0)
-#define check_mutex_acquired() do { } while(0)
-#define check_spinlock_acquired(x) do { } while(0)
-#define check_spinlock_acquired_node(x, y) do { } while(0)
-#endif
-
-static void drain_array_locked(struct kmem_cache *cachep, struct array_cache *ac,
- int node, bool free_all, struct list_head *list)
-{
- int tofree;
-
- if (!ac || !ac->avail)
- return;
-
- tofree = free_all ? ac->avail : (ac->limit + 4) / 5;
- if (tofree > ac->avail)
- tofree = (ac->avail + 1) / 2;
-
- free_block(cachep, ac->entry, tofree, node, list);
- ac->avail -= tofree;
- memmove(ac->entry, &(ac->entry[tofree]), sizeof(void *) * ac->avail);
-}
-
-static void do_drain(void *arg)
-{
- struct kmem_cache *cachep = arg;
- struct array_cache *ac;
- int node = numa_mem_id();
- struct kmem_cache_node *n;
- LIST_HEAD(list);
-
- check_irq_off();
- ac = cpu_cache_get(cachep);
- n = get_node(cachep, node);
- raw_spin_lock(&n->list_lock);
- free_block(cachep, ac->entry, ac->avail, node, &list);
- raw_spin_unlock(&n->list_lock);
- ac->avail = 0;
- slabs_destroy(cachep, &list);
-}
-
-static void drain_cpu_caches(struct kmem_cache *cachep)
-{
- struct kmem_cache_node *n;
- int node;
- LIST_HEAD(list);
-
- on_each_cpu(do_drain, cachep, 1);
- check_irq_on();
- for_each_kmem_cache_node(cachep, node, n)
- if (n->alien)
- drain_alien_cache(cachep, n->alien);
-
- for_each_kmem_cache_node(cachep, node, n) {
- raw_spin_lock_irq(&n->list_lock);
- drain_array_locked(cachep, n->shared, node, true, &list);
- raw_spin_unlock_irq(&n->list_lock);
-
- slabs_destroy(cachep, &list);
- }
-}
-
-/*
- * Remove slabs from the list of free slabs.
- * Specify the number of slabs to drain in tofree.
- *
- * Returns the actual number of slabs released.
- */
-static int drain_freelist(struct kmem_cache *cache,
- struct kmem_cache_node *n, int tofree)
-{
- struct list_head *p;
- int nr_freed;
- struct slab *slab;
-
- nr_freed = 0;
- while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
-
- raw_spin_lock_irq(&n->list_lock);
- p = n->slabs_free.prev;
- if (p == &n->slabs_free) {
- raw_spin_unlock_irq(&n->list_lock);
- goto out;
- }
-
- slab = list_entry(p, struct slab, slab_list);
- list_del(&slab->slab_list);
- n->free_slabs--;
- n->total_slabs--;
- /*
- * Safe to drop the lock. The slab is no longer linked
- * to the cache.
- */
- n->free_objects -= cache->num;
- raw_spin_unlock_irq(&n->list_lock);
- slab_destroy(cache, slab);
- nr_freed++;
-
- cond_resched();
- }
-out:
- return nr_freed;
-}
-
-bool __kmem_cache_empty(struct kmem_cache *s)
-{
- int node;
- struct kmem_cache_node *n;
-
- for_each_kmem_cache_node(s, node, n)
- if (!list_empty(&n->slabs_full) ||
- !list_empty(&n->slabs_partial))
- return false;
- return true;
-}
-
-int __kmem_cache_shrink(struct kmem_cache *cachep)
-{
- int ret = 0;
- int node;
- struct kmem_cache_node *n;
-
- drain_cpu_caches(cachep);
-
- check_irq_on();
- for_each_kmem_cache_node(cachep, node, n) {
- drain_freelist(cachep, n, INT_MAX);
-
- ret += !list_empty(&n->slabs_full) ||
- !list_empty(&n->slabs_partial);
- }
- return (ret ? 1 : 0);
-}
-
-int __kmem_cache_shutdown(struct kmem_cache *cachep)
-{
- return __kmem_cache_shrink(cachep);
-}
-
-void __kmem_cache_release(struct kmem_cache *cachep)
-{
- int i;
- struct kmem_cache_node *n;
-
- cache_random_seq_destroy(cachep);
-
- free_percpu(cachep->cpu_cache);
-
- /* NUMA: free the node structures */
- for_each_kmem_cache_node(cachep, i, n) {
- kfree(n->shared);
- free_alien_cache(n->alien);
- kfree(n);
- cachep->node[i] = NULL;
- }
-}
-
-/*
- * Get the memory for a slab management obj.
- *
- * For a slab cache when the slab descriptor is off-slab, the
- * slab descriptor can't come from the same cache which is being created,
- * Because if it is the case, that means we defer the creation of
- * the kmalloc_{dma,}_cache of size sizeof(slab descriptor) to this point.
- * And we eventually call down to __kmem_cache_create(), which
- * in turn looks up in the kmalloc_{dma,}_caches for the desired-size one.
- * This is a "chicken-and-egg" problem.
- *
- * So the off-slab slab descriptor shall come from the kmalloc_{dma,}_caches,
- * which are all initialized during kmem_cache_init().
- */
-static void *alloc_slabmgmt(struct kmem_cache *cachep,
- struct slab *slab, int colour_off,
- gfp_t local_flags, int nodeid)
-{
- void *freelist;
- void *addr = slab_address(slab);
-
- slab->s_mem = addr + colour_off;
- slab->active = 0;
-
- if (OBJFREELIST_SLAB(cachep))
- freelist = NULL;
- else if (OFF_SLAB(cachep)) {
- /* Slab management obj is off-slab. */
- freelist = kmalloc_node(cachep->freelist_size,
- local_flags, nodeid);
- } else {
- /* We will use last bytes at the slab for freelist */
- freelist = addr + (PAGE_SIZE << cachep->gfporder) -
- cachep->freelist_size;
- }
-
- return freelist;
-}
-
-static inline freelist_idx_t get_free_obj(struct slab *slab, unsigned int idx)
-{
- return ((freelist_idx_t *) slab->freelist)[idx];
-}
-
-static inline void set_free_obj(struct slab *slab,
- unsigned int idx, freelist_idx_t val)
-{
- ((freelist_idx_t *)(slab->freelist))[idx] = val;
-}
-
-static void cache_init_objs_debug(struct kmem_cache *cachep, struct slab *slab)
-{
-#if DEBUG
- int i;
-
- for (i = 0; i < cachep->num; i++) {
- void *objp = index_to_obj(cachep, slab, i);
-
- if (cachep->flags & SLAB_STORE_USER)
- *dbg_userword(cachep, objp) = NULL;
-
- if (cachep->flags & SLAB_RED_ZONE) {
- *dbg_redzone1(cachep, objp) = RED_INACTIVE;
- *dbg_redzone2(cachep, objp) = RED_INACTIVE;
- }
- /*
- * Constructors are not allowed to allocate memory from the same
- * cache which they are a constructor for. Otherwise, deadlock.
- * They must also be threaded.
- */
- if (cachep->ctor && !(cachep->flags & SLAB_POISON)) {
- kasan_unpoison_object_data(cachep,
- objp + obj_offset(cachep));
- cachep->ctor(objp + obj_offset(cachep));
- kasan_poison_object_data(
- cachep, objp + obj_offset(cachep));
- }
-
- if (cachep->flags & SLAB_RED_ZONE) {
- if (*dbg_redzone2(cachep, objp) != RED_INACTIVE)
- slab_error(cachep, "constructor overwrote the end of an object");
- if (*dbg_redzone1(cachep, objp) != RED_INACTIVE)
- slab_error(cachep, "constructor overwrote the start of an object");
- }
- /* need to poison the objs? */
- if (cachep->flags & SLAB_POISON) {
- poison_obj(cachep, objp, POISON_FREE);
- slab_kernel_map(cachep, objp, 0);
- }
- }
-#endif
-}
-
-#ifdef CONFIG_SLAB_FREELIST_RANDOM
-/* Hold information during a freelist initialization */
-struct freelist_init_state {
- unsigned int pos;
- unsigned int *list;
- unsigned int count;
-};
-
-/*
- * Initialize the state based on the randomization method available.
- * return true if the pre-computed list is available, false otherwise.
- */
-static bool freelist_state_initialize(struct freelist_init_state *state,
- struct kmem_cache *cachep,
- unsigned int count)
-{
- bool ret;
- if (!cachep->random_seq) {
- ret = false;
- } else {
- state->list = cachep->random_seq;
- state->count = count;
- state->pos = get_random_u32_below(count);
- ret = true;
- }
- return ret;
-}
-
-/* Get the next entry on the list and randomize it using a random shift */
-static freelist_idx_t next_random_slot(struct freelist_init_state *state)
-{
- if (state->pos >= state->count)
- state->pos = 0;
- return state->list[state->pos++];
-}
-
-/* Swap two freelist entries */
-static void swap_free_obj(struct slab *slab, unsigned int a, unsigned int b)
-{
- swap(((freelist_idx_t *) slab->freelist)[a],
- ((freelist_idx_t *) slab->freelist)[b]);
-}
-
-/*
- * Shuffle the freelist initialization state based on pre-computed lists.
- * return true if the list was successfully shuffled, false otherwise.
- */
-static bool shuffle_freelist(struct kmem_cache *cachep, struct slab *slab)
-{
- unsigned int objfreelist = 0, i, rand, count = cachep->num;
- struct freelist_init_state state;
- bool precomputed;
-
- if (count < 2)
- return false;
-
- precomputed = freelist_state_initialize(&state, cachep, count);
-
- /* Take a random entry as the objfreelist */
- if (OBJFREELIST_SLAB(cachep)) {
- if (!precomputed)
- objfreelist = count - 1;
- else
- objfreelist = next_random_slot(&state);
- slab->freelist = index_to_obj(cachep, slab, objfreelist) +
- obj_offset(cachep);
- count--;
- }
-
- /*
- * On early boot, generate the list dynamically.
- * Later use a pre-computed list for speed.
- */
- if (!precomputed) {
- for (i = 0; i < count; i++)
- set_free_obj(slab, i, i);
-
- /* Fisher-Yates shuffle */
- for (i = count - 1; i > 0; i--) {
- rand = get_random_u32_below(i + 1);
- swap_free_obj(slab, i, rand);
- }
- } else {
- for (i = 0; i < count; i++)
- set_free_obj(slab, i, next_random_slot(&state));
- }
-
- if (OBJFREELIST_SLAB(cachep))
- set_free_obj(slab, cachep->num - 1, objfreelist);
-
- return true;
-}
-#else
-static inline bool shuffle_freelist(struct kmem_cache *cachep,
- struct slab *slab)
-{
- return false;
-}
-#endif /* CONFIG_SLAB_FREELIST_RANDOM */
-
-static void cache_init_objs(struct kmem_cache *cachep,
- struct slab *slab)
-{
- int i;
- void *objp;
- bool shuffled;
-
- cache_init_objs_debug(cachep, slab);
-
- /* Try to randomize the freelist if enabled */
- shuffled = shuffle_freelist(cachep, slab);
-
- if (!shuffled && OBJFREELIST_SLAB(cachep)) {
- slab->freelist = index_to_obj(cachep, slab, cachep->num - 1) +
- obj_offset(cachep);
- }
-
- for (i = 0; i < cachep->num; i++) {
- objp = index_to_obj(cachep, slab, i);
- objp = kasan_init_slab_obj(cachep, objp);
-
- /* constructor could break poison info */
- if (DEBUG == 0 && cachep->ctor) {
- kasan_unpoison_object_data(cachep, objp);
- cachep->ctor(objp);
- kasan_poison_object_data(cachep, objp);
- }
-
- if (!shuffled)
- set_free_obj(slab, i, i);
- }
-}
-
-static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slab)
-{
- void *objp;
-
- objp = index_to_obj(cachep, slab, get_free_obj(slab, slab->active));
- slab->active++;
-
- return objp;
-}
-
-static void slab_put_obj(struct kmem_cache *cachep,
- struct slab *slab, void *objp)
-{
- unsigned int objnr = obj_to_index(cachep, slab, objp);
-#if DEBUG
- unsigned int i;
-
- /* Verify double free bug */
- for (i = slab->active; i < cachep->num; i++) {
- if (get_free_obj(slab, i) == objnr) {
- pr_err("slab: double free detected in cache '%s', objp %px\n",
- cachep->name, objp);
- BUG();
- }
- }
-#endif
- slab->active--;
- if (!slab->freelist)
- slab->freelist = objp + obj_offset(cachep);
-
- set_free_obj(slab, slab->active, objnr);
-}
-
-/*
- * Grow (by 1) the number of slabs within a cache. This is called by
- * kmem_cache_alloc() when there are no active objs left in a cache.
- */
-static struct slab *cache_grow_begin(struct kmem_cache *cachep,
- gfp_t flags, int nodeid)
-{
- void *freelist;
- size_t offset;
- gfp_t local_flags;
- int slab_node;
- struct kmem_cache_node *n;
- struct slab *slab;
-
- /*
- * Be lazy and only check for valid flags here, keeping it out of the
- * critical path in kmem_cache_alloc().
- */
- if (unlikely(flags & GFP_SLAB_BUG_MASK))
- flags = kmalloc_fix_flags(flags);
-
- WARN_ON_ONCE(cachep->ctor && (flags & __GFP_ZERO));
- local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
-
- check_irq_off();
- if (gfpflags_allow_blocking(local_flags))
- local_irq_enable();
-
- /*
- * Get mem for the objs. Attempt to allocate a physical page from
- * 'nodeid'.
- */
- slab = kmem_getpages(cachep, local_flags, nodeid);
- if (!slab)
- goto failed;
-
- slab_node = slab_nid(slab);
- n = get_node(cachep, slab_node);
-
- /* Get colour for the slab, and cal the next value. */
- n->colour_next++;
- if (n->colour_next >= cachep->colour)
- n->colour_next = 0;
-
- offset = n->colour_next;
- if (offset >= cachep->colour)
- offset = 0;
-
- offset *= cachep->colour_off;
-
- /*
- * Call kasan_poison_slab() before calling alloc_slabmgmt(), so
- * page_address() in the latter returns a non-tagged pointer,
- * as it should be for slab pages.
- */
- kasan_poison_slab(slab);
-
- /* Get slab management. */
- freelist = alloc_slabmgmt(cachep, slab, offset,
- local_flags & ~GFP_CONSTRAINT_MASK, slab_node);
- if (OFF_SLAB(cachep) && !freelist)
- goto opps1;
-
- slab->slab_cache = cachep;
- slab->freelist = freelist;
-
- cache_init_objs(cachep, slab);
-
- if (gfpflags_allow_blocking(local_flags))
- local_irq_disable();
-
- return slab;
-
-opps1:
- kmem_freepages(cachep, slab);
-failed:
- if (gfpflags_allow_blocking(local_flags))
- local_irq_disable();
- return NULL;
-}
-
-static void cache_grow_end(struct kmem_cache *cachep, struct slab *slab)
-{
- struct kmem_cache_node *n;
- void *list = NULL;
-
- check_irq_off();
-
- if (!slab)
- return;
-
- INIT_LIST_HEAD(&slab->slab_list);
- n = get_node(cachep, slab_nid(slab));
-
- raw_spin_lock(&n->list_lock);
- n->total_slabs++;
- if (!slab->active) {
- list_add_tail(&slab->slab_list, &n->slabs_free);
- n->free_slabs++;
- } else
- fixup_slab_list(cachep, n, slab, &list);
-
- STATS_INC_GROWN(cachep);
- n->free_objects += cachep->num - slab->active;
- raw_spin_unlock(&n->list_lock);
-
- fixup_objfreelist_debug(cachep, &list);
-}
-
-#if DEBUG
-
-/*
- * Perform extra freeing checks:
- * - detect bad pointers.
- * - POISON/RED_ZONE checking
- */
-static void kfree_debugcheck(const void *objp)
-{
- if (!virt_addr_valid(objp)) {
- pr_err("kfree_debugcheck: out of range ptr %lxh\n",
- (unsigned long)objp);
- BUG();
- }
-}
-
-static inline void verify_redzone_free(struct kmem_cache *cache, void *obj)
-{
- unsigned long long redzone1, redzone2;
-
- redzone1 = *dbg_redzone1(cache, obj);
- redzone2 = *dbg_redzone2(cache, obj);
-
- /*
- * Redzone is ok.
- */
- if (redzone1 == RED_ACTIVE && redzone2 == RED_ACTIVE)
- return;
-
- if (redzone1 == RED_INACTIVE && redzone2 == RED_INACTIVE)
- slab_error(cache, "double free detected");
- else
- slab_error(cache, "memory outside object was overwritten");
-
- pr_err("%px: redzone 1:0x%llx, redzone 2:0x%llx\n",
- obj, redzone1, redzone2);
-}
-
-static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
- unsigned long caller)
-{
- unsigned int objnr;
- struct slab *slab;
-
- BUG_ON(virt_to_cache(objp) != cachep);
-
- objp -= obj_offset(cachep);
- kfree_debugcheck(objp);
- slab = virt_to_slab(objp);
-
- if (cachep->flags & SLAB_RED_ZONE) {
- verify_redzone_free(cachep, objp);
- *dbg_redzone1(cachep, objp) = RED_INACTIVE;
- *dbg_redzone2(cachep, objp) = RED_INACTIVE;
- }
- if (cachep->flags & SLAB_STORE_USER)
- *dbg_userword(cachep, objp) = (void *)caller;
-
- objnr = obj_to_index(cachep, slab, objp);
-
- BUG_ON(objnr >= cachep->num);
- BUG_ON(objp != index_to_obj(cachep, slab, objnr));
-
- if (cachep->flags & SLAB_POISON) {
- poison_obj(cachep, objp, POISON_FREE);
- slab_kernel_map(cachep, objp, 0);
- }
- return objp;
-}
-
-#else
-#define kfree_debugcheck(x) do { } while(0)
-#define cache_free_debugcheck(x, objp, z) (objp)
-#endif
-
-static inline void fixup_objfreelist_debug(struct kmem_cache *cachep,
- void **list)
-{
-#if DEBUG
- void *next = *list;
- void *objp;
-
- while (next) {
- objp = next - obj_offset(cachep);
- next = *(void **)next;
- poison_obj(cachep, objp, POISON_FREE);
- }
-#endif
-}
-
-static inline void fixup_slab_list(struct kmem_cache *cachep,
- struct kmem_cache_node *n, struct slab *slab,
- void **list)
-{
- /* move slabp to correct slabp list: */
- list_del(&slab->slab_list);
- if (slab->active == cachep->num) {
- list_add(&slab->slab_list, &n->slabs_full);
- if (OBJFREELIST_SLAB(cachep)) {
-#if DEBUG
- /* Poisoning will be done without holding the lock */
- if (cachep->flags & SLAB_POISON) {
- void **objp = slab->freelist;
-
- *objp = *list;
- *list = objp;
- }
-#endif
- slab->freelist = NULL;
- }
- } else
- list_add(&slab->slab_list, &n->slabs_partial);
-}
-
-/* Try to find non-pfmemalloc slab if needed */
-static noinline struct slab *get_valid_first_slab(struct kmem_cache_node *n,
- struct slab *slab, bool pfmemalloc)
-{
- if (!slab)
- return NULL;
-
- if (pfmemalloc)
- return slab;
-
- if (!slab_test_pfmemalloc(slab))
- return slab;
-
- /* No need to keep pfmemalloc slab if we have enough free objects */
- if (n->free_objects > n->free_limit) {
- slab_clear_pfmemalloc(slab);
- return slab;
- }
-
- /* Move pfmemalloc slab to the end of list to speed up next search */
- list_del(&slab->slab_list);
- if (!slab->active) {
- list_add_tail(&slab->slab_list, &n->slabs_free);
- n->free_slabs++;
- } else
- list_add_tail(&slab->slab_list, &n->slabs_partial);
-
- list_for_each_entry(slab, &n->slabs_partial, slab_list) {
- if (!slab_test_pfmemalloc(slab))
- return slab;
- }
-
- n->free_touched = 1;
- list_for_each_entry(slab, &n->slabs_free, slab_list) {
- if (!slab_test_pfmemalloc(slab)) {
- n->free_slabs--;
- return slab;
- }
- }
-
- return NULL;
-}
-
-static struct slab *get_first_slab(struct kmem_cache_node *n, bool pfmemalloc)
-{
- struct slab *slab;
-
- assert_raw_spin_locked(&n->list_lock);
- slab = list_first_entry_or_null(&n->slabs_partial, struct slab,
- slab_list);
- if (!slab) {
- n->free_touched = 1;
- slab = list_first_entry_or_null(&n->slabs_free, struct slab,
- slab_list);
- if (slab)
- n->free_slabs--;
- }
-
- if (sk_memalloc_socks())
- slab = get_valid_first_slab(n, slab, pfmemalloc);
-
- return slab;
-}
-
-static noinline void *cache_alloc_pfmemalloc(struct kmem_cache *cachep,
- struct kmem_cache_node *n, gfp_t flags)
-{
- struct slab *slab;
- void *obj;
- void *list = NULL;
-
- if (!gfp_pfmemalloc_allowed(flags))
- return NULL;
-
- raw_spin_lock(&n->list_lock);
- slab = get_first_slab(n, true);
- if (!slab) {
- raw_spin_unlock(&n->list_lock);
- return NULL;
- }
-
- obj = slab_get_obj(cachep, slab);
- n->free_objects--;
-
- fixup_slab_list(cachep, n, slab, &list);
-
- raw_spin_unlock(&n->list_lock);
- fixup_objfreelist_debug(cachep, &list);
-
- return obj;
-}
-
-/*
- * Slab list should be fixed up by fixup_slab_list() for existing slab
- * or cache_grow_end() for new slab
- */
-static __always_inline int alloc_block(struct kmem_cache *cachep,
- struct array_cache *ac, struct slab *slab, int batchcount)
-{
- /*
- * There must be at least one object available for
- * allocation.
- */
- BUG_ON(slab->active >= cachep->num);
-
- while (slab->active < cachep->num && batchcount--) {
- STATS_INC_ALLOCED(cachep);
- STATS_INC_ACTIVE(cachep);
- STATS_SET_HIGH(cachep);
-
- ac->entry[ac->avail++] = slab_get_obj(cachep, slab);
- }
-
- return batchcount;
-}
-
-static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
-{
- int batchcount;
- struct kmem_cache_node *n;
- struct array_cache *ac, *shared;
- int node;
- void *list = NULL;
- struct slab *slab;
-
- check_irq_off();
- node = numa_mem_id();
-
- ac = cpu_cache_get(cachep);
- batchcount = ac->batchcount;
- if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
- /*
- * If there was little recent activity on this cache, then
- * perform only a partial refill. Otherwise we could generate
- * refill bouncing.
- */
- batchcount = BATCHREFILL_LIMIT;
- }
- n = get_node(cachep, node);
-
- BUG_ON(ac->avail > 0 || !n);
- shared = READ_ONCE(n->shared);
- if (!n->free_objects && (!shared || !shared->avail))
- goto direct_grow;
-
- raw_spin_lock(&n->list_lock);
- shared = READ_ONCE(n->shared);
-
- /* See if we can refill from the shared array */
- if (shared && transfer_objects(ac, shared, batchcount)) {
- shared->touched = 1;
- goto alloc_done;
- }
-
- while (batchcount > 0) {
- /* Get slab alloc is to come from. */
- slab = get_first_slab(n, false);
- if (!slab)
- goto must_grow;
-
- check_spinlock_acquired(cachep);
-
- batchcount = alloc_block(cachep, ac, slab, batchcount);
- fixup_slab_list(cachep, n, slab, &list);
- }
-
-must_grow:
- n->free_objects -= ac->avail;
-alloc_done:
- raw_spin_unlock(&n->list_lock);
- fixup_objfreelist_debug(cachep, &list);
-
-direct_grow:
- if (unlikely(!ac->avail)) {
- /* Check if we can use obj in pfmemalloc slab */
- if (sk_memalloc_socks()) {
- void *obj = cache_alloc_pfmemalloc(cachep, n, flags);
-
- if (obj)
- return obj;
- }
-
- slab = cache_grow_begin(cachep, gfp_exact_node(flags), node);
-
- /*
- * cache_grow_begin() can reenable interrupts,
- * then ac could change.
- */
- ac = cpu_cache_get(cachep);
- if (!ac->avail && slab)
- alloc_block(cachep, ac, slab, batchcount);
- cache_grow_end(cachep, slab);
-
- if (!ac->avail)
- return NULL;
- }
- ac->touched = 1;
-
- return ac->entry[--ac->avail];
-}
-
-#if DEBUG
-static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
- gfp_t flags, void *objp, unsigned long caller)
-{
- WARN_ON_ONCE(cachep->ctor && (flags & __GFP_ZERO));
- if (!objp || is_kfence_address(objp))
- return objp;
- if (cachep->flags & SLAB_POISON) {
- check_poison_obj(cachep, objp);
- slab_kernel_map(cachep, objp, 1);
- poison_obj(cachep, objp, POISON_INUSE);
- }
- if (cachep->flags & SLAB_STORE_USER)
- *dbg_userword(cachep, objp) = (void *)caller;
-
- if (cachep->flags & SLAB_RED_ZONE) {
- if (*dbg_redzone1(cachep, objp) != RED_INACTIVE ||
- *dbg_redzone2(cachep, objp) != RED_INACTIVE) {
- slab_error(cachep, "double free, or memory outside object was overwritten");
- pr_err("%px: redzone 1:0x%llx, redzone 2:0x%llx\n",
- objp, *dbg_redzone1(cachep, objp),
- *dbg_redzone2(cachep, objp));
- }
- *dbg_redzone1(cachep, objp) = RED_ACTIVE;
- *dbg_redzone2(cachep, objp) = RED_ACTIVE;
- }
-
- objp += obj_offset(cachep);
- if (cachep->ctor && cachep->flags & SLAB_POISON)
- cachep->ctor(objp);
- if ((unsigned long)objp & (arch_slab_minalign() - 1)) {
- pr_err("0x%px: not aligned to arch_slab_minalign()=%u\n", objp,
- arch_slab_minalign());
- }
- return objp;
-}
-#else
-#define cache_alloc_debugcheck_after(a, b, objp, d) (objp)
-#endif
-
-static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
- void *objp;
- struct array_cache *ac;
-
- check_irq_off();
-
- ac = cpu_cache_get(cachep);
- if (likely(ac->avail)) {
- ac->touched = 1;
- objp = ac->entry[--ac->avail];
-
- STATS_INC_ALLOCHIT(cachep);
- goto out;
- }
-
- STATS_INC_ALLOCMISS(cachep);
- objp = cache_alloc_refill(cachep, flags);
- /*
- * the 'ac' may be updated by cache_alloc_refill(),
- * and kmemleak_erase() requires its correct value.
- */
- ac = cpu_cache_get(cachep);
-
-out:
- /*
- * To avoid a false negative, if an object that is in one of the
- * per-CPU caches is leaked, we need to make sure kmemleak doesn't
- * treat the array pointers as a reference to the object.
- */
- if (objp)
- kmemleak_erase(&ac->entry[ac->avail]);
- return objp;
-}
-
-#ifdef CONFIG_NUMA
-static void *____cache_alloc_node(struct kmem_cache *, gfp_t, int);
-
-/*
- * Try allocating on another node if PFA_SPREAD_SLAB is a mempolicy is set.
- *
- * If we are in_interrupt, then process context, including cpusets and
- * mempolicy, may not apply and should not be used for allocation policy.
- */
-static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
- int nid_alloc, nid_here;
-
- if (in_interrupt() || (flags & __GFP_THISNODE))
- return NULL;
- nid_alloc = nid_here = numa_mem_id();
- if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
- nid_alloc = cpuset_slab_spread_node();
- else if (current->mempolicy)
- nid_alloc = mempolicy_slab_node();
- if (nid_alloc != nid_here)
- return ____cache_alloc_node(cachep, flags, nid_alloc);
- return NULL;
-}
-
-/*
- * Fallback function if there was no memory available and no objects on a
- * certain node and fall back is permitted. First we scan all the
- * available node for available objects. If that fails then we
- * perform an allocation without specifying a node. This allows the page
- * allocator to do its reclaim / fallback magic. We then insert the
- * slab into the proper nodelist and then allocate from it.
- */
-static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
-{
- struct zonelist *zonelist;
- struct zoneref *z;
- struct zone *zone;
- enum zone_type highest_zoneidx = gfp_zone(flags);
- void *obj = NULL;
- struct slab *slab;
- int nid;
- unsigned int cpuset_mems_cookie;
-
- if (flags & __GFP_THISNODE)
- return NULL;
-
-retry_cpuset:
- cpuset_mems_cookie = read_mems_allowed_begin();
- zonelist = node_zonelist(mempolicy_slab_node(), flags);
-
-retry:
- /*
- * Look through allowed nodes for objects available
- * from existing per node queues.
- */
- for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
- nid = zone_to_nid(zone);
-
- if (cpuset_zone_allowed(zone, flags) &&
- get_node(cache, nid) &&
- get_node(cache, nid)->free_objects) {
- obj = ____cache_alloc_node(cache,
- gfp_exact_node(flags), nid);
- if (obj)
- break;
- }
- }
-
- if (!obj) {
- /*
- * This allocation will be performed within the constraints
- * of the current cpuset / memory policy requirements.
- * We may trigger various forms of reclaim on the allowed
- * set and go into memory reserves if necessary.
- */
- slab = cache_grow_begin(cache, flags, numa_mem_id());
- cache_grow_end(cache, slab);
- if (slab) {
- nid = slab_nid(slab);
- obj = ____cache_alloc_node(cache,
- gfp_exact_node(flags), nid);
-
- /*
- * Another processor may allocate the objects in
- * the slab since we are not holding any locks.
- */
- if (!obj)
- goto retry;
- }
- }
-
- if (unlikely(!obj && read_mems_allowed_retry(cpuset_mems_cookie)))
- goto retry_cpuset;
- return obj;
-}
-
-/*
- * An interface to enable slab creation on nodeid
- */
-static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
- int nodeid)
-{
- struct slab *slab;
- struct kmem_cache_node *n;
- void *obj = NULL;
- void *list = NULL;
-
- VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
- n = get_node(cachep, nodeid);
- BUG_ON(!n);
-
- check_irq_off();
- raw_spin_lock(&n->list_lock);
- slab = get_first_slab(n, false);
- if (!slab)
- goto must_grow;
-
- check_spinlock_acquired_node(cachep, nodeid);
-
- STATS_INC_NODEALLOCS(cachep);
- STATS_INC_ACTIVE(cachep);
- STATS_SET_HIGH(cachep);
-
- BUG_ON(slab->active == cachep->num);
-
- obj = slab_get_obj(cachep, slab);
- n->free_objects--;
-
- fixup_slab_list(cachep, n, slab, &list);
-
- raw_spin_unlock(&n->list_lock);
- fixup_objfreelist_debug(cachep, &list);
- return obj;
-
-must_grow:
- raw_spin_unlock(&n->list_lock);
- slab = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid);
- if (slab) {
- /* This slab isn't counted yet so don't update free_objects */
- obj = slab_get_obj(cachep, slab);
- }
- cache_grow_end(cachep, slab);
-
- return obj ? obj : fallback_alloc(cachep, flags);
-}
-
-static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
-{
- void *objp = NULL;
- int slab_node = numa_mem_id();
-
- if (nodeid == NUMA_NO_NODE) {
- if (current->mempolicy || cpuset_do_slab_mem_spread()) {
- objp = alternate_node_alloc(cachep, flags);
- if (objp)
- goto out;
- }
- /*
- * Use the locally cached objects if possible.
- * However ____cache_alloc does not allow fallback
- * to other nodes. It may fail while we still have
- * objects on other nodes available.
- */
- objp = ____cache_alloc(cachep, flags);
- nodeid = slab_node;
- } else if (nodeid == slab_node) {
- objp = ____cache_alloc(cachep, flags);
- } else if (!get_node(cachep, nodeid)) {
- /* Node not bootstrapped yet */
- objp = fallback_alloc(cachep, flags);
- goto out;
- }
-
- /*
- * We may just have run out of memory on the local node.
- * ____cache_alloc_node() knows how to locate memory on other nodes
- */
- if (!objp)
- objp = ____cache_alloc_node(cachep, flags, nodeid);
-out:
- return objp;
-}
-#else
-
-static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unused)
-{
- return ____cache_alloc(cachep, flags);
-}
-
-#endif /* CONFIG_NUMA */
-
-static __always_inline void *
-slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
- int nodeid, size_t orig_size, unsigned long caller)
-{
- unsigned long save_flags;
- void *objp;
- struct obj_cgroup *objcg = NULL;
- bool init = false;
-
- flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, lru, &objcg, 1, flags);
- if (unlikely(!cachep))
- return NULL;
-
- objp = kfence_alloc(cachep, orig_size, flags);
- if (unlikely(objp))
- goto out;
-
- local_irq_save(save_flags);
- objp = __do_cache_alloc(cachep, flags, nodeid);
- local_irq_restore(save_flags);
- objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
- prefetchw(objp);
- init = slab_want_init_on_alloc(flags, cachep);
-
-out:
- slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init,
- cachep->object_size);
- return objp;
-}
-
-static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
- size_t orig_size, unsigned long caller)
-{
- return slab_alloc_node(cachep, lru, flags, NUMA_NO_NODE, orig_size,
- caller);
-}
-
-/*
- * Caller needs to acquire correct kmem_cache_node's list_lock
- * @list: List of detached free slabs should be freed by caller
- */
-static void free_block(struct kmem_cache *cachep, void **objpp,
- int nr_objects, int node, struct list_head *list)
-{
- int i;
- struct kmem_cache_node *n = get_node(cachep, node);
- struct slab *slab;
-
- n->free_objects += nr_objects;
-
- for (i = 0; i < nr_objects; i++) {
- void *objp;
- struct slab *slab;
-
- objp = objpp[i];
-
- slab = virt_to_slab(objp);
- list_del(&slab->slab_list);
- check_spinlock_acquired_node(cachep, node);
- slab_put_obj(cachep, slab, objp);
- STATS_DEC_ACTIVE(cachep);
-
- /* fixup slab chains */
- if (slab->active == 0) {
- list_add(&slab->slab_list, &n->slabs_free);
- n->free_slabs++;
- } else {
- /* Unconditionally move a slab to the end of the
- * partial list on free - maximum time for the
- * other objects to be freed, too.
- */
- list_add_tail(&slab->slab_list, &n->slabs_partial);
- }
- }
-
- while (n->free_objects > n->free_limit && !list_empty(&n->slabs_free)) {
- n->free_objects -= cachep->num;
-
- slab = list_last_entry(&n->slabs_free, struct slab, slab_list);
- list_move(&slab->slab_list, list);
- n->free_slabs--;
- n->total_slabs--;
- }
-}
-
-static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
-{
- int batchcount;
- struct kmem_cache_node *n;
- int node = numa_mem_id();
- LIST_HEAD(list);
-
- batchcount = ac->batchcount;
-
- check_irq_off();
- n = get_node(cachep, node);
- raw_spin_lock(&n->list_lock);
- if (n->shared) {
- struct array_cache *shared_array = n->shared;
- int max = shared_array->limit - shared_array->avail;
- if (max) {
- if (batchcount > max)
- batchcount = max;
- memcpy(&(shared_array->entry[shared_array->avail]),
- ac->entry, sizeof(void *) * batchcount);
- shared_array->avail += batchcount;
- goto free_done;
- }
- }
-
- free_block(cachep, ac->entry, batchcount, node, &list);
-free_done:
-#if STATS
- {
- int i = 0;
- struct slab *slab;
-
- list_for_each_entry(slab, &n->slabs_free, slab_list) {
- BUG_ON(slab->active);
-
- i++;
- }
- STATS_SET_FREEABLE(cachep, i);
- }
-#endif
- raw_spin_unlock(&n->list_lock);
- ac->avail -= batchcount;
- memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
- slabs_destroy(cachep, &list);
-}
-
-/*
- * Release an obj back to its cache. If the obj has a constructed state, it must
- * be in this state _before_ it is released. Called with disabled ints.
- */
-static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
- unsigned long caller)
-{
- bool init;
-
- memcg_slab_free_hook(cachep, virt_to_slab(objp), &objp, 1);
-
- if (is_kfence_address(objp)) {
- kmemleak_free_recursive(objp, cachep->flags);
- __kfence_free(objp);
- return;
- }
-
- /*
- * As memory initialization might be integrated into KASAN,
- * kasan_slab_free and initialization memset must be
- * kept together to avoid discrepancies in behavior.
- */
- init = slab_want_init_on_free(cachep);
- if (init && !kasan_has_integrated_init())
- memset(objp, 0, cachep->object_size);
- /* KASAN might put objp into memory quarantine, delaying its reuse. */
- if (kasan_slab_free(cachep, objp, init))
- return;
-
- /* Use KCSAN to help debug racy use-after-free. */
- if (!(cachep->flags & SLAB_TYPESAFE_BY_RCU))
- __kcsan_check_access(objp, cachep->object_size,
- KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);
-
- ___cache_free(cachep, objp, caller);
-}
-
-void ___cache_free(struct kmem_cache *cachep, void *objp,
- unsigned long caller)
-{
- struct array_cache *ac = cpu_cache_get(cachep);
-
- check_irq_off();
- kmemleak_free_recursive(objp, cachep->flags);
- objp = cache_free_debugcheck(cachep, objp, caller);
-
- /*
- * Skip calling cache_free_alien() when the platform is not numa.
- * This will avoid cache misses that happen while accessing slabp (which
- * is per page memory reference) to get nodeid. Instead use a global
- * variable to skip the call, which is mostly likely to be present in
- * the cache.
- */
- if (nr_online_nodes > 1 && cache_free_alien(cachep, objp))
- return;
-
- if (ac->avail < ac->limit) {
- STATS_INC_FREEHIT(cachep);
- } else {
- STATS_INC_FREEMISS(cachep);
- cache_flusharray(cachep, ac);
- }
-
- if (sk_memalloc_socks()) {
- struct slab *slab = virt_to_slab(objp);
-
- if (unlikely(slab_test_pfmemalloc(slab))) {
- cache_free_pfmemalloc(cachep, slab, objp);
- return;
- }
- }
-
- __free_one(ac, objp);
-}
-
-static __always_inline
-void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
- gfp_t flags)
-{
- void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
-
- trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
-
- return ret;
-}
-
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
-{
- return __kmem_cache_alloc_lru(cachep, NULL, flags);
-}
-EXPORT_SYMBOL(kmem_cache_alloc);
-
-void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
- gfp_t flags)
-{
- return __kmem_cache_alloc_lru(cachep, lru, flags);
-}
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
-
-static __always_inline void
-cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
- size_t size, void **p, unsigned long caller)
-{
- size_t i;
-
- for (i = 0; i < size; i++)
- p[i] = cache_alloc_debugcheck_after(s, flags, p[i], caller);
-}
-
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
- void **p)
-{
- struct obj_cgroup *objcg = NULL;
- unsigned long irqflags;
- size_t i;
-
- s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
- if (!s)
- return 0;
-
- local_irq_save(irqflags);
- for (i = 0; i < size; i++) {
- void *objp = kfence_alloc(s, s->object_size, flags) ?:
- __do_cache_alloc(s, flags, NUMA_NO_NODE);
-
- if (unlikely(!objp))
- goto error;
- p[i] = objp;
- }
- local_irq_restore(irqflags);
-
- cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
-
- /*
- * memcg and kmem_cache debug support and memory initialization.
- * Done outside of the IRQ disabled section.
- */
- slab_post_alloc_hook(s, objcg, flags, size, p,
- slab_want_init_on_alloc(flags, s), s->object_size);
- /* FIXME: Trace call missing. Christoph would like a bulk variant */
- return size;
-error:
- local_irq_restore(irqflags);
- cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
- slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
- kmem_cache_free_bulk(s, i, p);
- return 0;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
-
-void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
-{
- void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
-
- trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
-
- return ret;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
-
-void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
- int nodeid, size_t orig_size,
- unsigned long caller)
-{
- return slab_alloc_node(cachep, NULL, flags, nodeid,
- orig_size, caller);
-}
-
-#ifdef CONFIG_PRINTK
-void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
-{
- struct kmem_cache *cachep;
- unsigned int objnr;
- void *objp;
-
- kpp->kp_ptr = object;
- kpp->kp_slab = slab;
- cachep = slab->slab_cache;
- kpp->kp_slab_cache = cachep;
- objp = object - obj_offset(cachep);
- kpp->kp_data_offset = obj_offset(cachep);
- slab = virt_to_slab(objp);
- objnr = obj_to_index(cachep, slab, objp);
- objp = index_to_obj(cachep, slab, objnr);
- kpp->kp_objp = objp;
- if (DEBUG && cachep->flags & SLAB_STORE_USER)
- kpp->kp_ret = *dbg_userword(cachep, objp);
-}
-#endif
-
-static __always_inline
-void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
- unsigned long caller)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- debug_check_no_locks_freed(objp, cachep->object_size);
- if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
- debug_check_no_obj_freed(objp, cachep->object_size);
- __cache_free(cachep, objp, caller);
- local_irq_restore(flags);
-}
-
-void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
- unsigned long caller)
-{
- __do_kmem_cache_free(cachep, objp, caller);
-}
-
-void kmem_cache_free(struct kmem_cache *cachep, void *objp)
-{
- cachep = cache_from_obj(cachep, objp);
- if (!cachep)
- return;
-
- trace_kmem_cache_free(_RET_IP_, objp, cachep);
- __do_kmem_cache_free(cachep, objp, _RET_IP_);
-}
-EXPORT_SYMBOL(kmem_cache_free);
-
-void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- for (int i = 0; i < size; i++) {
- void *objp = p[i];
- struct kmem_cache *s;
-
- if (!orig_s) {
- struct folio *folio = virt_to_folio(objp);
-
- /* called via kfree_bulk */
- if (!folio_test_slab(folio)) {
- local_irq_restore(flags);
- free_large_kmalloc(folio, objp);
- local_irq_save(flags);
- continue;
- }
- s = folio_slab(folio)->slab_cache;
- } else {
- s = cache_from_obj(orig_s, objp);
- }
-
- if (!s)
- continue;
-
- debug_check_no_locks_freed(objp, s->object_size);
- if (!(s->flags & SLAB_DEBUG_OBJECTS))
- debug_check_no_obj_freed(objp, s->object_size);
-
- __cache_free(s, objp, _RET_IP_);
- }
- local_irq_restore(flags);
-
- /* FIXME: add tracing */
-}
-EXPORT_SYMBOL(kmem_cache_free_bulk);
-
-/*
- * This initializes kmem_cache_node or resizes various caches for all nodes.
- */
-static int setup_kmem_cache_nodes(struct kmem_cache *cachep, gfp_t gfp)
-{
- int ret;
- int node;
- struct kmem_cache_node *n;
-
- for_each_online_node(node) {
- ret = setup_kmem_cache_node(cachep, node, gfp, true);
- if (ret)
- goto fail;
-
- }
-
- return 0;
-
-fail:
- if (!cachep->list.next) {
- /* Cache is not active yet. Roll back what we did */
- node--;
- while (node >= 0) {
- n = get_node(cachep, node);
- if (n) {
- kfree(n->shared);
- free_alien_cache(n->alien);
- kfree(n);
- cachep->node[node] = NULL;
- }
- node--;
- }
- }
- return -ENOMEM;
-}
-
-/* Always called with the slab_mutex held */
-static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
- int batchcount, int shared, gfp_t gfp)
-{
- struct array_cache __percpu *cpu_cache, *prev;
- int cpu;
-
- cpu_cache = alloc_kmem_cache_cpus(cachep, limit, batchcount);
- if (!cpu_cache)
- return -ENOMEM;
-
- prev = cachep->cpu_cache;
- cachep->cpu_cache = cpu_cache;
- /*
- * Without a previous cpu_cache there's no need to synchronize remote
- * cpus, so skip the IPIs.
- */
- if (prev)
- kick_all_cpus_sync();
-
- check_irq_on();
- cachep->batchcount = batchcount;
- cachep->limit = limit;
- cachep->shared = shared;
-
- if (!prev)
- goto setup_node;
-
- for_each_online_cpu(cpu) {
- LIST_HEAD(list);
- int node;
- struct kmem_cache_node *n;
- struct array_cache *ac = per_cpu_ptr(prev, cpu);
-
- node = cpu_to_mem(cpu);
- n = get_node(cachep, node);
- raw_spin_lock_irq(&n->list_lock);
- free_block(cachep, ac->entry, ac->avail, node, &list);
- raw_spin_unlock_irq(&n->list_lock);
- slabs_destroy(cachep, &list);
- }
- free_percpu(prev);
-
-setup_node:
- return setup_kmem_cache_nodes(cachep, gfp);
-}
-
-/* Called with slab_mutex held always */
-static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
-{
- int err;
- int limit = 0;
- int shared = 0;
- int batchcount = 0;
-
- err = cache_random_seq_create(cachep, cachep->num, gfp);
- if (err)
- goto end;
-
- /*
- * The head array serves three purposes:
- * - create a LIFO ordering, i.e. return objects that are cache-warm
- * - reduce the number of spinlock operations.
- * - reduce the number of linked list operations on the slab and
- * bufctl chains: array operations are cheaper.
- * The numbers are guessed, we should auto-tune as described by
- * Bonwick.
- */
- if (cachep->size > 131072)
- limit = 1;
- else if (cachep->size > PAGE_SIZE)
- limit = 8;
- else if (cachep->size > 1024)
- limit = 24;
- else if (cachep->size > 256)
- limit = 54;
- else
- limit = 120;
-
- /*
- * CPU bound tasks (e.g. network routing) can exhibit cpu bound
- * allocation behaviour: Most allocs on one cpu, most free operations
- * on another cpu. For these cases, an efficient object passing between
- * cpus is necessary. This is provided by a shared array. The array
- * replaces Bonwick's magazine layer.
- * On uniprocessor, it's functionally equivalent (but less efficient)
- * to a larger limit. Thus disabled by default.
- */
- shared = 0;
- if (cachep->size <= PAGE_SIZE && num_possible_cpus() > 1)
- shared = 8;
-
-#if DEBUG
- /*
- * With debugging enabled, large batchcount lead to excessively long
- * periods with disabled local interrupts. Limit the batchcount
- */
- if (limit > 32)
- limit = 32;
-#endif
- batchcount = (limit + 1) / 2;
- err = do_tune_cpucache(cachep, limit, batchcount, shared, gfp);
-end:
- if (err)
- pr_err("enable_cpucache failed for %s, error %d\n",
- cachep->name, -err);
- return err;
-}
-
-/*
- * Drain an array if it contains any elements taking the node lock only if
- * necessary. Note that the node listlock also protects the array_cache
- * if drain_array() is used on the shared array.
- */
-static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n,
- struct array_cache *ac, int node)
-{
- LIST_HEAD(list);
-
- /* ac from n->shared can be freed if we don't hold the slab_mutex. */
- check_mutex_acquired();
-
- if (!ac || !ac->avail)
- return;
-
- if (ac->touched) {
- ac->touched = 0;
- return;
- }
-
- raw_spin_lock_irq(&n->list_lock);
- drain_array_locked(cachep, ac, node, false, &list);
- raw_spin_unlock_irq(&n->list_lock);
-
- slabs_destroy(cachep, &list);
-}
-
-/**
- * cache_reap - Reclaim memory from caches.
- * @w: work descriptor
- *
- * Called from workqueue/eventd every few seconds.
- * Purpose:
- * - clear the per-cpu caches for this CPU.
- * - return freeable pages to the main free memory pool.
- *
- * If we cannot acquire the cache chain mutex then just give up - we'll try
- * again on the next iteration.
- */
-static void cache_reap(struct work_struct *w)
-{
- struct kmem_cache *searchp;
- struct kmem_cache_node *n;
- int node = numa_mem_id();
- struct delayed_work *work = to_delayed_work(w);
-
- if (!mutex_trylock(&slab_mutex))
- /* Give up. Setup the next iteration. */
- goto out;
-
- list_for_each_entry(searchp, &slab_caches, list) {
- check_irq_on();
-
- /*
- * We only take the node lock if absolutely necessary and we
- * have established with reasonable certainty that
- * we can do some work if the lock was obtained.
- */
- n = get_node(searchp, node);
-
- reap_alien(searchp, n);
-
- drain_array(searchp, n, cpu_cache_get(searchp), node);
-
- /*
- * These are racy checks but it does not matter
- * if we skip one check or scan twice.
- */
- if (time_after(n->next_reap, jiffies))
- goto next;
-
- n->next_reap = jiffies + REAPTIMEOUT_NODE;
-
- drain_array(searchp, n, n->shared, node);
-
- if (n->free_touched)
- n->free_touched = 0;
- else {
- int freed;
-
- freed = drain_freelist(searchp, n, (n->free_limit +
- 5 * searchp->num - 1) / (5 * searchp->num));
- STATS_ADD_REAPED(searchp, freed);
- }
-next:
- cond_resched();
- }
- check_irq_on();
- mutex_unlock(&slab_mutex);
- next_reap_node();
-out:
- /* Set up the next iteration */
- schedule_delayed_work_on(smp_processor_id(), work,
- round_jiffies_relative(REAPTIMEOUT_AC));
-}
-
-void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
-{
- unsigned long active_objs, num_objs, active_slabs;
- unsigned long total_slabs = 0, free_objs = 0, shared_avail = 0;
- unsigned long free_slabs = 0;
- int node;
- struct kmem_cache_node *n;
-
- for_each_kmem_cache_node(cachep, node, n) {
- check_irq_on();
- raw_spin_lock_irq(&n->list_lock);
-
- total_slabs += n->total_slabs;
- free_slabs += n->free_slabs;
- free_objs += n->free_objects;
-
- if (n->shared)
- shared_avail += n->shared->avail;
-
- raw_spin_unlock_irq(&n->list_lock);
- }
- num_objs = total_slabs * cachep->num;
- active_slabs = total_slabs - free_slabs;
- active_objs = num_objs - free_objs;
-
- sinfo->active_objs = active_objs;
- sinfo->num_objs = num_objs;
- sinfo->active_slabs = active_slabs;
- sinfo->num_slabs = total_slabs;
- sinfo->shared_avail = shared_avail;
- sinfo->limit = cachep->limit;
- sinfo->batchcount = cachep->batchcount;
- sinfo->shared = cachep->shared;
- sinfo->objects_per_slab = cachep->num;
- sinfo->cache_order = cachep->gfporder;
-}
-
-void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
-{
-#if STATS
- { /* node stats */
- unsigned long high = cachep->high_mark;
- unsigned long allocs = cachep->num_allocations;
- unsigned long grown = cachep->grown;
- unsigned long reaped = cachep->reaped;
- unsigned long errors = cachep->errors;
- unsigned long max_freeable = cachep->max_freeable;
- unsigned long node_allocs = cachep->node_allocs;
- unsigned long node_frees = cachep->node_frees;
- unsigned long overflows = cachep->node_overflow;
-
- seq_printf(m, " : globalstat %7lu %6lu %5lu %4lu %4lu %4lu %4lu %4lu %4lu",
- allocs, high, grown,
- reaped, errors, max_freeable, node_allocs,
- node_frees, overflows);
- }
- /* cpu stats */
- {
- unsigned long allochit = atomic_read(&cachep->allochit);
- unsigned long allocmiss = atomic_read(&cachep->allocmiss);
- unsigned long freehit = atomic_read(&cachep->freehit);
- unsigned long freemiss = atomic_read(&cachep->freemiss);
-
- seq_printf(m, " : cpustat %6lu %6lu %6lu %6lu",
- allochit, allocmiss, freehit, freemiss);
- }
-#endif
-}
-
-#define MAX_SLABINFO_WRITE 128
-/**
- * slabinfo_write - Tuning for the slab allocator
- * @file: unused
- * @buffer: user buffer
- * @count: data length
- * @ppos: unused
- *
- * Return: %0 on success, negative error code otherwise.
- */
-ssize_t slabinfo_write(struct file *file, const char __user *buffer,
- size_t count, loff_t *ppos)
-{
- char kbuf[MAX_SLABINFO_WRITE + 1], *tmp;
- int limit, batchcount, shared, res;
- struct kmem_cache *cachep;
-
- if (count > MAX_SLABINFO_WRITE)
- return -EINVAL;
- if (copy_from_user(&kbuf, buffer, count))
- return -EFAULT;
- kbuf[MAX_SLABINFO_WRITE] = '\0';
-
- tmp = strchr(kbuf, ' ');
- if (!tmp)
- return -EINVAL;
- *tmp = '\0';
- tmp++;
- if (sscanf(tmp, " %d %d %d", &limit, &batchcount, &shared) != 3)
- return -EINVAL;
-
- /* Find the cache in the chain of caches. */
- mutex_lock(&slab_mutex);
- res = -EINVAL;
- list_for_each_entry(cachep, &slab_caches, list) {
- if (!strcmp(cachep->name, kbuf)) {
- if (limit < 1 || batchcount < 1 ||
- batchcount > limit || shared < 0) {
- res = 0;
- } else {
- res = do_tune_cpucache(cachep, limit,
- batchcount, shared,
- GFP_KERNEL);
- }
- break;
- }
- }
- mutex_unlock(&slab_mutex);
- if (res >= 0)
- res = count;
- return res;
-}
-
-#ifdef CONFIG_HARDENED_USERCOPY
-/*
- * Rejects incorrectly sized objects and objects that are to be copied
- * to/from userspace but do not fall entirely within the containing slab
- * cache's usercopy region.
- *
- * Returns NULL if check passes, otherwise const char * to name of cache
- * to indicate an error.
- */
-void __check_heap_object(const void *ptr, unsigned long n,
- const struct slab *slab, bool to_user)
-{
- struct kmem_cache *cachep;
- unsigned int objnr;
- unsigned long offset;
-
- ptr = kasan_reset_tag(ptr);
-
- /* Find and validate object. */
- cachep = slab->slab_cache;
- objnr = obj_to_index(cachep, slab, (void *)ptr);
- BUG_ON(objnr >= cachep->num);
-
- /* Find offset within object. */
- if (is_kfence_address(ptr))
- offset = ptr - kfence_object_start(ptr);
- else
- offset = ptr - index_to_obj(cachep, slab, objnr) - obj_offset(cachep);
-
- /* Allow address range falling entirely within usercopy region. */
- if (offset >= cachep->useroffset &&
- offset - cachep->useroffset <= cachep->usersize &&
- n <= cachep->useroffset - offset + cachep->usersize)
- return;
-
- usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
-}
-#endif /* CONFIG_HARDENED_USERCOPY */

--
2.42.1
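
As a reading aid for the removed enable_cpucache() above: its default per-CPU array sizing can be restated as a small userspace sketch. This is not kernel code. The names `sketch_default_tuning`, `struct cpucache_tuning` and `SKETCH_PAGE_SIZE` are hypothetical, a 4 KiB page size is assumed, and the `#if DEBUG` clamp of the limit to 32 is left out.

```c
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages; the kernel uses the arch's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical container for the three tunables the removed code computed. */
struct cpucache_tuning {
	int limit;      /* upper bound on objects kept in the per-CPU array */
	int batchcount; /* objects moved per refill or flush */
	int shared;     /* shared-array factor; 0 means no shared array */
};

/* Mirrors the non-DEBUG defaults picked by the removed enable_cpucache(). */
static struct cpucache_tuning sketch_default_tuning(size_t obj_size,
						    unsigned int nr_cpus)
{
	struct cpucache_tuning t;

	/* Larger objects get a smaller per-CPU array. */
	if (obj_size > 131072)
		t.limit = 1;
	else if (obj_size > SKETCH_PAGE_SIZE)
		t.limit = 8;
	else if (obj_size > 1024)
		t.limit = 24;
	else if (obj_size > 256)
		t.limit = 54;
	else
		t.limit = 120;

	/* The shared array is only used on SMP and for objects up to one page. */
	t.shared = (obj_size <= SKETCH_PAGE_SIZE && nr_cpus > 1) ? 8 : 0;

	/* Half the limit, rounded up. */
	t.batchcount = (t.limit + 1) / 2;
	return t;
}
```

Under these assumptions a 64-byte cache on a 4-CPU machine gets limit 120, batchcount 60 and shared 8. These are the same three values that slabinfo_write() accepted from userspace for manual tuning.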

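The usercopy window test at the end of the removed __check_heap_object() is compact enough to misread, so here it is as a standalone predicate. This is a sketch only: `sketch_in_usercopy_region` is a hypothetical name, and the arguments stand in for the offset within the object, the copy length, and the cache's useroffset and usersize fields.

```c
#include <stdbool.h>

/*
 * Returns true when the n bytes starting at 'offset' within the object
 * lie entirely inside [useroffset, useroffset + usersize).
 * Same three-part condition as the removed SLAB check.
 */
static bool sketch_in_usercopy_region(unsigned long offset, unsigned long n,
				      unsigned long useroffset,
				      unsigned long usersize)
{
	return offset >= useroffset &&
	       /* start is not past the end of the window */
	       offset - useroffset <= usersize &&
	       /* length fits in what remains of the window; the unsigned
	        * wraparound in (useroffset - offset) cancels out once
	        * usersize is added, given the two checks above */
	       n <= useroffset - offset + usersize;
}
```

With useroffset 8 and usersize 32, a 16-byte copy at offset 24 passes and a 17-byte copy at the same offset fails. In the removed code the failing case ended in usercopy_abort().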
Subject: Re: [PATCH v2 09/21] mm/slab: remove mm/slab.c and slab_def.h

On Mon, 20 Nov 2023, Vlastimil Babka wrote:

> Remove the SLAB implementation. Update CREDITS.
> Also update and properly sort the SLOB entry there.
>
> RIP SLAB allocator (1996 - 2024)
>
> Reviewed-by: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Christoph Lameter <[email protected]>

2023-11-24 00:45:30

by David Rientjes

Subject: Re: [PATCH v2 00/21] remove the SLAB allocator

On Mon, 20 Nov 2023, Vlastimil Babka wrote:

> Changes from v1:
> - Added new Patch 01 to fix up kernel docs build (thanks Marco Elver)
> - Additional changes to Kconfig user visible texts in Patch 02 (thanks Kees
> Cook)
> - Whitespace fixes and other fixups (thanks Kees)
>
> The SLAB allocator has been deprecated since 6.5 and nobody has objected
> so far. As we agreed at LSF/MM, we should wait with the removal until
> the next LTS kernel is released. This is now determined to be 6.6, and
> we just missed 6.7, so now we can aim for 6.8 and start exposing the
> removal to linux-next during the 6.7 cycle. If nothing substantial pops
> up, will start including this in slab-next later this week.
>

I agree with the decision to remove the SLAB allocator, same as at LSF/MM.
Thanks for doing this, Vlastimil!

And thanks for deferring this until the next LTS kernel. It will give any
last-minute holdouts a full year to raise any issues in their switch to
SLUB if they only upgrade to LTS kernels, at which point we'll have
done our due diligence to make people aware of SLAB's deprecation in 6.6.

I've completed testing on v1 of the series, so feel free to add

Acked-by: David Rientjes <[email protected]>
Tested-by: David Rientjes <[email protected]>

to each patch so I don't spam the list unnecessarily. I'll respond to
individual changes that were not in v1.

Thanks again!

> To keep the series reasonably sized and not pull in people from other
> subsystems than mm and closely related ones, I didn't attempt to remove
> every trace of unnecessary reference to dead config options in external
> areas, nor in the defconfigs. Such cleanups can be sent to and handled
> by respective maintainers after this is merged.
>
> Instead I have added some patches aimed to reap some immediate benefits
> of the removal, mainly by not having to split some fastpath code between
> slab_common.c and slub.c anymore. But that is also not an exhaustive
> effort and I expect more cleanups and optimizations will follow later.
>
> Patch 09 updates CREDITS for the removed mm/slab.c. Please point out if
> I missed someone not yet credited.
>
> Git version: https://git.kernel.org/vbabka/l/slab-remove-slab-v2r1
>
> ---
> Vlastimil Babka (21):
> mm/slab, docs: switch mm-api docs generation from slab.c to slub.c
> mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile
> KASAN: remove code paths guarded by CONFIG_SLAB
> KFENCE: cleanup kfence_guarded_alloc() after CONFIG_SLAB removal
> mm/memcontrol: remove CONFIG_SLAB #ifdef guards
> cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks
> mm/slab: remove CONFIG_SLAB code from slab common code
> mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs
> mm/slab: remove mm/slab.c and slab_def.h
> mm/slab: move struct kmem_cache_cpu declaration to slub.c
> mm/slab: move the rest of slub_def.h to mm/slab.h
> mm/slab: consolidate includes in the internal mm/slab.h
> mm/slab: move pre/post-alloc hooks from slab.h to slub.c
> mm/slab: move memcg related functions from slab.h to slub.c
> mm/slab: move struct kmem_cache_node from slab.h to slub.c
> mm/slab: move kfree() from slab_common.c to slub.c
> mm/slab: move kmalloc_slab() to mm/slab.h
> mm/slab: move kmalloc() functions from slab_common.c to slub.c
> mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers
> mm/slub: optimize alloc fastpath code layout
> mm/slub: optimize free fast path code layout
>
> CREDITS | 12 +-
> Documentation/core-api/mm-api.rst | 2 +-
> arch/arm64/Kconfig | 2 +-
> arch/s390/Kconfig | 2 +-
> arch/x86/Kconfig | 2 +-
> include/linux/cpuhotplug.h | 1 -
> include/linux/slab.h | 22 +-
> include/linux/slab_def.h | 124 --
> include/linux/slub_def.h | 204 --
> kernel/cpu.c | 5 -
> lib/Kconfig.debug | 1 -
> lib/Kconfig.kasan | 11 +-
> lib/Kconfig.kfence | 2 +-
> lib/Kconfig.kmsan | 2 +-
> mm/Kconfig | 68 +-
> mm/Kconfig.debug | 16 +-
> mm/Makefile | 6 +-
> mm/dmapool.c | 2 +-
> mm/kasan/common.c | 13 +-
> mm/kasan/kasan.h | 3 +-
> mm/kasan/quarantine.c | 7 -
> mm/kasan/report.c | 1 +
> mm/kfence/core.c | 4 -
> mm/memcontrol.c | 6 +-
> mm/mempool.c | 6 +-
> mm/slab.c | 4026 -------------------------------------
> mm/slab.h | 551 ++---
> mm/slab_common.c | 231 +--
> mm/slub.c | 617 +++++-
> 29 files changed, 815 insertions(+), 5134 deletions(-)
> ---
> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
> change-id: 20231120-slab-remove-slab-a76ec668d8c6
>
> Best regards,
> --
> Vlastimil Babka <[email protected]>
>
>

2023-11-24 09:36:35

by Vlastimil Babka

Subject: Re: [PATCH v2 00/21] remove the SLAB allocator

On 11/24/23 01:45, David Rientjes wrote:
> On Mon, 20 Nov 2023, Vlastimil Babka wrote:
>
>> Changes from v1:
>> - Added new Patch 01 to fix up kernel docs build (thanks Marco Elver)
>> - Additional changes to Kconfig user visible texts in Patch 02 (thanks Kees
>> Cook)
>> - Whitespace fixes and other fixups (thanks Kees)
>>
>> The SLAB allocator has been deprecated since 6.5 and nobody has objected
>> so far. As we agreed at LSF/MM, we should wait with the removal until
>> the next LTS kernel is released. This is now determined to be 6.6, and
>> we just missed 6.7, so now we can aim for 6.8 and start exposing the
>> removal to linux-next during the 6.7 cycle. If nothing substantial pops
>> up, will start including this in slab-next later this week.
>>
>
> I agree with the decision to remove the SLAB allocator, same as at LSF/MM.
> Thanks for doing this, Vlastimil!
>
> And thanks for deferring this until the next LTS kernel. It will give any
> last-minute holdouts a full year to raise any issues in their switch to
> SLUB if they only upgrade to LTS kernels, at which point we'll have
> done our due diligence to make people aware of SLAB's deprecation in 6.6.
>
> I've completed testing on v1 of the series, so feel free to add
>
> Acked-by: David Rientjes <[email protected]>
> Tested-by: David Rientjes <[email protected]>

Thanks! And others too.

I've now pushed this series to slab/for-6.8/slab-removal and slab/for-next.


2023-12-06 08:13:56

by Hyeonggon Yoo

Subject: Re: [PATCH v2 05/21] mm/memcontrol: remove CONFIG_SLAB #ifdef guards

On Mon, Nov 20, 2023 at 07:34:16PM +0100, Vlastimil Babka wrote:
> With SLAB removed, these are never true anymore so we can clean up.
>
> Reviewed-by: Kees Cook <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/memcontrol.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 774bd6e21e27..947fb50eba31 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5149,7 +5149,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
> return ret;
> }
>
> -#if defined(CONFIG_MEMCG_KMEM) && (defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG))
> +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG)
> static int mem_cgroup_slab_show(struct seq_file *m, void *p)
> {
> /*
> @@ -5258,8 +5258,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
> .write = mem_cgroup_reset,
> .read_u64 = mem_cgroup_read_u64,
> },
> -#if defined(CONFIG_MEMCG_KMEM) && \
> - (defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG))
> +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG)
> {
> .name = "kmem.slabinfo",
> .seq_show = mem_cgroup_slab_show,

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

>
> --
> 2.42.1
>
>
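
As a side note on the quoted mm/memcontrol.c hunks: once CONFIG_SLAB can never be defined, the old guard `A && (SLAB || SLUB_DEBUG)` reduces to `A && SLUB_DEBUG`. The following is a minimal userspace sketch of that equivalence, not kernel code. The `#define` lines stand in for what Kconfig would generate, and `guard_old()`, `guard_new()` and `check_guards_agree()` are hypothetical helpers invented for the illustration.

```c
#include <assert.h>

/* Stand-ins for Kconfig output; a real build gets these from autoconf.h. */
#define CONFIG_MEMCG_KMEM 1
#define CONFIG_SLUB_DEBUG 1
/* CONFIG_SLAB is deliberately never defined, as after the removal. */

/* Hypothetical helper: reports whether the pre-patch guard is active. */
static int guard_old(void)
{
#if defined(CONFIG_MEMCG_KMEM) && \
	(defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG))
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: reports whether the post-patch guard is active. */
static int guard_new(void)
{
#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG)
	return 1;
#else
	return 0;
#endif
}

/* Both guards must select the same code when CONFIG_SLAB is undefined. */
static void check_guards_agree(void)
{
	assert(guard_old() == guard_new());
}
```

Commenting out either of the two `#define` lines and rebuilding should keep `check_guards_agree()` passing, since the `CONFIG_SLAB` term contributes nothing in any of those combinations.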

2023-12-06 09:31:59

by Hyeonggon Yoo

Subject: Re: [PATCH v2 09/21] mm/slab: remove mm/slab.c and slab_def.h

On Mon, Nov 20, 2023 at 07:34:20PM +0100, Vlastimil Babka wrote:
> Remove the SLAB implementation. Update CREDITS.
> Also update and properly sort the SLOB entry there.
>
> RIP SLAB allocator (1996 - 2024)
>
> Reviewed-by: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> CREDITS | 12 +-
> include/linux/slab_def.h | 124 --
> mm/slab.c | 4005 ----------------------------------------------
> 3 files changed, 8 insertions(+), 4133 deletions(-)

Acked-by: Hyeonggon Yoo <[email protected]>

> diff --git a/CREDITS b/CREDITS
> index f33a33fd2371..943a73e96149 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -9,10 +9,6 @@
> Linus
> ----------
>
> -N: Matt Mackal
> -E: [email protected]
> -D: SLOB slab allocator

By the way, I just realized that commit 16e943bf8db
("MAINTAINERS: SLAB maintainer update") misspelled her last name
(Mackall is correct). Maybe update that too?

> N: Matti Aarnio
> E: [email protected]
> D: Alpha systems hacking, IPv6 and other network related stuff
> @@ -1572,6 +1568,10 @@ S: Ampferstr. 50 / 4
> S: 6020 Innsbruck
> S: Austria
>
> +N: Mark Hemment
> +E: [email protected]
> +D: SLAB allocator implementation
> +
> N: Richard Henderson
> E: [email protected]
> E: [email protected]
> @@ -2437,6 +2437,10 @@ D: work on suspend-to-ram/disk, killing duplicates from ioctl32,
> D: Altera SoCFPGA and Nokia N900 support.
> S: Czech Republic
>
> +N: Olivia Mackal
> +E: [email protected]
> +D: SLOB slab allocator
> +
> N: Paul Mackerras
> E: [email protected]
> D: PPP driver
>
> --
> 2.42.1
>
>

2023-12-06 09:37:22

by Vlastimil Babka

Subject: Re: [PATCH v2 09/21] mm/slab: remove mm/slab.c and slab_def.h

On 12/6/23 10:31, Hyeonggon Yoo wrote:
> On Mon, Nov 20, 2023 at 07:34:20PM +0100, Vlastimil Babka wrote:
>> Remove the SLAB implementation. Update CREDITS.
>> Also update and properly sort the SLOB entry there.
>>
>> RIP SLAB allocator (1996 - 2024)
>>
>> Reviewed-by: Kees Cook <[email protected]>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> ---
>> CREDITS | 12 +-
>> include/linux/slab_def.h | 124 --
>> mm/slab.c | 4005 ----------------------------------------------
>> 3 files changed, 8 insertions(+), 4133 deletions(-)
>
> Acked-by: Hyeonggon Yoo <[email protected]>
>
>> diff --git a/CREDITS b/CREDITS
>> index f33a33fd2371..943a73e96149 100644
>> --- a/CREDITS
>> +++ b/CREDITS
>> @@ -9,10 +9,6 @@
>> Linus
>> ----------
>>
>> -N: Matt Mackal
>> -E: [email protected]
>> -D: SLOB slab allocator
>
> By the way, I just realized that commit 16e943bf8db
> ("MAINTAINERS: SLAB maintainer update") misspelled her last name
> (Mackall is correct). Maybe update that too?

Right, thanks a lot for noticing, will fix.

2023-12-07 00:43:43

by Hyeonggon Yoo

Subject: Re: [PATCH v2 13/21] mm/slab: move pre/post-alloc hooks from slab.h to slub.c

On Mon, Nov 20, 2023 at 07:34:24PM +0100, Vlastimil Babka wrote:
> We don't share the hooks between two slab implementations anymore so
> they can be moved away from the header. As part of the move, also move
> should_failslab() from slab_common.c as the pre_alloc hook uses it.
> This means slab.h can stop including fault-inject.h and kmemleak.h.
> Fix up some files that were depending on the includes transitively.
>
> Reviewed-by: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/kasan/report.c | 1 +
> mm/memcontrol.c | 1 +
> mm/slab.h | 72 -------------------------------------------------
> mm/slab_common.c | 8 +-----
> mm/slub.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 84 insertions(+), 79 deletions(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index e77facb62900..011f727bfaff 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -23,6 +23,7 @@
> #include <linux/stacktrace.h>
> #include <linux/string.h>
> #include <linux/types.h>
> +#include <linux/vmalloc.h>
> #include <linux/kasan.h>
> #include <linux/module.h>
> #include <linux/sched/task_stack.h>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 947fb50eba31..8a0603517065 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -64,6 +64,7 @@
> #include <linux/psi.h>
> #include <linux/seq_buf.h>
> #include <linux/sched/isolation.h>
> +#include <linux/kmemleak.h>
> #include "internal.h"
> #include <net/sock.h>
> #include <net/ip.h>
> diff --git a/mm/slab.h b/mm/slab.h
> index 1ac3a2f8d4c0..65ebf86b3fe9 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -9,8 +9,6 @@
> #include <linux/kobject.h>
> #include <linux/sched/mm.h>
> #include <linux/memcontrol.h>
> -#include <linux/fault-inject.h>
> -#include <linux/kmemleak.h>
> #include <linux/kfence.h>
> #include <linux/kasan.h>
>
> @@ -796,76 +794,6 @@ static inline size_t slab_ksize(const struct kmem_cache *s)
> return s->size;
> }
>
> -static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> - struct list_lru *lru,
> - struct obj_cgroup **objcgp,
> - size_t size, gfp_t flags)
> -{
> - flags &= gfp_allowed_mask;
> -
> - might_alloc(flags);
> -
> - if (should_failslab(s, flags))
> - return NULL;
> -
> - if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
> - return NULL;
> -
> - return s;
> -}
> -
> -static inline void slab_post_alloc_hook(struct kmem_cache *s,
> - struct obj_cgroup *objcg, gfp_t flags,
> - size_t size, void **p, bool init,
> - unsigned int orig_size)
> -{
> - unsigned int zero_size = s->object_size;
> - bool kasan_init = init;
> - size_t i;
> -
> - flags &= gfp_allowed_mask;
> -
> - /*
> - * For kmalloc object, the allocated memory size(object_size) is likely
> - * larger than the requested size(orig_size). If redzone check is
> - * enabled for the extra space, don't zero it, as it will be redzoned
> - * soon. The redzone operation for this extra space could be seen as a
> - * replacement of current poisoning under certain debug option, and
> - * won't break other sanity checks.
> - */
> - if (kmem_cache_debug_flags(s, SLAB_STORE_USER | SLAB_RED_ZONE) &&
> - (s->flags & SLAB_KMALLOC))
> - zero_size = orig_size;
> -
> - /*
> - * When slub_debug is enabled, avoid memory initialization integrated
> - * into KASAN and instead zero out the memory via the memset below with
> - * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
> - * cause false-positive reports. This does not lead to a performance
> - * penalty on production builds, as slub_debug is not intended to be
> - * enabled there.
> - */
> - if (__slub_debug_enabled())
> - kasan_init = false;
> -
> - /*
> - * As memory initialization might be integrated into KASAN,
> - * kasan_slab_alloc and initialization memset must be
> - * kept together to avoid discrepancies in behavior.
> - *
> - * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
> - */
> - for (i = 0; i < size; i++) {
> - p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
> - if (p[i] && init && (!kasan_init || !kasan_has_integrated_init()))
> - memset(p[i], 0, zero_size);
> - kmemleak_alloc_recursive(p[i], s->object_size, 1,
> - s->flags, flags);
> - kmsan_slab_alloc(s, p[i], flags);
> - }
> -
> - memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> -}
>
> /*
> * The slab lists for all objects.
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 63b8411db7ce..bbc2e3f061f1 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -21,6 +21,7 @@
> #include <linux/swiotlb.h>
> #include <linux/proc_fs.h>
> #include <linux/debugfs.h>
> +#include <linux/kmemleak.h>
> #include <linux/kasan.h>
> #include <asm/cacheflush.h>
> #include <asm/tlbflush.h>
> @@ -1470,10 +1471,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> EXPORT_TRACEPOINT_SYMBOL(kfree);
> EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
>
> -int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
> -{
> - if (__should_failslab(s, gfpflags))
> - return -ENOMEM;
> - return 0;
> -}
> -ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
> diff --git a/mm/slub.c b/mm/slub.c
> index 979932d046fd..9eb6508152c2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -34,6 +34,7 @@
> #include <linux/memory.h>
> #include <linux/math64.h>
> #include <linux/fault-inject.h>
> +#include <linux/kmemleak.h>
> #include <linux/stacktrace.h>
> #include <linux/prefetch.h>
> #include <linux/memcontrol.h>
> @@ -3494,6 +3495,86 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
> 0, sizeof(void *));
> }
>
> +noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
> +{
> + if (__should_failslab(s, gfpflags))
> + return -ENOMEM;
> + return 0;
> +}
> +ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
> +
> +static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> + struct list_lru *lru,
> + struct obj_cgroup **objcgp,
> + size_t size, gfp_t flags)
> +{
> + flags &= gfp_allowed_mask;
> +
> + might_alloc(flags);
> +
> + if (should_failslab(s, flags))
> + return NULL;
> +
> + if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
> + return NULL;
> +
> + return s;
> +}
> +
> +static inline void slab_post_alloc_hook(struct kmem_cache *s,
> + struct obj_cgroup *objcg, gfp_t flags,
> + size_t size, void **p, bool init,
> + unsigned int orig_size)
> +{
> + unsigned int zero_size = s->object_size;
> + bool kasan_init = init;
> + size_t i;
> +
> + flags &= gfp_allowed_mask;
> +
> + /*
> + * For kmalloc object, the allocated memory size(object_size) is likely
> + * larger than the requested size(orig_size). If redzone check is
> + * enabled for the extra space, don't zero it, as it will be redzoned
> + * soon. The redzone operation for this extra space could be seen as a
> + * replacement of current poisoning under certain debug option, and
> + * won't break other sanity checks.
> + */
> + if (kmem_cache_debug_flags(s, SLAB_STORE_USER | SLAB_RED_ZONE) &&
> + (s->flags & SLAB_KMALLOC))
> + zero_size = orig_size;
> +
> + /*
> + * When slub_debug is enabled, avoid memory initialization integrated
> + * into KASAN and instead zero out the memory via the memset below with
> + * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
> + * cause false-positive reports. This does not lead to a performance
> + * penalty on production builds, as slub_debug is not intended to be
> + * enabled there.
> + */
> + if (__slub_debug_enabled())
> + kasan_init = false;
> +
> + /*
> + * As memory initialization might be integrated into KASAN,
> + * kasan_slab_alloc and initialization memset must be
> + * kept together to avoid discrepancies in behavior.
> + *
> + * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
> + */
> + for (i = 0; i < size; i++) {
> + p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
> + if (p[i] && init && (!kasan_init ||
> + !kasan_has_integrated_init()))
> + memset(p[i], 0, zero_size);
> + kmemleak_alloc_recursive(p[i], s->object_size, 1,
> + s->flags, flags);
> + kmsan_slab_alloc(s, p[i], flags);
> + }
> +
> + memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> +}
> +
> /*
> * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
> * have the fastpath folded into their functions. So no function call
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

> 2.42.1
>
>
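
For readers following the zeroing logic in the moved slab_post_alloc_hook(), here is a simplified userspace sketch of the two decisions it makes: how many bytes to zero, and whether the memset runs at all. Everything prefixed `toy_` is hypothetical. The boolean fields and parameters only model the kernel's flag checks (`SLAB_STORE_USER | SLAB_RED_ZONE`, `SLAB_KMALLOC`, `__slub_debug_enabled()`, `kasan_has_integrated_init()`), and the KASAN, kmemleak, KMSAN and memcg calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kmem_cache fields the hook consults. */
struct toy_cache {
	size_t object_size;	/* size of the slot handed out */
	bool redzone_debug;	/* models SLAB_STORE_USER | SLAB_RED_ZONE */
	bool is_kmalloc;	/* models SLAB_KMALLOC */
};

/*
 * With redzone debugging on a kmalloc cache, only the requested bytes are
 * zeroed, because the tail of the slot is about to be redzoned.
 */
static size_t toy_zero_size(const struct toy_cache *s, size_t orig_size)
{
	assert(orig_size <= s->object_size);
	if (s->redzone_debug && s->is_kmalloc)
		return orig_size;
	return s->object_size;
}

/*
 * Models the memset decision for a single object: the hook zeroes the
 * memory itself unless KASAN's integrated init is trusted to do it.
 */
static void toy_post_alloc(const struct toy_cache *s, void *p, bool init,
			   size_t orig_size, bool slub_debug,
			   bool kasan_integrated_init)
{
	size_t zero_size = toy_zero_size(s, orig_size);
	bool kasan_init = init;

	/* Debug builds take zeroing away from KASAN to protect redzones. */
	if (slub_debug)
		kasan_init = false;

	if (p && init && (!kasan_init || !kasan_integrated_init))
		memset(p, 0, zero_size);
}
```

In this model, a 16-byte slot with `redzone_debug` set and `orig_size` of 10 has only its first 10 bytes cleared, while the same call without debugging clears all 16. When `kasan_integrated_init` is true and debugging is off, the sketch does nothing, because in the kernel KASAN would have initialized the object.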

2023-12-07 01:00:33

by Hyeonggon Yoo

Subject: Re: [PATCH v2 14/21] mm/slab: move memcg related functions from slab.h to slub.c

On Mon, Nov 20, 2023 at 07:34:25PM +0100, Vlastimil Babka wrote:
> We don't share those between SLAB and SLUB anymore, so most memcg
> related functions can be moved to slub.c proper.
>
> Reviewed-by: Kees Cook <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slab.h | 206 --------------------------------------------------------------
> mm/slub.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 205 insertions(+), 206 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 65ebf86b3fe9..a81ef7c9282d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -486,12 +486,6 @@ void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s);
> ssize_t slabinfo_write(struct file *file, const char __user *buffer,
> size_t count, loff_t *ppos);
>
> -static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
> -{
> - return (s->flags & SLAB_RECLAIM_ACCOUNT) ?
> - NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
> -}
> -
> #ifdef CONFIG_SLUB_DEBUG
> #ifdef CONFIG_SLUB_DEBUG_ON
> DECLARE_STATIC_KEY_TRUE(slub_debug_enabled);
> @@ -551,220 +545,20 @@ int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
> gfp_t gfp, bool new_slab);
> void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> enum node_stat_item idx, int nr);
> -
> -static inline void memcg_free_slab_cgroups(struct slab *slab)
> -{
> - kfree(slab_objcgs(slab));
> - slab->memcg_data = 0;
> -}
> -
> -static inline size_t obj_full_size(struct kmem_cache *s)
> -{
> - /*
> - * For each accounted object there is an extra space which is used
> - * to store obj_cgroup membership. Charge it too.
> - */
> - return s->size + sizeof(struct obj_cgroup *);
> -}
> -
> -/*
> - * Returns false if the allocation should fail.
> - */
> -static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> - struct list_lru *lru,
> - struct obj_cgroup **objcgp,
> - size_t objects, gfp_t flags)
> -{
> - struct obj_cgroup *objcg;
> -
> - if (!memcg_kmem_online())
> - return true;
> -
> - if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
> - return true;
> -
> - /*
> - * The obtained objcg pointer is safe to use within the current scope,
> - * defined by current task or set_active_memcg() pair.
> - * obj_cgroup_get() is used to get a permanent reference.
> - */
> - objcg = current_obj_cgroup();
> - if (!objcg)
> - return true;
> -
> - if (lru) {
> - int ret;
> - struct mem_cgroup *memcg;
> -
> - memcg = get_mem_cgroup_from_objcg(objcg);
> - ret = memcg_list_lru_alloc(memcg, lru, flags);
> - css_put(&memcg->css);
> -
> - if (ret)
> - return false;
> - }
> -
> - if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
> - return false;
> -
> - *objcgp = objcg;
> - return true;
> -}
> -
> -static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> - struct obj_cgroup *objcg,
> - gfp_t flags, size_t size,
> - void **p)
> -{
> - struct slab *slab;
> - unsigned long off;
> - size_t i;
> -
> - if (!memcg_kmem_online() || !objcg)
> - return;
> -
> - for (i = 0; i < size; i++) {
> - if (likely(p[i])) {
> - slab = virt_to_slab(p[i]);
> -
> - if (!slab_objcgs(slab) &&
> - memcg_alloc_slab_cgroups(slab, s, flags,
> - false)) {
> - obj_cgroup_uncharge(objcg, obj_full_size(s));
> - continue;
> - }
> -
> - off = obj_to_index(s, slab, p[i]);
> - obj_cgroup_get(objcg);
> - slab_objcgs(slab)[off] = objcg;
> - mod_objcg_state(objcg, slab_pgdat(slab),
> - cache_vmstat_idx(s), obj_full_size(s));
> - } else {
> - obj_cgroup_uncharge(objcg, obj_full_size(s));
> - }
> - }
> -}
> -
> -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> - void **p, int objects)
> -{
> - struct obj_cgroup **objcgs;
> - int i;
> -
> - if (!memcg_kmem_online())
> - return;
> -
> - objcgs = slab_objcgs(slab);
> - if (!objcgs)
> - return;
> -
> - for (i = 0; i < objects; i++) {
> - struct obj_cgroup *objcg;
> - unsigned int off;
> -
> - off = obj_to_index(s, slab, p[i]);
> - objcg = objcgs[off];
> - if (!objcg)
> - continue;
> -
> - objcgs[off] = NULL;
> - obj_cgroup_uncharge(objcg, obj_full_size(s));
> - mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
> - -obj_full_size(s));
> - obj_cgroup_put(objcg);
> - }
> -}
> -
> #else /* CONFIG_MEMCG_KMEM */
> static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
> {
> return NULL;
> }
>
> -static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
> -{
> - return NULL;
> -}
> -
> static inline int memcg_alloc_slab_cgroups(struct slab *slab,
> struct kmem_cache *s, gfp_t gfp,
> bool new_slab)
> {
> return 0;
> }
> -
> -static inline void memcg_free_slab_cgroups(struct slab *slab)
> -{
> -}
> -
> -static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> - struct list_lru *lru,
> - struct obj_cgroup **objcgp,
> - size_t objects, gfp_t flags)
> -{
> - return true;
> -}
> -
> -static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> - struct obj_cgroup *objcg,
> - gfp_t flags, size_t size,
> - void **p)
> -{
> -}
> -
> -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> - void **p, int objects)
> -{
> -}
> #endif /* CONFIG_MEMCG_KMEM */
>
> -static inline struct kmem_cache *virt_to_cache(const void *obj)
> -{
> - struct slab *slab;
> -
> - slab = virt_to_slab(obj);
> - if (WARN_ONCE(!slab, "%s: Object is not a Slab page!\n",
> - __func__))
> - return NULL;
> - return slab->slab_cache;
> -}
> -
> -static __always_inline void account_slab(struct slab *slab, int order,
> - struct kmem_cache *s, gfp_t gfp)
> -{
> - if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> - memcg_alloc_slab_cgroups(slab, s, gfp, true);
> -
> - mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> - PAGE_SIZE << order);
> -}
> -
> -static __always_inline void unaccount_slab(struct slab *slab, int order,
> - struct kmem_cache *s)
> -{
> - if (memcg_kmem_online())
> - memcg_free_slab_cgroups(slab);
> -
> - mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> - -(PAGE_SIZE << order));
> -}
> -
> -static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> -{
> - struct kmem_cache *cachep;
> -
> - if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
> - !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
> - return s;
> -
> - cachep = virt_to_cache(x);
> - if (WARN(cachep && cachep != s,
> - "%s: Wrong slab cache. %s but object is from %s\n",
> - __func__, s->name, cachep->name))
> - print_tracking(cachep, x);
> - return cachep;
> -}
> -
> void free_large_kmalloc(struct folio *folio, void *object);
>
> size_t __ksize(const void *objp);
> diff --git a/mm/slub.c b/mm/slub.c
> index 9eb6508152c2..844e0beb84ee 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1814,6 +1814,165 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
> #endif
> #endif /* CONFIG_SLUB_DEBUG */
>
> +static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
> +{
> + return (s->flags & SLAB_RECLAIM_ACCOUNT) ?
> + NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
> +}
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +static inline void memcg_free_slab_cgroups(struct slab *slab)
> +{
> + kfree(slab_objcgs(slab));
> + slab->memcg_data = 0;
> +}
> +
> +static inline size_t obj_full_size(struct kmem_cache *s)
> +{
> + /*
> + * For each accounted object there is an extra space which is used
> + * to store obj_cgroup membership. Charge it too.
> + */
> + return s->size + sizeof(struct obj_cgroup *);
> +}
> +
> +/*
> + * Returns false if the allocation should fail.
> + */
> +static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> + struct list_lru *lru,
> + struct obj_cgroup **objcgp,
> + size_t objects, gfp_t flags)
> +{
> + struct obj_cgroup *objcg;
> +
> + if (!memcg_kmem_online())
> + return true;
> +
> + if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
> + return true;
> +
> + /*
> + * The obtained objcg pointer is safe to use within the current scope,
> + * defined by current task or set_active_memcg() pair.
> + * obj_cgroup_get() is used to get a permanent reference.
> + */
> + objcg = current_obj_cgroup();
> + if (!objcg)
> + return true;
> +
> + if (lru) {
> + int ret;
> + struct mem_cgroup *memcg;
> +
> + memcg = get_mem_cgroup_from_objcg(objcg);
> + ret = memcg_list_lru_alloc(memcg, lru, flags);
> + css_put(&memcg->css);
> +
> + if (ret)
> + return false;
> + }
> +
> + if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
> + return false;
> +
> + *objcgp = objcg;
> + return true;
> +}
> +
> +static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> + struct obj_cgroup *objcg,
> + gfp_t flags, size_t size,
> + void **p)
> +{
> + struct slab *slab;
> + unsigned long off;
> + size_t i;
> +
> + if (!memcg_kmem_online() || !objcg)
> + return;
> +
> + for (i = 0; i < size; i++) {
> + if (likely(p[i])) {
> + slab = virt_to_slab(p[i]);
> +
> + if (!slab_objcgs(slab) &&
> + memcg_alloc_slab_cgroups(slab, s, flags, false)) {
> + obj_cgroup_uncharge(objcg, obj_full_size(s));
> + continue;
> + }
> +
> + off = obj_to_index(s, slab, p[i]);
> + obj_cgroup_get(objcg);
> + slab_objcgs(slab)[off] = objcg;
> + mod_objcg_state(objcg, slab_pgdat(slab),
> + cache_vmstat_idx(s), obj_full_size(s));
> + } else {
> + obj_cgroup_uncharge(objcg, obj_full_size(s));
> + }
> + }
> +}
> +
> +static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> + void **p, int objects)
> +{
> + struct obj_cgroup **objcgs;
> + int i;
> +
> + if (!memcg_kmem_online())
> + return;
> +
> + objcgs = slab_objcgs(slab);
> + if (!objcgs)
> + return;
> +
> + for (i = 0; i < objects; i++) {
> + struct obj_cgroup *objcg;
> + unsigned int off;
> +
> + off = obj_to_index(s, slab, p[i]);
> + objcg = objcgs[off];
> + if (!objcg)
> + continue;
> +
> + objcgs[off] = NULL;
> + obj_cgroup_uncharge(objcg, obj_full_size(s));
> + mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
> + -obj_full_size(s));
> + obj_cgroup_put(objcg);
> + }
> +}
> +#else /* CONFIG_MEMCG_KMEM */
> +static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
> +{
> + return NULL;
> +}
> +
> +static inline void memcg_free_slab_cgroups(struct slab *slab)
> +{
> +}
> +
> +static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> + struct list_lru *lru,
> + struct obj_cgroup **objcgp,
> + size_t objects, gfp_t flags)
> +{
> + return true;
> +}
> +
> +static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> + struct obj_cgroup *objcg,
> + gfp_t flags, size_t size,
> + void **p)
> +{
> +}
> +
> +static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> + void **p, int objects)
> +{
> +}
> +#endif /* CONFIG_MEMCG_KMEM */
> +
> /*
> * Hooks for other subsystems that check memory allocations. In a typical
> * production configuration these hooks all should produce no code at all.
> @@ -2048,6 +2207,26 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> }
> #endif /* CONFIG_SLAB_FREELIST_RANDOM */
>
> +static __always_inline void account_slab(struct slab *slab, int order,
> + struct kmem_cache *s, gfp_t gfp)
> +{
> + if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> + memcg_alloc_slab_cgroups(slab, s, gfp, true);
> +
> + mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> + PAGE_SIZE << order);
> +}
> +
> +static __always_inline void unaccount_slab(struct slab *slab, int order,
> + struct kmem_cache *s)
> +{
> + if (memcg_kmem_online())
> + memcg_free_slab_cgroups(slab);
> +
> + mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> + -(PAGE_SIZE << order));
> +}
> +
> static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> {
> struct slab *slab;
> @@ -3965,6 +4144,32 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
> }
> #endif
>
> +static inline struct kmem_cache *virt_to_cache(const void *obj)
> +{
> + struct slab *slab;
> +
> + slab = virt_to_slab(obj);
> + if (WARN_ONCE(!slab, "%s: Object is not a Slab page!\n", __func__))
> + return NULL;
> + return slab->slab_cache;
> +}
> +
> +static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> +{
> + struct kmem_cache *cachep;
> +
> + if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
> + !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
> + return s;
> +
> + cachep = virt_to_cache(x);
> + if (WARN(cachep && cachep != s,
> + "%s: Wrong slab cache. %s but object is from %s\n",
> + __func__, s->name, cachep->name))
> + print_tracking(cachep, x);
> + return cachep;
> +}
> +
> void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
> {
> slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

> 2.42.1
>
>
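
The charge accounting in the moved memcg hooks can be summarized with a small userspace model: the pre-alloc hook charges for all requested objects up front, and the post-alloc hook gives back the charge for every slot that came back NULL. This is only a sketch under simplifying assumptions. `struct toy_objcg` and the `toy_` functions are hypothetical, and the real hooks additionally handle the list_lru, obj_cgroup reference counting, vmstat updates and the memcg_alloc_slab_cgroups() failure path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an obj_cgroup: just a running byte count. */
struct toy_objcg {
	size_t charged;
};

/* Each accounted object also pays for its obj_cgroup pointer slot. */
static size_t toy_obj_full_size(size_t slot_size)
{
	return slot_size + sizeof(void *);
}

/* Models the up-front charge for a bulk request of 'objects' objects. */
static void toy_pre_alloc_charge(struct toy_objcg *cg, size_t slot_size,
				 size_t objects)
{
	cg->charged += objects * toy_obj_full_size(slot_size);
}

/* Models the post-alloc pass: uncharge every object that was not allocated. */
static void toy_post_alloc_settle(struct toy_objcg *cg, size_t slot_size,
				  void **p, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (p[i])
			continue;
		assert(cg->charged >= toy_obj_full_size(slot_size));
		cg->charged -= toy_obj_full_size(slot_size);
	}
}
```

For a bulk request of four objects where one pointer ends up NULL, the model leaves exactly three times `toy_obj_full_size()` charged, which matches the intent of the uncharge in the `else` branch of the quoted memcg_slab_post_alloc_hook().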

2023-12-07 01:11:31

by Hyeonggon Yoo

Subject: Re: [PATCH v2 15/21] mm/slab: move struct kmem_cache_node from slab.h to slub.c

On Mon, Nov 20, 2023 at 07:34:26PM +0100, Vlastimil Babka wrote:
> The declaration and associated helpers are not used anywhere else
> anymore.
>
> Reviewed-by: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slab.h | 29 -----------------------------
> mm/slub.c | 27 +++++++++++++++++++++++++++
> 2 files changed, 27 insertions(+), 29 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index a81ef7c9282d..5ae6a978e9c2 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -588,35 +588,6 @@ static inline size_t slab_ksize(const struct kmem_cache *s)
> return s->size;
> }
>
> -
> -/*
> - * The slab lists for all objects.
> - */
> -struct kmem_cache_node {
> - spinlock_t list_lock;
> - unsigned long nr_partial;
> - struct list_head partial;
> -#ifdef CONFIG_SLUB_DEBUG
> - atomic_long_t nr_slabs;
> - atomic_long_t total_objects;
> - struct list_head full;
> -#endif
> -};
> -
> -static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> -{
> - return s->node[node];
> -}
> -
> -/*
> - * Iterator over all nodes. The body will be executed for each node that has
> - * a kmem_cache_node structure allocated (which is true for all online nodes)
> - */
> -#define for_each_kmem_cache_node(__s, __node, __n) \
> - for (__node = 0; __node < nr_node_ids; __node++) \
> - if ((__n = get_node(__s, __node)))
> -
> -
> #ifdef CONFIG_SLUB_DEBUG
> void dump_unreclaimable_slab(void);
> #else
> diff --git a/mm/slub.c b/mm/slub.c
> index 844e0beb84ee..cc801f8258fe 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -396,6 +396,33 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
> #endif
> }
>
> +/*
> + * The slab lists for all objects.
> + */
> +struct kmem_cache_node {
> + spinlock_t list_lock;
> + unsigned long nr_partial;
> + struct list_head partial;
> +#ifdef CONFIG_SLUB_DEBUG
> + atomic_long_t nr_slabs;
> + atomic_long_t total_objects;
> + struct list_head full;
> +#endif
> +};
> +
> +static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> +{
> + return s->node[node];
> +}
> +
> +/*
> + * Iterator over all nodes. The body will be executed for each node that has
> + * a kmem_cache_node structure allocated (which is true for all online nodes)
> + */
> +#define for_each_kmem_cache_node(__s, __node, __n) \
> + for (__node = 0; __node < nr_node_ids; __node++) \
> + if ((__n = get_node(__s, __node)))
> +
> /*
> * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
> * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

> 2.42.1
>
>
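
The for_each_kmem_cache_node() macro moved by this patch pairs a for loop with a trailing if, so the loop body only runs for nodes that have a structure allocated. Below is a minimal userspace sketch of that pattern. The toy_* names and the fixed NR_NODES (standing in for nr_node_ids) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4	/* hypothetical stand-in for nr_node_ids */

struct toy_node {
	unsigned long nr_partial;
};

struct toy_cache {
	struct toy_node *node[NR_NODES];	/* NULL where nothing is allocated */
};

static inline struct toy_node *toy_get_node(struct toy_cache *s, int node)
{
	return s->node[node];
}

/* Same shape as for_each_kmem_cache_node(): a for loop plus a filtering if. */
#define for_each_toy_node(__s, __node, __n) \
	for (__node = 0; __node < NR_NODES; __node++) \
		if ((__n = toy_get_node(__s, __node)))

/* Sum nr_partial over the nodes that are actually populated. */
static unsigned long toy_count_partial(struct toy_cache *s)
{
	struct toy_node *n;
	unsigned long total = 0;
	int node;

	for_each_toy_node(s, node, n)
		total += n->nr_partial;

	return total;
}

/* Two populated nodes (3 and 5 partial slabs) and two empty slots. */
unsigned long toy_partial_demo(void)
{
	struct toy_node n0 = { .nr_partial = 3 };
	struct toy_node n2 = { .nr_partial = 5 };
	struct toy_cache s = { .node = { &n0, NULL, &n2, NULL } };

	assert(toy_get_node(&s, 1) == NULL);
	return toy_count_partial(&s);
}
```

Because the macro ends in an if, the statement that follows it becomes the if body; callers should avoid attaching an else directly after the loop.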

2023-12-07 01:31:45

by Hyeonggon Yoo

Subject: Re: [PATCH v2 18/21] mm/slab: move kmalloc() functions from slab_common.c to slub.c

On Mon, Nov 20, 2023 at 07:34:29PM +0100, Vlastimil Babka wrote:
> This will eliminate a call between compilation units through
> __kmem_cache_alloc_node() and allow better inlining of the allocation
> fast path.
>
> Reviewed-by: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slab.h | 3 --
> mm/slab_common.c | 119 ----------------------------------------------------
> mm/slub.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++----
> 3 files changed, 118 insertions(+), 130 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 7d7cc7af614e..54deeb0428c6 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -416,9 +416,6 @@ kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
> return kmalloc_caches[kmalloc_type(flags, caller)][index];
> }
>
> -void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> - int node, size_t orig_size,
> - unsigned long caller);
> gfp_t kmalloc_fix_flags(gfp_t flags);
>
> /* Functions provided by the slab allocators */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 31ade17a7ad9..238293b1dbe1 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -936,50 +936,6 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> slab_state = UP;
> }
>
> -static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
> -static __always_inline
> -void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> -{
> - struct kmem_cache *s;
> - void *ret;
> -
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> - ret = __kmalloc_large_node(size, flags, node);
> - trace_kmalloc(caller, ret, size,
> - PAGE_SIZE << get_order(size), flags, node);
> - return ret;
> - }
> -
> - if (unlikely(!size))
> - return ZERO_SIZE_PTR;
> -
> - s = kmalloc_slab(size, flags, caller);
> -
> - ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
> - ret = kasan_kmalloc(s, ret, size, flags);
> - trace_kmalloc(caller, ret, size, s->size, flags, node);
> - return ret;
> -}
> -
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> -{
> - return __do_kmalloc_node(size, flags, node, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc_node);
> -
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> - return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
> -void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> - int node, unsigned long caller)
> -{
> - return __do_kmalloc_node(size, flags, node, caller);
> -}
> -EXPORT_SYMBOL(__kmalloc_node_track_caller);
> -
> /**
> * __ksize -- Report full size of underlying allocation
> * @object: pointer to the object
> @@ -1016,30 +972,6 @@ size_t __ksize(const void *object)
> return slab_ksize(folio_slab(folio)->slab_cache);
> }
>
> -void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> -{
> - void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> - size, _RET_IP_);
> -
> - trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> -
> - ret = kasan_kmalloc(s, ret, size, gfpflags);
> - return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_trace);
> -
> -void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
> - int node, size_t size)
> -{
> - void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
> -
> - trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
> -
> - ret = kasan_kmalloc(s, ret, size, gfpflags);
> - return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_node_trace);
> -
> gfp_t kmalloc_fix_flags(gfp_t flags)
> {
> gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;
> @@ -1052,57 +984,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
> return flags;
> }
>
> -/*
> - * To avoid unnecessary overhead, we pass through large allocation requests
> - * directly to the page allocator. We use __GFP_COMP, because we will need to
> - * know the allocation order to free the pages properly in kfree.
> - */
> -
> -static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
> -{
> - struct page *page;
> - void *ptr = NULL;
> - unsigned int order = get_order(size);
> -
> - if (unlikely(flags & GFP_SLAB_BUG_MASK))
> - flags = kmalloc_fix_flags(flags);
> -
> - flags |= __GFP_COMP;
> - page = alloc_pages_node(node, flags, order);
> - if (page) {
> - ptr = page_address(page);
> - mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
> - PAGE_SIZE << order);
> - }
> -
> - ptr = kasan_kmalloc_large(ptr, size, flags);
> - /* As ptr might get tagged, call kmemleak hook after KASAN. */
> - kmemleak_alloc(ptr, size, 1, flags);
> - kmsan_kmalloc_large(ptr, size, flags);
> -
> - return ptr;
> -}
> -
> -void *kmalloc_large(size_t size, gfp_t flags)
> -{
> - void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
> -
> - trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
> - flags, NUMA_NO_NODE);
> - return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_large);
> -
> -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> -{
> - void *ret = __kmalloc_large_node(size, flags, node);
> -
> - trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
> - flags, node);
> - return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_large_node);
> -
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> /* Randomize a generic freelist */
> static void freelist_randomize(unsigned int *list,
> diff --git a/mm/slub.c b/mm/slub.c
> index 2baa9e94d9df..d6bc15929d22 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3851,14 +3851,6 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> }
> EXPORT_SYMBOL(kmem_cache_alloc_lru);
>
> -void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> - int node, size_t orig_size,
> - unsigned long caller)
> -{
> - return slab_alloc_node(s, NULL, gfpflags, node,
> - caller, orig_size);
> -}
> -
> /**
> * kmem_cache_alloc_node - Allocate an object on the specified node
> * @s: The cache to allocate from.
> @@ -3882,6 +3874,124 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> }
> EXPORT_SYMBOL(kmem_cache_alloc_node);
>
> +/*
> + * To avoid unnecessary overhead, we pass through large allocation requests
> + * directly to the page allocator. We use __GFP_COMP, because we will need to
> + * know the allocation order to free the pages properly in kfree.
> + */
> +static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
> +{
> + struct page *page;
> + void *ptr = NULL;
> + unsigned int order = get_order(size);
> +
> + if (unlikely(flags & GFP_SLAB_BUG_MASK))
> + flags = kmalloc_fix_flags(flags);
> +
> + flags |= __GFP_COMP;
> + page = alloc_pages_node(node, flags, order);
> + if (page) {
> + ptr = page_address(page);
> + mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
> + PAGE_SIZE << order);
> + }
> +
> + ptr = kasan_kmalloc_large(ptr, size, flags);
> + /* As ptr might get tagged, call kmemleak hook after KASAN. */
> + kmemleak_alloc(ptr, size, 1, flags);
> + kmsan_kmalloc_large(ptr, size, flags);
> +
> + return ptr;
> +}
> +
> +void *kmalloc_large(size_t size, gfp_t flags)
> +{
> + void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
> +
> + trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
> + flags, NUMA_NO_NODE);
> + return ret;
> +}
> +EXPORT_SYMBOL(kmalloc_large);
> +
> +void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> +{
> + void *ret = __kmalloc_large_node(size, flags, node);
> +
> + trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
> + flags, node);
> + return ret;
> +}
> +EXPORT_SYMBOL(kmalloc_large_node);
> +
> +static __always_inline
> +void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
> + unsigned long caller)
> +{
> + struct kmem_cache *s;
> + void *ret;
> +
> + if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> + ret = __kmalloc_large_node(size, flags, node);
> + trace_kmalloc(caller, ret, size,
> + PAGE_SIZE << get_order(size), flags, node);
> + return ret;
> + }
> +
> + if (unlikely(!size))
> + return ZERO_SIZE_PTR;
> +
> + s = kmalloc_slab(size, flags, caller);
> +
> + ret = slab_alloc_node(s, NULL, flags, node, caller, size);
> + ret = kasan_kmalloc(s, ret, size, flags);
> + trace_kmalloc(caller, ret, size, s->size, flags, node);
> + return ret;
> +}
> +
> +void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +{
> + return __do_kmalloc_node(size, flags, node, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc_node);
> +
> +void *__kmalloc(size_t size, gfp_t flags)
> +{
> + return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc);
> +
> +void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> + int node, unsigned long caller)
> +{
> + return __do_kmalloc_node(size, flags, node, caller);
> +}
> +EXPORT_SYMBOL(__kmalloc_node_track_caller);
> +
> +void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> +{
> + void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
> + _RET_IP_, size);
> +
> + trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> +
> + ret = kasan_kmalloc(s, ret, size, gfpflags);
> + return ret;
> +}
> +EXPORT_SYMBOL(kmalloc_trace);
> +
> +void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
> + int node, size_t size)
> +{
> + void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
> +
> + trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
> +
> + ret = kasan_kmalloc(s, ret, size, gfpflags);
> + return ret;
> +}
> +EXPORT_SYMBOL(kmalloc_node_trace);
> +
> static noinline void free_to_partial_list(
> struct kmem_cache *s, struct slab *slab,
> void *head, void *tail, int bulk_cnt,
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

> 2.42.1
>
>
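
The size dispatch in __do_kmalloc_node() quoted above can be sketched in userspace as follows. This only models which path a request takes and how many bytes the large path reports to the tracepoint. The toy_* names are hypothetical, and the 4096-byte page and 8192-byte cache limit are assumptions chosen for illustration, not values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE		4096u	/* assumed page size */
#define TOY_MAX_CACHE_SIZE	8192u	/* assumed stand-in for KMALLOC_MAX_CACHE_SIZE */

enum toy_path {
	TOY_PATH_LARGE,	/* handed to the page allocator */
	TOY_PATH_ZERO,	/* returns ZERO_SIZE_PTR */
	TOY_PATH_CACHE,	/* served from a kmalloc cache */
};

/* Mirrors the order of the checks: large first, then zero, then cache. */
static enum toy_path toy_kmalloc_path(size_t size)
{
	if (size > TOY_MAX_CACHE_SIZE)
		return TOY_PATH_LARGE;
	if (!size)
		return TOY_PATH_ZERO;
	return TOY_PATH_CACHE;
}

/* Smallest order such that (TOY_PAGE_SIZE << order) covers size. */
static unsigned int toy_get_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)TOY_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Bytes reported for a large request: PAGE_SIZE << get_order(size). */
size_t toy_large_bytes(size_t size)
{
	assert(toy_kmalloc_path(0) == TOY_PATH_ZERO);
	assert(toy_kmalloc_path(64) == TOY_PATH_CACHE);
	assert(toy_kmalloc_path(TOY_MAX_CACHE_SIZE) == TOY_PATH_CACHE);
	assert(toy_kmalloc_path(size) == TOY_PATH_LARGE);

	return (size_t)TOY_PAGE_SIZE << toy_get_order(size);
}
```

With these assumed constants, a request of 8193 bytes just crosses the limit and is rounded up to four pages.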

2023-12-07 01:36:03

by Hyeonggon Yoo

Subject: Re: [PATCH v2 19/21] mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers

On Mon, Nov 20, 2023 at 07:34:30PM +0100, Vlastimil Babka wrote:
> slab_alloc() is a thin wrapper around slab_alloc_node() with only one
> caller. Replace with direct call of slab_alloc_node().
> __kmem_cache_alloc_lru() itself is a thin wrapper with two callers,
> so replace it with direct calls of slab_alloc_node() and
> trace_kmem_cache_alloc().
>
> This also makes sure _RET_IP_ always has the expected value and does not
> depend on inlining decisions.
>
> Reviewed-by: Kees Cook <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index d6bc15929d22..5683f1d02e4f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3821,33 +3821,26 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
> return object;
> }
>
> -static __fastpath_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t gfpflags, unsigned long addr, size_t orig_size)
> -{
> - return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
> -}
> -
> -static __fastpath_inline
> -void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t gfpflags)
> +void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> {
> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> + void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
> + s->object_size);
>
> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>
> return ret;
> }
> -
> -void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> -{
> - return __kmem_cache_alloc_lru(s, NULL, gfpflags);
> -}
> EXPORT_SYMBOL(kmem_cache_alloc);
>
> void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags)
> {
> - return __kmem_cache_alloc_lru(s, lru, gfpflags);
> + void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
> + s->object_size);
> +
> + trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> +
> + return ret;
> }
> EXPORT_SYMBOL(kmem_cache_alloc_lru);

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

>
>
> --
> 2.42.1
>
>
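
The kernel's _RET_IP_ is built on __builtin_return_address(0), so where it is evaluated matters. The sketch below, with hypothetical toy_* names and GCC/Clang attributes, evaluates the return address directly in a non-inlined entry point and passes it down, which is the shape kmem_cache_alloc() has after this patch. It is an illustration of the idea, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for _RET_IP_. */
#define TOY_RET_IP ((uintptr_t)__builtin_return_address(0))

/* Volatile so the recorded value is really stored and reloaded. */
static volatile uintptr_t toy_last_caller;

/* Core path: takes the caller address as a parameter, like slab_alloc_node(). */
static inline void toy_alloc_core(uintptr_t caller)
{
	toy_last_caller = caller;
}

/*
 * Entry point: never inlined, and TOY_RET_IP is evaluated here, so the
 * recorded address is always the call site of toy_alloc() itself.
 */
__attribute__((noinline)) void toy_alloc(void)
{
	toy_alloc_core(TOY_RET_IP);
}

/* Two distinct call sites should record two distinct return addresses. */
int toy_ret_ip_demo(void)
{
	uintptr_t first, second;

	toy_alloc();
	first = toy_last_caller;
	toy_alloc();
	second = toy_last_caller;

	assert(first != 0 && second != 0);
	return first != second;
}
```

If TOY_RET_IP were instead evaluated inside an intermediate wrapper, the recorded address would depend on whether the compiler inlined that wrapper, which is the ambiguity the commit message refers to.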

2023-12-07 02:32:55

by Hyeonggon Yoo

Subject: Re: [PATCH v2 20/21] mm/slub: optimize alloc fastpath code layout

On Mon, Nov 20, 2023 at 07:34:31PM +0100, Vlastimil Babka wrote:
> With allocation fastpaths no longer divided between two .c files, we
> have better inlining. However, checking the disassembly of
> kmem_cache_alloc() reveals we can do better to make the fastpaths
> smaller and move the less common situations out of line or to separate
> functions, to reduce instruction cache pressure.
>
> - split memcg pre/post alloc hooks to inlined checks that use likely()
> to assume there will be no objcg handling necessary, and non-inline
> functions doing the actual handling
>
> - add some more likely/unlikely() to pre/post alloc hooks to indicate
> which scenarios should be out of line
>
> - change gfp_allowed_mask handling in slab_post_alloc_hook() so the
> code can be optimized away when kasan/kmsan/kmemleak is configured out
>
> bloat-o-meter shows:
> add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
> Function old new delta
> __memcg_slab_post_alloc_hook - 461 +461
> kmem_cache_alloc_bulk 775 791 +16
> __pfx_should_failslab.constprop - 16 +16
> __pfx___memcg_slab_post_alloc_hook - 16 +16
> should_failslab.constprop - 12 +12
> __pfx_memcg_slab_post_alloc_hook 16 - -16
> kmem_cache_alloc_lru 1295 1023 -272
> kmem_cache_alloc_node 1118 817 -301
> kmem_cache_alloc 1076 772 -304
> kmalloc_node_trace 1149 838 -311
> kmalloc_trace 1102 789 -313
> __kmalloc_node_track_caller 1393 1080 -313
> __kmalloc_node 1397 1082 -315
> __kmalloc 1374 1059 -315
> memcg_slab_post_alloc_hook 464 - -464
>
> Note that gcc still decided to inline __memcg_slab_pre_alloc_hook(), but the
> code is out of line. Forcing noinline did not improve the results. As a
> result the fastpaths are shorter and overall code size is reduced.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 89 ++++++++++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 54 insertions(+), 35 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 5683f1d02e4f..77d259f3d592 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1866,25 +1866,17 @@ static inline size_t obj_full_size(struct kmem_cache *s)
> /*
> * Returns false if the allocation should fail.
> */
> -static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> - struct list_lru *lru,
> - struct obj_cgroup **objcgp,
> - size_t objects, gfp_t flags)
> +static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> + struct list_lru *lru,
> + struct obj_cgroup **objcgp,
> + size_t objects, gfp_t flags)
> {
> - struct obj_cgroup *objcg;
> -
> - if (!memcg_kmem_online())
> - return true;
> -
> - if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
> - return true;
> -
> /*
> * The obtained objcg pointer is safe to use within the current scope,
> * defined by current task or set_active_memcg() pair.
> * obj_cgroup_get() is used to get a permanent reference.
> */
> - objcg = current_obj_cgroup();
> + struct obj_cgroup *objcg = current_obj_cgroup();
> if (!objcg)
> return true;
>
> @@ -1907,17 +1899,34 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> return true;
> }
>
> -static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> - struct obj_cgroup *objcg,
> - gfp_t flags, size_t size,
> - void **p)
> +/*
> + * Returns false if the allocation should fail.
> + */
> +static __fastpath_inline
> +bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> + struct obj_cgroup **objcgp, size_t objects,
> + gfp_t flags)
> +{
> + if (!memcg_kmem_online())
> + return true;
> +
> + if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
> + return true;
> +
> + return likely(__memcg_slab_pre_alloc_hook(s, lru, objcgp, objects,
> + flags));
> +}
> +
> +static void __memcg_slab_post_alloc_hook(struct kmem_cache *s,
> + struct obj_cgroup *objcg,
> + gfp_t flags, size_t size,
> + void **p)
> {
> struct slab *slab;
> unsigned long off;
> size_t i;
>
> - if (!memcg_kmem_online() || !objcg)
> - return;
> + flags &= gfp_allowed_mask;
>
> for (i = 0; i < size; i++) {
> if (likely(p[i])) {
> @@ -1940,6 +1949,16 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> }
> }
>
> +static __fastpath_inline
> +void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> + gfp_t flags, size_t size, void **p)
> +{
> + if (likely(!memcg_kmem_online() || !objcg))
> + return;
> +
> + return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> +}
> +
> static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> void **p, int objects)
> {
> @@ -3709,34 +3728,34 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
> }
> ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
>
> -static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> - struct list_lru *lru,
> - struct obj_cgroup **objcgp,
> - size_t size, gfp_t flags)
> +static __fastpath_inline
> +struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> + struct list_lru *lru,
> + struct obj_cgroup **objcgp,
> + size_t size, gfp_t flags)
> {
> flags &= gfp_allowed_mask;
>
> might_alloc(flags);
>
> - if (should_failslab(s, flags))
> + if (unlikely(should_failslab(s, flags)))
> return NULL;
>
> - if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
> + if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)))
> return NULL;
>
> return s;
> }
>
> -static inline void slab_post_alloc_hook(struct kmem_cache *s,
> - struct obj_cgroup *objcg, gfp_t flags,
> - size_t size, void **p, bool init,
> - unsigned int orig_size)
> +static __fastpath_inline
> +void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> + gfp_t flags, size_t size, void **p, bool init,
> + unsigned int orig_size)
> {
> unsigned int zero_size = s->object_size;
> bool kasan_init = init;
> size_t i;
> -
> - flags &= gfp_allowed_mask;
> + gfp_t init_flags = flags & gfp_allowed_mask;
>
> /*
> * For kmalloc object, the allocated memory size(object_size) is likely
> @@ -3769,13 +3788,13 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
> */
> for (i = 0; i < size; i++) {
> - p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
> + p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
> if (p[i] && init && (!kasan_init ||
> !kasan_has_integrated_init()))
> memset(p[i], 0, zero_size);
> kmemleak_alloc_recursive(p[i], s->object_size, 1,
> - s->flags, flags);
> - kmsan_slab_alloc(s, p[i], flags);
> + s->flags, init_flags);
> + kmsan_slab_alloc(s, p[i], init_flags);
> }
>
> memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> @@ -3799,7 +3818,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
> bool init = false;
>
> s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
> - if (!s)
> + if (unlikely(!s))
> return NULL;
>
> object = kfence_alloc(s, orig_size, gfpflags);
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

> 2.42.1
>
>
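
The split used for the memcg hooks in this patch, a small inline check guarded by likely() and a separate function doing the real work, can be sketched in userspace like this. All toy_* names, the charge limit, and the call counter are hypothetical. The noinline attribute is used here only to keep the slow path visibly separate; the patch itself leaves that decision to the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define toy_likely(x)	__builtin_expect(!!(x), 1)
#define toy_unlikely(x)	__builtin_expect(!!(x), 0)

#define TOY_CHARGE_LIMIT 16	/* hypothetical limit for the slow path */

static bool toy_accounting_online;
static unsigned long toy_slow_calls;	/* counts slow-path entries */

/* Slow path: the actual handling, kept out of the caller's hot code. */
static __attribute__((noinline)) bool __toy_pre_alloc_hook(size_t objects)
{
	toy_slow_calls++;
	return objects <= TOY_CHARGE_LIMIT;
}

/* Fast path: cheap checks only. Returns false if the allocation should fail. */
static inline bool toy_pre_alloc_hook(bool account, size_t objects)
{
	if (!toy_accounting_online)
		return true;

	if (toy_likely(!account))
		return true;

	return toy_likely(__toy_pre_alloc_hook(objects));
}

/* Returns how many times the slow path ran across four calls. */
unsigned long toy_hook_demo(void)
{
	toy_slow_calls = 0;

	toy_accounting_online = false;
	assert(toy_pre_alloc_hook(true, 1));		/* accounting offline */

	toy_accounting_online = true;
	assert(toy_pre_alloc_hook(false, 1));		/* nothing to account */
	assert(toy_pre_alloc_hook(true, 1));		/* slow path, succeeds */
	assert(!toy_pre_alloc_hook(true, 1000));	/* slow path, over limit */

	return toy_slow_calls;
}
```

Only the two accounted calls reach the out-of-line function; the other two return from the inline checks.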

2023-12-07 02:40:42

by Hyeonggon Yoo

Subject: Re: [PATCH v2 21/21] mm/slub: optimize free fast path code layout

On Mon, Nov 20, 2023 at 07:34:32PM +0100, Vlastimil Babka wrote:
> Inspection of kmem_cache_free() disassembly showed we could make the
> fast path smaller by providing a few more hints to the compiler, and
> splitting the memcg_slab_free_hook() into an inline part that only
> checks if there's work to do, and an out of line part doing the actual
> uncharge.
>
> bloat-o-meter results:
> add/remove: 2/0 grow/shrink: 0/3 up/down: 286/-554 (-268)
> Function old new delta
> __memcg_slab_free_hook - 270 +270
> __pfx___memcg_slab_free_hook - 16 +16
> kfree 828 665 -163
> kmem_cache_free 1116 948 -168
> kmem_cache_free_bulk.part 1701 1478 -223
>
> Checking kmem_cache_free() disassembly now shows the non-fastpath
> cases are handled out of line, which should reduce instruction cache
> usage.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 77d259f3d592..3f8b95757106 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1959,20 +1959,11 @@ void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> }
>
> -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> - void **p, int objects)
> +static void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> + void **p, int objects,
> + struct obj_cgroup **objcgs)
> {
> - struct obj_cgroup **objcgs;
> - int i;
> -
> - if (!memcg_kmem_online())
> - return;
> -
> - objcgs = slab_objcgs(slab);
> - if (!objcgs)
> - return;
> -
> - for (i = 0; i < objects; i++) {
> + for (int i = 0; i < objects; i++) {
> struct obj_cgroup *objcg;
> unsigned int off;
>
> @@ -1988,6 +1979,22 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> obj_cgroup_put(objcg);
> }
> }
> +
> +static __fastpath_inline
> +void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> + int objects)
> +{
> + struct obj_cgroup **objcgs;
> +
> + if (!memcg_kmem_online())
> + return;
> +
> + objcgs = slab_objcgs(slab);
> + if (likely(!objcgs))
> + return;
> +
> + __memcg_slab_free_hook(s, slab, p, objects, objcgs);
> +}
> #else /* CONFIG_MEMCG_KMEM */
> static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
> {
> @@ -2047,7 +2054,7 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s,
> * The initialization memset's clear the object and the metadata,
> * but don't touch the SLAB redzone.
> */
> - if (init) {
> + if (unlikely(init)) {
> int rsize;
>
> if (!kasan_has_integrated_init())
> @@ -2083,7 +2090,8 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> next = get_freepointer(s, object);
>
> /* If object's reuse doesn't have to be delayed */
> - if (!slab_free_hook(s, object, slab_want_init_on_free(s))) {
> + if (likely(!slab_free_hook(s, object,
> + slab_want_init_on_free(s)))) {
> /* Move object to the new freelist */
> set_freepointer(s, object, *head);
> *head = object;
> @@ -4282,7 +4290,7 @@ static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab,
> * With KASAN enabled slab_free_freelist_hook modifies the freelist
> * to remove objects, whose reuse must be delayed.
> */
> - if (slab_free_freelist_hook(s, &head, &tail, &cnt))
> + if (likely(slab_free_freelist_hook(s, &head, &tail, &cnt)))
> do_slab_free(s, slab, head, tail, cnt, addr);
> }
>
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <[email protected]>

> 2.42.1
>
>
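
The bloat-o-meter summary line in this patch can be cross-checked against its per-function rows. The sketch below copies the old/new sizes from the quoted table (with 0 standing in for "-") and recomputes the up, down, and net totals. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_row {
	const char *name;
	long old_size;	/* 0 where the table shows "-" */
	long new_size;
};

/* Rows copied from the bloat-o-meter output quoted in the commit message. */
static const struct toy_row toy_rows[] = {
	{ "__memcg_slab_free_hook",		0,	270 },
	{ "__pfx___memcg_slab_free_hook",	0,	16 },
	{ "kfree",				828,	665 },
	{ "kmem_cache_free",			1116,	948 },
	{ "kmem_cache_free_bulk.part",		1701,	1478 },
};

/* Accumulate growth and shrinkage separately and return the net change. */
static long toy_net_delta(const struct toy_row *rows, size_t n,
			  long *up, long *down)
{
	*up = 0;
	*down = 0;

	for (size_t i = 0; i < n; i++) {
		long delta = rows[i].new_size - rows[i].old_size;

		if (delta > 0)
			*up += delta;
		else
			*down += delta;
	}
	return *up + *down;
}

long toy_bloat_demo(void)
{
	long up, down;
	long net = toy_net_delta(toy_rows,
				 sizeof(toy_rows) / sizeof(toy_rows[0]),
				 &up, &down);

	assert(up == 286);	/* matches "up/down: 286/-554" */
	assert(down == -554);
	return net;
}
```

The recomputed totals agree with the summary: 286 bytes added, 554 removed, a net change of -268.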

2023-12-07 02:46:00

by Hyeonggon Yoo

Subject: Re: [PATCH v2 00/21] remove the SLAB allocator

On Mon, Nov 20, 2023 at 07:34:11PM +0100, Vlastimil Babka wrote:
> Changes from v1:
> - Added new Patch 01 to fix up kernel docs build (thanks Marco Elver)
> - Additional changes to Kconfig user visible texts in Patch 02 (thanks Kees
> Cook)
> - Whitespace fixes and other fixups (thanks Kees)
>
> The SLAB allocator has been deprecated since 6.5 and nobody has objected
> so far. As we agreed at LSF/MM, we should wait with the removal until
> the next LTS kernel is released. This is now determined to be 6.6, and
> we just missed 6.7, so now we can aim for 6.8 and start exposing the
> removal to linux-next during the 6.7 cycle. If nothing substantial pops
> up, will start including this in slab-next later this week.

I've been testing this for a few weeks on my test system.
It passed a set of mm and slab tests on various SLUB configurations.

For the series, feel free to add:
Tested-by: Hyeonggon Yoo <[email protected]>

Thanks!

> To keep the series reasonably sized and not pull in people from other
> subsystems than mm and closely related ones, I didn't attempt to remove
> every trace of unnecessary reference to dead config options in external
> areas, nor in the defconfigs. Such cleanups can be sent to and handled
> by respective maintainers after this is merged.
>
> Instead I have added some patches aimed to reap some immediate benefits
> of the removal, mainly by not having to split some fastpath code between
> slab_common.c and slub.c anymore. But that is also not an exhaustive
> effort and I expect more cleanups and optimizations will follow later.
>
> Patch 09 updates CREDITS for the removed mm/slab.c. Please point out if
> I missed someone not yet credited.
>
> Git version: https://git.kernel.org/vbabka/l/slab-remove-slab-v2r1