2022-07-12 13:52:31

by Hyeonggon Yoo

Subject: [PATCH v3 00/15] common kmalloc v3

This is v3 of common kmalloc series.

This series generalizes most of the kmalloc functions and moves their
implementation into mm/slab_common.c.

I believe this series improves maintainability of the code for SLAB and SLUB.
Please consider applying.

This series is based on slab/for-next and also available at
https://github.com/hygoni/linux/tree/slab-common-v3r0

Any feedback/reviews will be appreciated.

v2->v3:
- This series no longer unifies the tracepoints for kmalloc and
kmem_cache_alloc(), and no longer prints the cache name.
Instead, it implements common alloc/free functions without
tracepoints.

- Dropped the UMA version of the tracepoints and removed the _node suffix
from the NUMA version. (Vlastimil Babka)

- Split the kmem_alloc event class, defined the kmem_cache_alloc and
kmalloc events using the TRACE_EVENT() macro, and avoided
dereferencing fields of struct kmem_cache when the tracepoints
are not in use. (Vlastimil Babka)

- Dropped the patches cleaning up kmalloc() to avoid increasing the size
of the kernel image. (Joe Perches)

- Tried to avoid affecting inlined kmalloc calls
and kmem_cache_alloc variants.

- Fixed uninitialized variable bug in SLAB with NUMA.

- Fixed an issue where kmalloc_node_track_caller() printed a useless
caller when it calls kmalloc_large_node().

- Call WARN_ON() when __ksize() returns zero. (Christoph Lameter)

- Addressed feedback from Vlastimil Babka.
(coding style, patch organization, etc.)

Thank you all for feedback :)

===== sequence of patches =====

Patches 1-2 make slab_alloc_node() available on non-NUMA
configurations as groundwork for the generalization.

Patches 3-5 remove duplicate code in the common kmalloc functions.

Patches 6-10 move the implementation of kmalloc_large{,_node}()
to slab_common.c and make SLAB pass requests larger than an
order-1 page to the page allocator (like SLUB).

Patches 11-12 generalize most of the kmalloc functions.

Patch 13 drops the kmem_alloc event class, renames kmem_alloc_node
to kmem_alloc, and removes the _node suffix from its events.

Patch 14 drops the kmem_alloc event class and defines the kmalloc
and kmem_cache_alloc events using the TRACE_EVENT() macro.
It also avoids dereferencing struct kmem_cache fields when the
tracepoints are not in use.

Patches 15-16 are small improvements to __ksize(). They are
not part of the generalization but depend on this series.
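
For reference, here is a condensed, illustration-only sketch of the common
kmalloc path the series converges on in mm/slab_common.c (simplified from
patches 10-14; see the actual patches for the real code and exact
tracepoint arguments):

static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
        struct kmem_cache *s;
        void *ret;

        /* Requests larger than an order-1 page go to the page allocator. */
        if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
                ret = kmalloc_large_node_notrace(size, flags, node);
                trace_kmalloc(caller, ret, size,
                              PAGE_SIZE << get_order(size), flags, node);
                return ret;
        }

        /* Otherwise pick a kmalloc cache and use the entry point
         * implemented by both SLAB and SLUB. */
        s = kmalloc_slab(size, flags);
        if (unlikely(ZERO_OR_NULL_PTR(s)))
                return s;

        ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
        ret = kasan_kmalloc(s, ret, size, flags);
        trace_kmalloc(caller, ret, size, s->size, flags, node);
        return ret;
}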

mm/slab: move NUMA-related code to __do_cache_alloc()
mm/slab: cleanup slab_alloc() and slab_alloc_node()
mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
mm/slab_common: cleanup kmalloc_track_caller()
mm/sl[au]b: factor out __do_kmalloc_node()
mm/sl[auo]b: fold kmalloc_order_trace() into kmalloc_large()
mm/slub: move kmalloc_large_node() to slab_common.c
mm/slab_common: kmalloc_node: pass large requests to page allocator
mm/slab_common: cleanup kmalloc_large()
mm/slab: kmalloc: pass requests larger than order-1 page to page
allocator
mm/sl[au]b: introduce common alloc/free functions without tracepoint
mm/sl[au]b: generalize kmalloc subsystem
mm/slab_common: unify NUMA and UMA version of tracepoints
mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not
using
mm/sl[auo]b: move definition of __ksize() to mm/slab.h
mm/sl[au]b: check if large object is valid in __ksize()

include/linux/slab.h | 117 +++++----------
include/trace/events/kmem.h | 68 +++------
mm/slab.c | 290 ++++++++++--------------------------
mm/slab.h | 11 ++
mm/slab_common.c | 176 +++++++++++++++++++---
mm/slob.c | 28 +---
mm/slub.c | 218 +++------------------------
7 files changed, 320 insertions(+), 588 deletions(-)

--
2.34.1


2022-07-12 13:59:51

by Hyeonggon Yoo

Subject: [PATCH v3 2/15] mm/slab: cleanup slab_alloc() and slab_alloc_node()

Make slab_alloc_node() available even when CONFIG_NUMA=n and
make slab_alloc() a wrapper of slab_alloc_node().

This is necessary for further cleanup.
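
As an illustration of where this is headed (a sketch only -- the wrapper
below is added later in the series by the common alloc/free patch, and its
exact name and signature there may differ), having slab_alloc_node()
defined for !NUMA builds lets a single definition serve both
configurations:

void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
                              int nodeid, size_t orig_size,
                              unsigned long caller)
{
        /* Works with and without CONFIG_NUMA once slab_alloc_node()
         * is always available. */
        return slab_alloc_node(cachep, NULL, flags, nodeid, orig_size, caller);
}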

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---
mm/slab.c | 50 +++++++++++++-------------------------------------
1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3d83d17ff3b3..5bcd2b62b5a2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3186,38 +3186,6 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
return obj ? obj : fallback_alloc(cachep, flags);
}

-static void *__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid);
-
-static __always_inline void *
-slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
- unsigned long caller)
-{
- unsigned long save_flags;
- void *ptr;
- struct obj_cgroup *objcg = NULL;
- bool init = false;
-
- flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags);
- if (unlikely(!cachep))
- return NULL;
-
- ptr = kfence_alloc(cachep, orig_size, flags);
- if (unlikely(ptr))
- goto out_hooks;
-
- cache_alloc_debugcheck_before(cachep, flags);
- local_irq_save(save_flags);
- ptr = __do_cache_alloc(cachep, flags, nodeid);
- local_irq_restore(save_flags);
- ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
- init = slab_want_init_on_alloc(flags, cachep);
-
-out_hooks:
- slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr, init);
- return ptr;
-}
-
static __always_inline void *
__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
@@ -3266,8 +3234,8 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unus
#endif /* CONFIG_NUMA */

static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
- size_t orig_size, unsigned long caller)
+slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+ int nodeid, size_t orig_size, unsigned long caller)
{
unsigned long save_flags;
void *objp;
@@ -3285,7 +3253,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,

cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
- objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
+ objp = __do_cache_alloc(cachep, flags, nodeid);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
prefetchw(objp);
@@ -3296,6 +3264,14 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
return objp;
}

+static __always_inline void *
+slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+ size_t orig_size, unsigned long caller)
+{
+ return slab_alloc_node(cachep, lru, flags, NUMA_NO_NODE, orig_size,
+ caller);
+}
+
/*
* Caller needs to acquire correct kmem_cache_node's list_lock
* @list: List of detached free slabs should be freed by caller
@@ -3584,7 +3560,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
*/
void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
- void *ret = slab_alloc_node(cachep, flags, nodeid, cachep->object_size, _RET_IP_);
+ void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);

trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
cachep->object_size, cachep->size,
@@ -3602,7 +3578,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
{
void *ret;

- ret = slab_alloc_node(cachep, flags, nodeid, size, _RET_IP_);
+ ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);

ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc_node(_RET_IP_, ret, cachep,
--
2.34.1

2022-07-12 14:00:32

by Hyeonggon Yoo

Subject: [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints

Drop the kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
remove the _node suffix from the NUMA version of the tracepoints.

This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
but at this point maintaining both the kmem_alloc and kmem_alloc_node
event classes does not make sense at all.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
include/trace/events/kmem.h | 60 ++-----------------------------------
mm/slab.c | 18 +++++------
mm/slab_common.c | 18 +++++------
mm/slob.c | 20 ++++++-------
mm/slub.c | 12 ++++----
5 files changed, 35 insertions(+), 93 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4cb51ace600d..e078ebcdc4b1 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -11,62 +11,6 @@

DECLARE_EVENT_CLASS(kmem_alloc,

- TP_PROTO(unsigned long call_site,
- const void *ptr,
- struct kmem_cache *s,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags),
-
- TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags),
-
- TP_STRUCT__entry(
- __field( unsigned long, call_site )
- __field( const void *, ptr )
- __field( size_t, bytes_req )
- __field( size_t, bytes_alloc )
- __field( unsigned long, gfp_flags )
- __field( bool, accounted )
- ),
-
- TP_fast_assign(
- __entry->call_site = call_site;
- __entry->ptr = ptr;
- __entry->bytes_req = bytes_req;
- __entry->bytes_alloc = bytes_alloc;
- __entry->gfp_flags = (__force unsigned long)gfp_flags;
- __entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
- ((gfp_flags & __GFP_ACCOUNT) ||
- (s && s->flags & SLAB_ACCOUNT)) : false;
- ),
-
- TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s accounted=%s",
- (void *)__entry->call_site,
- __entry->ptr,
- __entry->bytes_req,
- __entry->bytes_alloc,
- show_gfp_flags(__entry->gfp_flags),
- __entry->accounted ? "true" : "false")
-);
-
-DEFINE_EVENT(kmem_alloc, kmalloc,
-
- TP_PROTO(unsigned long call_site, const void *ptr, struct kmem_cache *s,
- size_t bytes_req, size_t bytes_alloc, gfp_t gfp_flags),
-
- TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags)
-);
-
-DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
-
- TP_PROTO(unsigned long call_site, const void *ptr, struct kmem_cache *s,
- size_t bytes_req, size_t bytes_alloc, gfp_t gfp_flags),
-
- TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags)
-);
-
-DECLARE_EVENT_CLASS(kmem_alloc_node,
-
TP_PROTO(unsigned long call_site,
const void *ptr,
struct kmem_cache *s,
@@ -109,7 +53,7 @@ DECLARE_EVENT_CLASS(kmem_alloc_node,
__entry->accounted ? "true" : "false")
);

-DEFINE_EVENT(kmem_alloc_node, kmalloc_node,
+DEFINE_EVENT(kmem_alloc, kmalloc,

TP_PROTO(unsigned long call_site, const void *ptr,
struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
@@ -118,7 +62,7 @@ DEFINE_EVENT(kmem_alloc_node, kmalloc_node,
TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
);

-DEFINE_EVENT(kmem_alloc_node, kmem_cache_alloc_node,
+DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,

TP_PROTO(unsigned long call_site, const void *ptr,
struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
diff --git a/mm/slab.c b/mm/slab.c
index 6407dad13d5c..a361d2f9d4d9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3440,8 +3440,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
{
void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);

- trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
- cachep->object_size, cachep->size, flags);
+ trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
+ cachep->size, flags, NUMA_NO_NODE);

return ret;
}
@@ -3529,7 +3529,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)

ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret, cachep,
- size, cachep->size, flags);
+ size, cachep->size, flags, NUMA_NO_NODE);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_trace);
@@ -3552,9 +3552,9 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);

- trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
- cachep->object_size, cachep->size,
- flags, nodeid);
+ trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
+ cachep->object_size, cachep->size,
+ flags, nodeid);

return ret;
}
@@ -3579,9 +3579,9 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);

ret = kasan_kmalloc(cachep, ret, size, flags);
- trace_kmalloc_node(_RET_IP_, ret, cachep,
- size, cachep->size,
- flags, nodeid);
+ trace_kmalloc(_RET_IP_, ret, cachep,
+ size, cachep->size,
+ flags, nodeid);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1000e05c77df..0e66b4911ebf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -933,9 +933,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller

if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = kmalloc_large_node_notrace(size, flags, node);
- trace_kmalloc_node(caller, ret, NULL,
- size, PAGE_SIZE << get_order(size),
- flags, node);
+ trace_kmalloc(_RET_IP_, ret, NULL,
+ size, PAGE_SIZE << get_order(size),
+ flags, node);
return ret;
}

@@ -946,8 +946,8 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller

ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
ret = kasan_kmalloc(s, ret, size, flags);
- trace_kmalloc_node(caller, ret, s, size,
- s->size, flags, node);
+ trace_kmalloc(_RET_IP_, ret, s, size,
+ s->size, flags, node);
return ret;
}

@@ -1076,7 +1076,7 @@ void *kmalloc_large(size_t size, gfp_t flags)
void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);

trace_kmalloc(_RET_IP_, ret, NULL, size,
- PAGE_SIZE << get_order(size), flags);
+ PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
return ret;
}
EXPORT_SYMBOL(kmalloc_large);
@@ -1090,8 +1090,8 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
{
void *ret = __kmalloc_large_node_notrace(size, flags, node);

- trace_kmalloc_node(_RET_IP_, ret, NULL, size,
- PAGE_SIZE << get_order(size), flags, node);
+ trace_kmalloc(_RET_IP_, ret, NULL, size,
+ PAGE_SIZE << get_order(size), flags, node);
return ret;
}
EXPORT_SYMBOL(kmalloc_large_node);
@@ -1426,8 +1426,6 @@ EXPORT_SYMBOL(ksize);
/* Tracepoints definitions. */
EXPORT_TRACEPOINT_SYMBOL(kmalloc);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
-EXPORT_TRACEPOINT_SYMBOL(kmalloc_node);
-EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc_node);
EXPORT_TRACEPOINT_SYMBOL(kfree);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);

diff --git a/mm/slob.c b/mm/slob.c
index 80cdbe4f0d67..a4d50d957c25 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -507,8 +507,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
*m = size;
ret = (void *)m + minalign;

- trace_kmalloc_node(caller, ret, NULL,
- size, size + minalign, gfp, node);
+ trace_kmalloc(caller, ret, NULL,
+ size, size + minalign, gfp, node);
} else {
unsigned int order = get_order(size);

@@ -516,8 +516,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
gfp |= __GFP_COMP;
ret = slob_new_pages(gfp, order, node);

- trace_kmalloc_node(caller, ret, NULL,
- size, PAGE_SIZE << order, gfp, node);
+ trace_kmalloc(caller, ret, NULL,
+ size, PAGE_SIZE << order, gfp, node);
}

kmemleak_alloc(ret, size, 1, gfp);
@@ -608,14 +608,14 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

if (c->size < PAGE_SIZE) {
b = slob_alloc(c->size, flags, c->align, node, 0);
- trace_kmem_cache_alloc_node(_RET_IP_, b, NULL, c->object_size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
- flags, node);
+ trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
+ SLOB_UNITS(c->size) * SLOB_UNIT,
+ flags, node);
} else {
b = slob_new_pages(flags, get_order(c->size), node);
- trace_kmem_cache_alloc_node(_RET_IP_, b, NULL, c->object_size,
- PAGE_SIZE << get_order(c->size),
- flags, node);
+ trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
+ PAGE_SIZE << get_order(c->size),
+ flags, node);
}

if (b && c->ctor) {
diff --git a/mm/slub.c b/mm/slub.c
index 836292c32e58..f1aa51480dc4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3244,7 +3244,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);

trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
- s->size, gfpflags);
+ s->size, gfpflags, NUMA_NO_NODE);

return ret;
}
@@ -3274,7 +3274,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
- trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags);
+ trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
@@ -3285,8 +3285,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);

- trace_kmem_cache_alloc_node(_RET_IP_, ret, s,
- s->object_size, s->size, gfpflags, node);
+ trace_kmem_cache_alloc(_RET_IP_, ret, s,
+ s->object_size, s->size, gfpflags, node);

return ret;
}
@@ -3299,8 +3299,8 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);

- trace_kmalloc_node(_RET_IP_, ret, s,
- size, s->size, gfpflags, node);
+ trace_kmalloc(_RET_IP_, ret, s,
+ size, s->size, gfpflags, node);

ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
--
2.34.1

2022-07-12 14:01:53

by Hyeonggon Yoo

Subject: [PATCH v3 07/15] mm/slub: move kmalloc_large_node() to slab_common.c

In a later patch, SLAB will also pass requests larger than an order-1 page
to the page allocator. Move kmalloc_large_node() to slab_common.c.

Fold kmalloc_large_node_hook() into kmalloc_large_node() as there is
no other caller.

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 4 ++++
mm/slab_common.c | 22 ++++++++++++++++++++++
mm/slub.c | 25 -------------------------
3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 15a4c59da59e..082499306098 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -491,6 +491,10 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g

void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
__alloc_size(1);
+
+void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+ __alloc_size(1);
+
/**
* kmalloc - allocate memory
* @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1f8af2106df0..6f855587b635 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -956,6 +956,28 @@ void *kmalloc_large(size_t size, gfp_t flags)
}
EXPORT_SYMBOL(kmalloc_large);

+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+ struct page *page;
+ void *ptr = NULL;
+ unsigned int order = get_order(size);
+
+ flags |= __GFP_COMP;
+ page = alloc_pages_node(node, flags, order);
+ if (page) {
+ ptr = page_address(page);
+ mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
+ PAGE_SIZE << order);
+ }
+
+ ptr = kasan_kmalloc_large(ptr, size, flags);
+ /* As ptr might get tagged, call kmemleak hook after KASAN. */
+ kmemleak_alloc(ptr, size, 1, flags);
+
+ return ptr;
+}
+EXPORT_SYMBOL(kmalloc_large_node);
+
#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Randomize a generic freelist */
static void freelist_randomize(struct rnd_state *state, unsigned int *list,
diff --git a/mm/slub.c b/mm/slub.c
index 2ccc473e0ae7..f22a84dd27de 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1704,14 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
* Hooks for other subsystems that check memory allocations. In a typical
* production configuration these hooks all should produce no code at all.
*/
-static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
-{
- ptr = kasan_kmalloc_large(ptr, size, flags);
- /* As ptr might get tagged, call kmemleak hook after KASAN. */
- kmemleak_alloc(ptr, size, 1, flags);
- return ptr;
-}
-
static __always_inline void kfree_hook(void *x)
{
kmemleak_free(x);
@@ -4402,23 +4394,6 @@ static int __init setup_slub_min_objects(char *str)

__setup("slub_min_objects=", setup_slub_min_objects);

-static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
-{
- struct page *page;
- void *ptr = NULL;
- unsigned int order = get_order(size);
-
- flags |= __GFP_COMP;
- page = alloc_pages_node(node, flags, order);
- if (page) {
- ptr = page_address(page);
- mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
- PAGE_SIZE << order);
- }
-
- return kmalloc_large_node_hook(ptr, size, flags);
-}
-
static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
--
2.34.1

2022-07-12 14:01:53

by Hyeonggon Yoo

Subject: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()

There is no caller of kmalloc_order_trace() except kmalloc_large().
Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().

Also add the tracepoint in kmalloc_large() that was previously
in kmalloc_order_trace().

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 22 ++--------------------
mm/slab_common.c | 14 +++-----------
2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a0e57df3d5a4..15a4c59da59e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
}
#endif /* CONFIG_TRACING */

-extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
- __alloc_size(1);
-
-#ifdef CONFIG_TRACING
-extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
- __assume_page_alignment __alloc_size(1);
-#else
-static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
- unsigned int order)
-{
- return kmalloc_order(size, flags, order);
-}
-#endif
-
-static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
-{
- unsigned int order = get_order(size);
- return kmalloc_order_trace(size, flags, order);
-}
-
+void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+ __alloc_size(1);
/**
* kmalloc - allocate memory
* @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6c9aac5d8f4a..1f8af2106df0 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -932,10 +932,11 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
* directly to the page allocator. We use __GFP_COMP, because we will need to
* know the allocation order to free the pages properly in kfree.
*/
-void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
+void *kmalloc_large(size_t size, gfp_t flags)
{
void *ret = NULL;
struct page *page;
+ unsigned int order = get_order(size);

if (unlikely(flags & GFP_SLAB_BUG_MASK))
flags = kmalloc_fix_flags(flags);
@@ -950,19 +951,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
ret = kasan_kmalloc_large(ret, size, flags);
/* As ret might get tagged, call kmemleak hook after KASAN. */
kmemleak_alloc(ret, size, 1, flags);
- return ret;
-}
-EXPORT_SYMBOL(kmalloc_order);
-
-#ifdef CONFIG_TRACING
-void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-{
- void *ret = kmalloc_order(size, flags, order);
trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_order_trace);
-#endif
+EXPORT_SYMBOL(kmalloc_large);

#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Randomize a generic freelist */
--
2.34.1

2022-07-12 14:02:25

by Hyeonggon Yoo

Subject: [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h

__ksize() is only called by KASAN. Remove the export symbol and move the
definition to mm/slab.h, as we don't want to grow its callers.

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 1 -
mm/slab.h | 2 ++
mm/slab_common.c | 11 +----------
mm/slob.c | 1 -
4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4ee5b2fed164..701fe538650f 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
void kfree(const void *objp);
void kfree_sensitive(const void *objp);
-size_t __ksize(const void *objp);
size_t ksize(const void *objp);
#ifdef CONFIG_PRINTK
bool kmem_valid_obj(void *object);
diff --git a/mm/slab.h b/mm/slab.h
index 9193e9c1f040..ad634e02b3cb 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -678,6 +678,8 @@ void free_large_kmalloc(struct folio *folio, void *object);

#endif /* CONFIG_SLOB */

+size_t __ksize(const void *objp);
+
static inline size_t slab_ksize(const struct kmem_cache *s)
{
#ifndef CONFIG_SLUB
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c01c6b8f0d34..1f8db7959366 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1003,15 +1003,7 @@ void kfree(const void *object)
}
EXPORT_SYMBOL(kfree);

-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
+/* Uninstrumented ksize. Only called by KASAN. */
size_t __ksize(const void *object)
{
struct folio *folio;
@@ -1026,7 +1018,6 @@ size_t __ksize(const void *object)

return slab_ksize(folio_slab(folio)->slab_cache);
}
-EXPORT_SYMBOL(__ksize);
#endif /* !CONFIG_SLOB */

gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slob.c b/mm/slob.c
index 97a4d2407f96..91d6e2b19929 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -584,7 +584,6 @@ size_t __ksize(const void *block)
m = (unsigned int *)(block - align);
return SLOB_UNITS(*m) * SLOB_UNIT;
}
-EXPORT_SYMBOL(__ksize);

int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
{
--
2.34.1

2022-07-12 14:03:16

by Hyeonggon Yoo

Subject: [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using

Drop the kmem_alloc event class and define the kmalloc and kmem_cache_alloc
events using the TRACE_EVENT() macro.

In addition, this patch does the following:
- Do not pass a pointer to struct kmem_cache to trace_kmalloc;
the gfp flags are enough to know whether the allocation is accounted or not.
- Avoid dereferencing s->object_size and s->size when the kmem_cache_alloc
event is not in use (see the sketch below).
- Avoid dereferencing s->name when the kmem_cache_free event is not in use.
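
For context, the dereferences are avoided because a tracepoint call site is
guarded by a static key and TP_fast_assign() only runs when the event is
enabled. The sketch below is illustrative only -- it is not the real macro
expansion from include/linux/tracepoint.h, and do_trace_probes() is a
placeholder name:

static inline void trace_kmem_cache_alloc(unsigned long call_site,
                                          const void *ptr,
                                          struct kmem_cache *s,
                                          gfp_t gfp_flags, int node)
{
        if (static_key_false(&__tracepoint_kmem_cache_alloc.key))
                /* The probes run TP_fast_assign(); only then are
                 * s->object_size and s->size dereferenced. */
                do_trace_probes(call_site, ptr, s, gfp_flags, node);
}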

Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Hyeonggon Yoo <[email protected]>
---
include/trace/events/kmem.h | 64 +++++++++++++++++++++++++------------
mm/slab.c | 16 ++++------
mm/slab_common.c | 17 ++++++----
mm/slob.c | 16 +++-------
mm/slub.c | 13 +++-----
5 files changed, 70 insertions(+), 56 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index e078ebcdc4b1..0ded2c351062 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -9,17 +9,15 @@
#include <linux/tracepoint.h>
#include <trace/events/mmflags.h>

-DECLARE_EVENT_CLASS(kmem_alloc,
+TRACE_EVENT(kmem_cache_alloc,

TP_PROTO(unsigned long call_site,
const void *ptr,
struct kmem_cache *s,
- size_t bytes_req,
- size_t bytes_alloc,
gfp_t gfp_flags,
int node),

- TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
+ TP_ARGS(call_site, ptr, s, gfp_flags, node),

TP_STRUCT__entry(
__field( unsigned long, call_site )
@@ -34,8 +32,8 @@ DECLARE_EVENT_CLASS(kmem_alloc,
TP_fast_assign(
__entry->call_site = call_site;
__entry->ptr = ptr;
- __entry->bytes_req = bytes_req;
- __entry->bytes_alloc = bytes_alloc;
+ __entry->bytes_req = s->object_size;
+ __entry->bytes_alloc = s->size;
__entry->gfp_flags = (__force unsigned long)gfp_flags;
__entry->node = node;
__entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
@@ -53,22 +51,46 @@ DECLARE_EVENT_CLASS(kmem_alloc,
__entry->accounted ? "true" : "false")
);

-DEFINE_EVENT(kmem_alloc, kmalloc,
+TRACE_EVENT(kmalloc,

- TP_PROTO(unsigned long call_site, const void *ptr,
- struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
- gfp_t gfp_flags, int node),
+ TP_PROTO(unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags,
+ int node),

- TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
-);
+ TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),

-DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
+ TP_STRUCT__entry(
+ __field( unsigned long, call_site )
+ __field( const void *, ptr )
+ __field( size_t, bytes_req )
+ __field( size_t, bytes_alloc )
+ __field( unsigned long, gfp_flags )
+ __field( int, node )
+ __field( bool, accounted )
+ ),

- TP_PROTO(unsigned long call_site, const void *ptr,
- struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
- gfp_t gfp_flags, int node),
+ TP_fast_assign(
+ __entry->call_site = call_site;
+ __entry->ptr = ptr;
+ __entry->bytes_req = bytes_req;
+ __entry->bytes_alloc = bytes_alloc;
+ __entry->gfp_flags = (__force unsigned long)gfp_flags;
+ __entry->node = node;
+ __entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
+ (gfp_flags & __GFP_ACCOUNT) : false;
+ ),

- TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
+ TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
+ (void *)__entry->call_site,
+ __entry->ptr,
+ __entry->bytes_req,
+ __entry->bytes_alloc,
+ show_gfp_flags(__entry->gfp_flags),
+ __entry->node,
+ __entry->accounted ? "true" : "false")
);

TRACE_EVENT(kfree,
@@ -93,20 +115,20 @@ TRACE_EVENT(kfree,

TRACE_EVENT(kmem_cache_free,

- TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
+ TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),

- TP_ARGS(call_site, ptr, name),
+ TP_ARGS(call_site, ptr, s),

TP_STRUCT__entry(
__field( unsigned long, call_site )
__field( const void *, ptr )
- __string( name, name )
+ __string( name, s->name )
),

TP_fast_assign(
__entry->call_site = call_site;
__entry->ptr = ptr;
- __assign_str(name, name);
+ __assign_str(name, s->name);
),

TP_printk("call_site=%pS ptr=%p name=%s",
diff --git a/mm/slab.c b/mm/slab.c
index f9b74831e3f4..1685e5507a59 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3447,8 +3447,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
{
void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);

- trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
- cachep->size, flags, NUMA_NO_NODE);
+ trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);

return ret;
}
@@ -3537,8 +3536,9 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);

ret = kasan_kmalloc(cachep, ret, size, flags);
- trace_kmalloc(_RET_IP_, ret, cachep,
- size, cachep->size, flags, NUMA_NO_NODE);
+ trace_kmalloc(_RET_IP_, ret,
+ size, cachep->size,
+ flags, NUMA_NO_NODE);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_trace);
@@ -3561,9 +3561,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);

- trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
- cachep->object_size, cachep->size,
- flags, nodeid);
+ trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);

return ret;
}
@@ -3588,7 +3586,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);

ret = kasan_kmalloc(cachep, ret, size, flags);
- trace_kmalloc(_RET_IP_, ret, cachep,
+ trace_kmalloc(_RET_IP_, ret,
size, cachep->size,
flags, nodeid);
return ret;
@@ -3652,7 +3650,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
if (!cachep)
return;

- trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
+ trace_kmem_cache_free(_RET_IP_, objp, cachep);
__do_kmem_cache_free(cachep, objp, _RET_IP_);
}
EXPORT_SYMBOL(kmem_cache_free);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0e66b4911ebf..c01c6b8f0d34 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -933,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller

if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = kmalloc_large_node_notrace(size, flags, node);
- trace_kmalloc(_RET_IP_, ret, NULL,
+ trace_kmalloc(_RET_IP_, ret,
size, PAGE_SIZE << get_order(size),
flags, node);
return ret;
@@ -946,8 +946,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller

ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
ret = kasan_kmalloc(s, ret, size, flags);
- trace_kmalloc(_RET_IP_, ret, s, size,
- s->size, flags, node);
+ trace_kmalloc(_RET_IP_, ret,
+ size, s->size,
+ flags, node);
return ret;
}

@@ -1075,8 +1076,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
{
void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);

- trace_kmalloc(_RET_IP_, ret, NULL, size,
- PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
+ trace_kmalloc(_RET_IP_, ret,
+ size, PAGE_SIZE << get_order(size),
+ flags, NUMA_NO_NODE);
return ret;
}
EXPORT_SYMBOL(kmalloc_large);
@@ -1090,8 +1092,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
{
void *ret = __kmalloc_large_node_notrace(size, flags, node);

- trace_kmalloc(_RET_IP_, ret, NULL, size,
- PAGE_SIZE << get_order(size), flags, node);
+ trace_kmalloc(_RET_IP_, ret,
+ size, PAGE_SIZE << get_order(size),
+ flags, node);
return ret;
}
EXPORT_SYMBOL(kmalloc_large_node);
diff --git a/mm/slob.c b/mm/slob.c
index a4d50d957c25..97a4d2407f96 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
*m = size;
ret = (void *)m + minalign;

- trace_kmalloc(caller, ret, NULL,
- size, size + minalign, gfp, node);
+ trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
} else {
unsigned int order = get_order(size);

@@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
gfp |= __GFP_COMP;
ret = slob_new_pages(gfp, order, node);

- trace_kmalloc(caller, ret, NULL,
- size, PAGE_SIZE << order, gfp, node);
+ trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
}

kmemleak_alloc(ret, size, 1, gfp);
@@ -608,14 +606,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

if (c->size < PAGE_SIZE) {
b = slob_alloc(c->size, flags, c->align, node, 0);
- trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
- flags, node);
+ trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
} else {
b = slob_new_pages(flags, get_order(c->size), node);
- trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
- PAGE_SIZE << get_order(c->size),
- flags, node);
+ trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
}

if (b && c->ctor) {
@@ -671,7 +665,7 @@ static void kmem_rcu_free(struct rcu_head *head)
void kmem_cache_free(struct kmem_cache *c, void *b)
{
kmemleak_free_recursive(b, c->flags);
- trace_kmem_cache_free(_RET_IP_, b, c->name);
+ trace_kmem_cache_free(_RET_IP_, b, c);
if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
struct slob_rcu *slob_rcu;
slob_rcu = b + (c->size - sizeof(struct slob_rcu));
diff --git a/mm/slub.c b/mm/slub.c
index f1aa51480dc4..a49c69469c64 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
{
void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);

- trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
- s->size, gfpflags, NUMA_NO_NODE);
+ trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

return ret;
}
@@ -3274,7 +3273,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
- trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
+ trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
@@ -3285,8 +3284,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);

- trace_kmem_cache_alloc(_RET_IP_, ret, s,
- s->object_size, s->size, gfpflags, node);
+ trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);

return ret;
}
@@ -3299,8 +3297,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);

- trace_kmalloc(_RET_IP_, ret, s,
- size, s->size, gfpflags, node);
+ trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);

ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
@@ -3544,7 +3541,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
s = cache_from_obj(s, x);
if (!s)
return;
- trace_kmem_cache_free(_RET_IP_, x, s->name);
+ trace_kmem_cache_free(_RET_IP_, x, s);
slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
}
EXPORT_SYMBOL(kmem_cache_free);
--
2.34.1

2022-07-12 14:03:58

by Hyeonggon Yoo

Subject: [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem

Now everything in the kmalloc subsystem can be generalized.
Let's do it!

Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
to slab_common.c.
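
For context (a simplified view, not part of this patch's diff): after this
patch, mm/slab_common.c owns the kmalloc entry points and relies on the
small per-allocator entry points introduced earlier in the series, roughly
declared as follows (the exact prototypes live in mm/slab.h):

/* Implemented separately by SLAB and SLUB. */
void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
                              int node, size_t orig_size,
                              unsigned long caller);
void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);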

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/slab.c | 108 -----------------------------------------------
mm/slab_common.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
mm/slub.c | 87 --------------------------------------
3 files changed, 102 insertions(+), 195 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 375e35c14430..6407dad13d5c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3587,44 +3587,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
#endif

-static __always_inline void *
-__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
- struct kmem_cache *cachep;
- void *ret;
-
- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
- ret = kmalloc_large_node_notrace(size, flags, node);
-
- trace_kmalloc_node(caller, ret, NULL, size,
- PAGE_SIZE << get_order(size),
- flags, node);
- return ret;
- }
-
- cachep = kmalloc_slab(size, flags);
- if (unlikely(ZERO_OR_NULL_PTR(cachep)))
- return cachep;
-
- ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
- ret = kasan_kmalloc(cachep, ret, size, flags);
-
- return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
- return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
- int node, unsigned long caller)
-{
- return __do_kmalloc_node(size, flags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
#ifdef CONFIG_PRINTK
void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
{
@@ -3647,12 +3609,6 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
}
#endif

-void *__kmalloc(size_t size, gfp_t flags)
-{
- return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
static __always_inline
void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
unsigned long caller)
@@ -3730,43 +3686,6 @@ void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
}
EXPORT_SYMBOL(kmem_cache_free_bulk);

-/**
- * kfree - free previously allocated memory
- * @objp: pointer returned by kmalloc.
- *
- * If @objp is NULL, no operation is performed.
- *
- * Don't free memory not originally allocated by kmalloc()
- * or you will run into trouble.
- */
-void kfree(const void *objp)
-{
- struct kmem_cache *c;
- unsigned long flags;
- struct folio *folio;
-
- trace_kfree(_RET_IP_, objp);
-
- if (unlikely(ZERO_OR_NULL_PTR(objp)))
- return;
-
- folio = virt_to_folio(objp);
- if (!folio_test_slab(folio)) {
- free_large_kmalloc(folio, (void *)objp);
- return;
- }
-
- c = folio_slab(folio)->slab_cache;
-
- local_irq_save(flags);
- kfree_debugcheck(objp);
- debug_check_no_locks_freed(objp, c->object_size);
- debug_check_no_obj_freed(objp, c->object_size);
- __cache_free(c, (void *)objp, _RET_IP_);
- local_irq_restore(flags);
-}
-EXPORT_SYMBOL(kfree);
-
/*
* This initializes kmem_cache_node or resizes various caches for all nodes.
*/
@@ -4169,30 +4088,3 @@ void __check_heap_object(const void *ptr, unsigned long n,
usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
}
#endif /* CONFIG_HARDENED_USERCOPY */
-
-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
-size_t __ksize(const void *objp)
-{
- struct kmem_cache *c;
- struct folio *folio;
-
- BUG_ON(!objp);
- if (unlikely(objp == ZERO_SIZE_PTR))
- return 0;
-
- folio = virt_to_folio(objp);
- if (!folio_test_slab(folio))
- return folio_size(folio);
-
- c = folio_slab(folio)->slab_cache;
- return c->object_size;
-}
-EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0dff46fb7193..1000e05c77df 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -924,6 +924,108 @@ void free_large_kmalloc(struct folio *folio, void *object)
-(PAGE_SIZE << order));
__free_pages(folio_page(folio, 0), order);
}
+
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
+{
+ struct kmem_cache *s;
+ void *ret;
+
+ if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+ ret = kmalloc_large_node_notrace(size, flags, node);
+ trace_kmalloc_node(caller, ret, NULL,
+ size, PAGE_SIZE << get_order(size),
+ flags, node);
+ return ret;
+ }
+
+ s = kmalloc_slab(size, flags);
+
+ if (unlikely(ZERO_OR_NULL_PTR(s)))
+ return s;
+
+ ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
+ ret = kasan_kmalloc(s, ret, size, flags);
+ trace_kmalloc_node(caller, ret, s, size,
+ s->size, flags, node);
+ return ret;
+}
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+ return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+ return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
+ int node, unsigned long caller)
+{
+ return __do_kmalloc_node(size, flags, node, caller);
+}
+EXPORT_SYMBOL(__kmalloc_node_track_caller);
+
+/**
+ * kfree - free previously allocated memory
+ * @objp: pointer returned by kmalloc.
+ *
+ * If @objp is NULL, no operation is performed.
+ *
+ * Don't free memory not originally allocated by kmalloc()
+ * or you will run into trouble.
+ */
+void kfree(const void *object)
+{
+ struct folio *folio;
+ struct slab *slab;
+ struct kmem_cache *s;
+
+ trace_kfree(_RET_IP_, object);
+
+ if (unlikely(ZERO_OR_NULL_PTR(object)))
+ return;
+
+ folio = virt_to_folio(object);
+ if (unlikely(!folio_test_slab(folio))) {
+ free_large_kmalloc(folio, (void *)object);
+ return;
+ }
+
+ slab = folio_slab(folio);
+ s = slab->slab_cache;
+ __kmem_cache_free(s, (void *)object, _RET_IP_);
+}
+EXPORT_SYMBOL(kfree);
+
+/**
+ * __ksize -- Uninstrumented ksize.
+ * @objp: pointer to the object
+ *
+ * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
+ * safety checks as ksize() with KASAN instrumentation enabled.
+ *
+ * Return: size of the actual memory used by @objp in bytes
+ */
+size_t __ksize(const void *object)
+{
+ struct folio *folio;
+
+ if (unlikely(object == ZERO_SIZE_PTR))
+ return 0;
+
+ folio = virt_to_folio(object);
+
+ if (unlikely(!folio_test_slab(folio)))
+ return folio_size(folio);
+
+ return slab_ksize(folio_slab(folio)->slab_cache);
+}
+EXPORT_SYMBOL(__ksize);
#endif /* !CONFIG_SLOB */

gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index 74eb78678c98..836292c32e58 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4388,49 +4388,6 @@ static int __init setup_slub_min_objects(char *str)

__setup("slub_min_objects=", setup_slub_min_objects);

-static __always_inline
-void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
- struct kmem_cache *s;
- void *ret;
-
- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
- ret = kmalloc_large_node_notrace(size, flags, node);
-
- trace_kmalloc_node(caller, ret, NULL,
- size, PAGE_SIZE << get_order(size),
- flags, node);
-
- return ret;
- }
-
- s = kmalloc_slab(size, flags);
-
- if (unlikely(ZERO_OR_NULL_PTR(s)))
- return s;
-
- ret = slab_alloc_node(s, NULL, flags, node, caller, size);
-
- trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
-
- ret = kasan_kmalloc(s, ret, size, flags);
-
- return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
- return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc(size_t size, gfp_t flags)
-{
- return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
-
#ifdef CONFIG_HARDENED_USERCOPY
/*
* Rejects incorrectly sized objects and objects that are to be copied
@@ -4481,43 +4438,6 @@ void __check_heap_object(const void *ptr, unsigned long n,
}
#endif /* CONFIG_HARDENED_USERCOPY */

-size_t __ksize(const void *object)
-{
- struct folio *folio;
-
- if (unlikely(object == ZERO_SIZE_PTR))
- return 0;
-
- folio = virt_to_folio(object);
-
- if (unlikely(!folio_test_slab(folio)))
- return folio_size(folio);
-
- return slab_ksize(folio_slab(folio)->slab_cache);
-}
-EXPORT_SYMBOL(__ksize);
-
-void kfree(const void *x)
-{
- struct folio *folio;
- struct slab *slab;
- void *object = (void *)x;
-
- trace_kfree(_RET_IP_, x);
-
- if (unlikely(ZERO_OR_NULL_PTR(x)))
- return;
-
- folio = virt_to_folio(x);
- if (unlikely(!folio_test_slab(folio))) {
- free_large_kmalloc(folio, object);
- return;
- }
- slab = folio_slab(folio);
- slab_free(slab->slab_cache, slab, object, NULL, &object, 1, _RET_IP_);
-}
-EXPORT_SYMBOL(kfree);
-
#define SHRINK_PROMOTE_MAX 32

/*
@@ -4863,13 +4783,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
return 0;
}

-void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
- int node, unsigned long caller)
-{
- return __do_kmalloc_node(size, gfpflags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
#ifdef CONFIG_SYSFS
static int count_inuse(struct slab *slab)
{
--
2.34.1

2022-07-12 14:05:09

by Hyeonggon Yoo

Subject: [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large()

Now that kmalloc_large() and kmalloc_large_node() do mostly the same job,
make kmalloc_large() a wrapper of __kmalloc_large_node_notrace().

In the meantime, add the missing GFP_SLAB_BUG_MASK fixup
(kmalloc_fix_flags()) to __kmalloc_large_node_notrace().

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/slab_common.c | 36 +++++++++++++-----------------------
1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index dc872e0ef0fc..9c46e2f9589f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -932,29 +932,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
* directly to the page allocator. We use __GFP_COMP, because we will need to
* know the allocation order to free the pages properly in kfree.
*/
-void *kmalloc_large(size_t size, gfp_t flags)
-{
- void *ret = NULL;
- struct page *page;
- unsigned int order = get_order(size);
-
- if (unlikely(flags & GFP_SLAB_BUG_MASK))
- flags = kmalloc_fix_flags(flags);
-
- flags |= __GFP_COMP;
- page = alloc_pages(flags, order);
- if (likely(page)) {
- ret = page_address(page);
- mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
- PAGE_SIZE << order);
- }
- ret = kasan_kmalloc_large(ret, size, flags);
- /* As ret might get tagged, call kmemleak hook after KASAN. */
- kmemleak_alloc(ret, size, 1, flags);
- trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
- return ret;
-}
-EXPORT_SYMBOL(kmalloc_large);

static __always_inline
void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
@@ -963,6 +940,9 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
void *ptr = NULL;
unsigned int order = get_order(size);

+ if (unlikely(flags & GFP_SLAB_BUG_MASK))
+ flags = kmalloc_fix_flags(flags);
+
flags |= __GFP_COMP;
page = alloc_pages_node(node, flags, order);
if (page) {
@@ -978,6 +958,16 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
return ptr;
}

+void *kmalloc_large(size_t size, gfp_t flags)
+{
+ void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
+
+ trace_kmalloc(_RET_IP_, ret, NULL, size,
+ PAGE_SIZE << get_order(size), flags);
+ return ret;
+}
+EXPORT_SYMBOL(kmalloc_large);
+
void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
{
return __kmalloc_large_node_notrace(size, flags, node);
--
2.34.1

2022-07-12 14:06:03

by Hyeonggon Yoo

Subject: [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc()

To implement slab_alloc_node() independently of the NUMA configuration,
move the NUMA fallback/alternate allocation code into __do_cache_alloc().

One functional change here is that the availability of the node is no
longer checked when allocating from the local node.

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---

v3:
Fixed an uninitialized variable bug due to the missing
NULL-initialization of the variable objp.

mm/slab.c | 68 +++++++++++++++++++++++++------------------------------
1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 764cbadba69c..3d83d17ff3b3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3186,13 +3186,14 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
return obj ? obj : fallback_alloc(cachep, flags);
}

+static void *__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid);
+
static __always_inline void *
slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
unsigned long caller)
{
unsigned long save_flags;
void *ptr;
- int slab_node = numa_mem_id();
struct obj_cgroup *objcg = NULL;
bool init = false;

@@ -3207,30 +3208,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_

cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
-
- if (nodeid == NUMA_NO_NODE)
- nodeid = slab_node;
-
- if (unlikely(!get_node(cachep, nodeid))) {
- /* Node not bootstrapped yet */
- ptr = fallback_alloc(cachep, flags);
- goto out;
- }
-
- if (nodeid == slab_node) {
- /*
- * Use the locally cached objects if possible.
- * However ____cache_alloc does not allow fallback
- * to other nodes. It may fail while we still have
- * objects on other nodes available.
- */
- ptr = ____cache_alloc(cachep, flags);
- if (ptr)
- goto out;
- }
- /* ___cache_alloc_node can fall back to other nodes */
- ptr = ____cache_alloc_node(cachep, flags, nodeid);
-out:
+ ptr = __do_cache_alloc(cachep, flags, nodeid);
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
init = slab_want_init_on_alloc(flags, cachep);
@@ -3241,31 +3219,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
}

static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
+__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
- void *objp;
+ void *objp = NULL;
+ int slab_node = numa_mem_id();

- if (current->mempolicy || cpuset_do_slab_mem_spread()) {
- objp = alternate_node_alloc(cache, flags);
- if (objp)
- goto out;
+ if (nodeid == NUMA_NO_NODE) {
+ if (current->mempolicy || cpuset_do_slab_mem_spread()) {
+ objp = alternate_node_alloc(cachep, flags);
+ if (objp)
+ goto out;
+ }
+ /*
+ * Use the locally cached objects if possible.
+ * However ____cache_alloc does not allow fallback
+ * to other nodes. It may fail while we still have
+ * objects on other nodes available.
+ */
+ objp = ____cache_alloc(cachep, flags);
+ nodeid = slab_node;
+ } else if (nodeid == slab_node) {
+ objp = ____cache_alloc(cachep, flags);
+ } else if (!get_node(cachep, nodeid)) {
+ /* Node not bootstrapped yet */
+ objp = fallback_alloc(cachep, flags);
+ goto out;
}
- objp = ____cache_alloc(cache, flags);

/*
* We may just have run out of memory on the local node.
* ____cache_alloc_node() knows how to locate memory on other nodes
*/
if (!objp)
- objp = ____cache_alloc_node(cache, flags, numa_mem_id());
-
+ objp = ____cache_alloc_node(cachep, flags, nodeid);
out:
return objp;
}
#else

static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unused)
{
return ____cache_alloc(cachep, flags);
}
@@ -3292,7 +3285,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,

cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
- objp = __do_cache_alloc(cachep, flags);
+ objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
prefetchw(objp);
@@ -3531,7 +3524,8 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,

local_irq_disable();
for (i = 0; i < size; i++) {
- void *objp = kfence_alloc(s, s->object_size, flags) ?: __do_cache_alloc(s, flags);
+ void *objp = kfence_alloc(s, s->object_size, flags) ?:
+ __do_cache_alloc(s, flags, NUMA_NO_NODE);

if (unlikely(!objp))
goto error;
--
2.34.1

2022-07-12 14:09:08

by Hyeonggon Yoo

Subject: [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator

There is not much benefit to serving large objects from kmalloc() caches.
Let's pass large requests to the page allocator, as SLUB does, for better
maintainability of the common code.
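
A hypothetical usage view of the behavior change (the sizes are just
examples; the helpers named in the comments are the ones used by this
series):

/* With CONFIG_SLAB, an allocation above KMALLOC_MAX_CACHE_SIZE
 * (an order-1 page, i.e. 2 * PAGE_SIZE) is now served by the page
 * allocator instead of a dedicated kmalloc cache: */
void *buf = kmalloc(4 * PAGE_SIZE, GFP_KERNEL);

/* kfree() detects this via the folio: !folio_test_slab(folio)
 * leads to free_large_kmalloc() instead of a slab cache free. */
kfree(buf);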

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
v2->v3:
This patch is basically the same, but it now depends on
kmalloc_large_node_notrace().

include/linux/slab.h | 23 ++++-------------
mm/slab.c | 60 +++++++++++++++++++++++++++++++-------------
mm/slab.h | 3 +++
mm/slab_common.c | 25 ++++++++++++------
mm/slub.c | 19 --------------
5 files changed, 68 insertions(+), 62 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index fd2e129fc813..4ee5b2fed164 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -243,27 +243,17 @@ static inline unsigned int arch_slab_minalign(void)

#ifdef CONFIG_SLAB
/*
- * The largest kmalloc size supported by the SLAB allocators is
- * 32 megabyte (2^25) or the maximum allocatable page order if that is
- * less than 32 MB.
- *
- * WARNING: Its not easy to increase this value since the allocators have
- * to do various tricks to work around compiler limitations in order to
- * ensure proper constant folding.
+ * SLAB and SLUB directly allocates requests fitting in to an order-1 page
+ * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
*/
-#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
- (MAX_ORDER + PAGE_SHIFT - 1) : 25)
-#define KMALLOC_SHIFT_MAX KMALLOC_SHIFT_HIGH
+#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
+#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW 5
#endif
#endif

#ifdef CONFIG_SLUB
-/*
- * SLUB directly allocates requests fitting in to an order-1 page
- * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
- */
#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
#ifndef KMALLOC_SHIFT_LOW
@@ -415,10 +405,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
if (size <= 512 * 1024) return 19;
if (size <= 1024 * 1024) return 20;
if (size <= 2 * 1024 * 1024) return 21;
- if (size <= 4 * 1024 * 1024) return 22;
- if (size <= 8 * 1024 * 1024) return 23;
- if (size <= 16 * 1024 * 1024) return 24;
- if (size <= 32 * 1024 * 1024) return 25;

if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
@@ -428,6 +414,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
/* Will never be reached. Needed because the compiler may complain */
return -1;
}
+static_assert(PAGE_SHIFT <= 20);
#define kmalloc_index(s) __kmalloc_index(s, true)
#endif /* !CONFIG_SLOB */

diff --git a/mm/slab.c b/mm/slab.c
index ab34727d61b2..a2f43425a0ae 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3585,11 +3585,19 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
struct kmem_cache *cachep;
void *ret;

- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
- return NULL;
+ if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+ ret = kmalloc_large_node_notrace(size, flags, node);
+
+ trace_kmalloc_node(caller, ret, NULL, size,
+ PAGE_SIZE << get_order(size),
+ flags, node);
+ return ret;
+ }
+
cachep = kmalloc_slab(size, flags);
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
+
ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
ret = kasan_kmalloc(cachep, ret, size, flags);

@@ -3664,17 +3672,27 @@ EXPORT_SYMBOL(kmem_cache_free);

void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
{
- struct kmem_cache *s;
- size_t i;

local_irq_disable();
- for (i = 0; i < size; i++) {
+ for (int i = 0; i < size; i++) {
void *objp = p[i];
+ struct kmem_cache *s;

- if (!orig_s) /* called via kfree_bulk */
- s = virt_to_cache(objp);
- else
+ if (!orig_s) {
+ struct folio *folio = virt_to_folio(objp);
+
+ /* called via kfree_bulk */
+ if (!folio_test_slab(folio)) {
+ local_irq_enable();
+ free_large_kmalloc(folio, objp);
+ local_irq_disable();
+ continue;
+ }
+ s = folio_slab(folio)->slab_cache;
+ } else {
s = cache_from_obj(orig_s, objp);
+ }
+
if (!s)
continue;

@@ -3703,20 +3721,24 @@ void kfree(const void *objp)
{
struct kmem_cache *c;
unsigned long flags;
+ struct folio *folio;

trace_kfree(_RET_IP_, objp);

if (unlikely(ZERO_OR_NULL_PTR(objp)))
return;
- local_irq_save(flags);
- kfree_debugcheck(objp);
- c = virt_to_cache(objp);
- if (!c) {
- local_irq_restore(flags);
+
+ folio = virt_to_folio(objp);
+ if (!folio_test_slab(folio)) {
+ free_large_kmalloc(folio, (void *)objp);
return;
}
- debug_check_no_locks_freed(objp, c->object_size);

+ c = folio_slab(folio)->slab_cache;
+
+ local_irq_save(flags);
+ kfree_debugcheck(objp);
+ debug_check_no_locks_freed(objp, c->object_size);
debug_check_no_obj_freed(objp, c->object_size);
__cache_free(c, (void *)objp, _RET_IP_);
local_irq_restore(flags);
@@ -4138,15 +4160,17 @@ void __check_heap_object(const void *ptr, unsigned long n,
size_t __ksize(const void *objp)
{
struct kmem_cache *c;
- size_t size;
+ struct folio *folio;

BUG_ON(!objp);
if (unlikely(objp == ZERO_SIZE_PTR))
return 0;

- c = virt_to_cache(objp);
- size = c ? c->object_size : 0;
+ folio = virt_to_folio(objp);
+ if (!folio_test_slab(folio))
+ return folio_size(folio);

- return size;
+ c = folio_slab(folio)->slab_cache;
+ return c->object_size;
}
EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab.h b/mm/slab.h
index 7cb51ff44f0c..c81c92d421f1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -669,6 +669,9 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
print_tracking(cachep, x);
return cachep;
}
+
+void free_large_kmalloc(struct folio *folio, void *object);
+
#endif /* CONFIG_SLOB */

static inline size_t slab_ksize(const struct kmem_cache *s)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c46e2f9589f..0dff46fb7193 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -771,8 +771,8 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)

/*
* kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
- * kmalloc_index() supports up to 2^25=32MB, so the final entry of the table is
- * kmalloc-32M.
+ * kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is
+ * kmalloc-2M.
*/
const struct kmalloc_info_struct kmalloc_info[] __initconst = {
INIT_KMALLOC_INFO(0, 0),
@@ -796,11 +796,7 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
INIT_KMALLOC_INFO(262144, 256k),
INIT_KMALLOC_INFO(524288, 512k),
INIT_KMALLOC_INFO(1048576, 1M),
- INIT_KMALLOC_INFO(2097152, 2M),
- INIT_KMALLOC_INFO(4194304, 4M),
- INIT_KMALLOC_INFO(8388608, 8M),
- INIT_KMALLOC_INFO(16777216, 16M),
- INIT_KMALLOC_INFO(33554432, 32M)
+ INIT_KMALLOC_INFO(2097152, 2M)
};

/*
@@ -913,6 +909,21 @@ void __init create_kmalloc_caches(slab_flags_t flags)
/* Kmalloc array is now usable */
slab_state = UP;
}
+
+void free_large_kmalloc(struct folio *folio, void *object)
+{
+ unsigned int order = folio_order(folio);
+
+ if (WARN_ON_ONCE(order == 0))
+ pr_warn_once("object pointer: 0x%p\n", object);
+
+ kmemleak_free(object);
+ kasan_kfree_large(object);
+
+ mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
+ -(PAGE_SIZE << order));
+ __free_pages(folio_page(folio, 0), order);
+}
#endif /* !CONFIG_SLOB */

gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index 3d02cf44adf7..6cb7ca27f3b7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1704,12 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
* Hooks for other subsystems that check memory allocations. In a typical
* production configuration these hooks all should produce no code at all.
*/
-static __always_inline void kfree_hook(void *x)
-{
- kmemleak_free(x);
- kasan_kfree_large(x);
-}
-
static __always_inline bool slab_free_hook(struct kmem_cache *s,
void *x, bool init)
{
@@ -3550,19 +3544,6 @@ struct detached_freelist {
struct kmem_cache *s;
};

-static inline void free_large_kmalloc(struct folio *folio, void *object)
-{
- unsigned int order = folio_order(folio);
-
- if (WARN_ON_ONCE(order == 0))
- pr_warn_once("object pointer: 0x%p\n", object);
-
- kfree_hook(object);
- mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
- -(PAGE_SIZE << order));
- __free_pages(folio_page(folio, 0), order);
-}
-
/*
* This function progressively scans the array with free objects (with
* a limited look ahead) and extract objects belonging to the same
--
2.34.1

2022-07-12 14:09:09

by Hyeonggon Yoo

[permalink] [raw]
Subject: [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node()

__kmalloc(), __kmalloc_node(), and __kmalloc_node_track_caller()
mostly do the same job. Factor out the common code into
__do_kmalloc_node().

Note that this patch also fixes a missing kasan_kmalloc() call in
SLUB's __kmalloc_node_track_caller().

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/slab.c | 30 +----------------------
mm/slub.c | 71 +++++++++++++++----------------------------------------
2 files changed, 20 insertions(+), 81 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index da2f6a5dd8fa..ab34727d61b2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3631,37 +3631,9 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
}
#endif

-/**
- * __do_kmalloc - allocate memory
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @caller: function caller for debug tracking of the caller
- *
- * Return: pointer to the allocated memory or %NULL in case of error
- */
-static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
- unsigned long caller)
-{
- struct kmem_cache *cachep;
- void *ret;
-
- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
- return NULL;
- cachep = kmalloc_slab(size, flags);
- if (unlikely(ZERO_OR_NULL_PTR(cachep)))
- return cachep;
- ret = slab_alloc(cachep, NULL, flags, size, caller);
-
- ret = kasan_kmalloc(cachep, ret, size, flags);
- trace_kmalloc(caller, ret, cachep,
- size, cachep->size, flags);
-
- return ret;
-}
-
void *__kmalloc(size_t size, gfp_t flags)
{
- return __do_kmalloc(size, flags, _RET_IP_);
+ return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
}
EXPORT_SYMBOL(__kmalloc);

diff --git a/mm/slub.c b/mm/slub.c
index 7c284535a62b..2ccc473e0ae7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4402,29 +4402,6 @@ static int __init setup_slub_min_objects(char *str)

__setup("slub_min_objects=", setup_slub_min_objects);

-void *__kmalloc(size_t size, gfp_t flags)
-{
- struct kmem_cache *s;
- void *ret;
-
- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
- return kmalloc_large(size, flags);
-
- s = kmalloc_slab(size, flags);
-
- if (unlikely(ZERO_OR_NULL_PTR(s)))
- return s;
-
- ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
-
- trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags);
-
- ret = kasan_kmalloc(s, ret, size, flags);
-
- return ret;
-}
-EXPORT_SYMBOL(__kmalloc);
-
static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
{
struct page *page;
@@ -4442,7 +4419,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
return kmalloc_large_node_hook(ptr, size, flags);
}

-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
struct kmem_cache *s;
void *ret;
@@ -4450,7 +4428,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = kmalloc_large_node(size, flags, node);

- trace_kmalloc_node(_RET_IP_, ret, NULL,
+ trace_kmalloc_node(caller, ret, NULL,
size, PAGE_SIZE << get_order(size),
flags, node);

@@ -4462,16 +4440,28 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;

- ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
+ ret = slab_alloc_node(s, NULL, flags, node, caller, size);

- trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, flags, node);
+ trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);

ret = kasan_kmalloc(s, ret, size, flags);

return ret;
}
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+ return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
EXPORT_SYMBOL(__kmalloc_node);

+void *__kmalloc(size_t size, gfp_t flags)
+{
+ return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+
#ifdef CONFIG_HARDENED_USERCOPY
/*
* Rejects incorrectly sized objects and objects that are to be copied
@@ -4905,32 +4895,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
}

void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
- int node, unsigned long caller)
+ int node, unsigned long caller)
{
- struct kmem_cache *s;
- void *ret;
-
- if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
- ret = kmalloc_large_node(size, gfpflags, node);
-
- trace_kmalloc_node(caller, ret, NULL,
- size, PAGE_SIZE << get_order(size),
- gfpflags, node);
-
- return ret;
- }
-
- s = kmalloc_slab(size, gfpflags);
-
- if (unlikely(ZERO_OR_NULL_PTR(s)))
- return s;
-
- ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
-
- /* Honor the call site pointer we received. */
- trace_kmalloc_node(caller, ret, s, size, s->size, gfpflags, node);
-
- return ret;
+ return __do_kmalloc_node(size, gfpflags, node, caller);
}
EXPORT_SYMBOL(__kmalloc_node_track_caller);

--
2.34.1

2022-07-12 14:26:52

by Hyeonggon Yoo

[permalink] [raw]
Subject: [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint

To unify the kmalloc functions in a later patch, introduce common
alloc/free functions that do not have tracepoints.
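
For context, the intent is that the unified kmalloc path added later in
the series can wrap these hooks with its own tracing, roughly like the
sketch below (editorial; the actual later patch may differ in details):

	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
	ret = kasan_kmalloc(s, ret, size, flags);
	/* caller-side tracepoint goes here, outside the common hook */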

Signed-off-by: Hyeonggon Yoo <[email protected]>
---

v3:
Tried to avoid affecting existing functions.

mm/slab.c | 36 +++++++++++++++++++++++++++++-------
mm/slab.h | 4 ++++
mm/slub.c | 13 +++++++++++++
3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a2f43425a0ae..375e35c14430 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3560,6 +3560,14 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
}
EXPORT_SYMBOL(kmem_cache_alloc_node);

+void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
+ int nodeid, size_t orig_size,
+ unsigned long caller)
+{
+ return slab_alloc_node(cachep, NULL, flags, nodeid,
+ orig_size, caller);
+}
+
#ifdef CONFIG_TRACING
void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
gfp_t flags,
@@ -3645,6 +3653,26 @@ void *__kmalloc(size_t size, gfp_t flags)
}
EXPORT_SYMBOL(__kmalloc);

+static __always_inline
+void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
+ unsigned long caller)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ debug_check_no_locks_freed(objp, cachep->object_size);
+ if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
+ debug_check_no_obj_freed(objp, cachep->object_size);
+ __cache_free(cachep, objp, caller);
+ local_irq_restore(flags);
+}
+
+void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
+ unsigned long caller)
+{
+ __do_kmem_cache_free(cachep, objp, caller);
+}
+
/**
* kmem_cache_free - Deallocate an object
* @cachep: The cache the allocation was from.
@@ -3655,18 +3683,12 @@ EXPORT_SYMBOL(__kmalloc);
*/
void kmem_cache_free(struct kmem_cache *cachep, void *objp)
{
- unsigned long flags;
cachep = cache_from_obj(cachep, objp);
if (!cachep)
return;

trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
- local_irq_save(flags);
- debug_check_no_locks_freed(objp, cachep->object_size);
- if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
- debug_check_no_obj_freed(objp, cachep->object_size);
- __cache_free(cachep, objp, _RET_IP_);
- local_irq_restore(flags);
+ __do_kmem_cache_free(cachep, objp, _RET_IP_);
}
EXPORT_SYMBOL(kmem_cache_free);

diff --git a/mm/slab.h b/mm/slab.h
index c81c92d421f1..9193e9c1f040 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -275,6 +275,10 @@ void create_kmalloc_caches(slab_flags_t);
struct kmem_cache *kmalloc_slab(size_t, gfp_t);

void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
+ int node, size_t orig_size,
+ unsigned long caller);
+void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
#endif

gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slub.c b/mm/slub.c
index 6cb7ca27f3b7..74eb78678c98 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3262,6 +3262,14 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
}
EXPORT_SYMBOL(kmem_cache_alloc_lru);

+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
+ int node, size_t orig_size,
+ unsigned long caller)
+{
+ return slab_alloc_node(s, NULL, gfpflags, node,
+ caller, orig_size);
+}
+
#ifdef CONFIG_TRACING
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
@@ -3526,6 +3534,11 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
}
#endif

+void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
+{
+ slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
+}
+
void kmem_cache_free(struct kmem_cache *s, void *x)
{
s = cache_from_obj(s, x);
--
2.34.1

2022-07-12 14:28:36

by Hyeonggon Yoo

[permalink] [raw]
Subject: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

__ksize() returns the size of objects allocated from the slab
allocator. When an invalid object is passed to __ksize(), returning
zero prevents further memory corruption and lets the caller check
for the error.

If the address of a large object is not the beginning of its folio,
or the folio is too small, the object must be invalid. Return zero
in such cases.

Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/slab_common.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1f8db7959366..0d6cbe9d7ad0 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)

folio = virt_to_folio(object);

- if (unlikely(!folio_test_slab(folio)))
+ if (unlikely(!folio_test_slab(folio))) {
+ if (WARN_ON(object != folio_address(folio) ||
+ folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
+ return 0;
return folio_size(folio);
+ }

return slab_ksize(folio_slab(folio)->slab_cache);
}
--
2.34.1

2022-07-12 14:29:54

by Hyeonggon Yoo

[permalink] [raw]
Subject: [PATCH v3 03/15] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions

Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n,
remove CONFIG_NUMA ifdefs for common kmalloc functions.

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 28 ----------------------------
mm/slab.c | 2 --
mm/slob.c | 5 +----
mm/slub.c | 6 ------
4 files changed, 1 insertion(+), 40 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..4754c834b0e3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -456,38 +456,18 @@ static __always_inline void kfree_bulk(size_t size, void **p)
kmem_cache_free_bulk(NULL, size, p);
}

-#ifdef CONFIG_NUMA
void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
__alloc_size(1);
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
__malloc;
-#else
-static __always_inline __alloc_size(1) void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
- return __kmalloc(size, flags);
-}
-
-static __always_inline void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node)
-{
- return kmem_cache_alloc(s, flags);
-}
-#endif

#ifdef CONFIG_TRACING
extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
__assume_slab_alignment __alloc_size(3);

-#ifdef CONFIG_NUMA
extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size) __assume_slab_alignment
__alloc_size(4);
-#else
-static __always_inline __alloc_size(4) void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
- gfp_t gfpflags, int node, size_t size)
-{
- return kmem_cache_alloc_trace(s, gfpflags, size);
-}
-#endif /* CONFIG_NUMA */

#else /* CONFIG_TRACING */
static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s,
@@ -701,20 +681,12 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
}


-#ifdef CONFIG_NUMA
extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
unsigned long caller) __alloc_size(1);
#define kmalloc_node_track_caller(size, flags, node) \
__kmalloc_node_track_caller(size, flags, node, \
_RET_IP_)

-#else /* CONFIG_NUMA */
-
-#define kmalloc_node_track_caller(size, flags, node) \
- kmalloc_track_caller(size, flags)
-
-#endif /* CONFIG_NUMA */
-
/*
* Shortcuts
*/
diff --git a/mm/slab.c b/mm/slab.c
index 5ca8bb7335dc..81804921c538 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3535,7 +3535,6 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
EXPORT_SYMBOL(kmem_cache_alloc_trace);
#endif

-#ifdef CONFIG_NUMA
/**
* kmem_cache_alloc_node - Allocate an object on the specified node
* @cachep: The cache to allocate from.
@@ -3609,7 +3608,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
return __do_kmalloc_node(size, flags, node, caller);
}
EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif /* CONFIG_NUMA */

#ifdef CONFIG_PRINTK
void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
diff --git a/mm/slob.c b/mm/slob.c
index 56421fe461c4..c54aad6b106c 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -536,14 +536,12 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
}
EXPORT_SYMBOL(__kmalloc_track_caller);

-#ifdef CONFIG_NUMA
void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
int node, unsigned long caller)
{
return __do_kmalloc_node(size, gfp, node, caller);
}
EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif

void kfree(const void *block)
{
@@ -647,7 +645,7 @@ void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_
return slob_alloc_node(cachep, flags, NUMA_NO_NODE);
}
EXPORT_SYMBOL(kmem_cache_alloc_lru);
-#ifdef CONFIG_NUMA
+
void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
return __do_kmalloc_node(size, gfp, node, _RET_IP_);
@@ -659,7 +657,6 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
return slob_alloc_node(cachep, gfp, node);
}
EXPORT_SYMBOL(kmem_cache_alloc_node);
-#endif

static void __kmem_cache_free(void *b, int size)
{
diff --git a/mm/slub.c b/mm/slub.c
index 26b00951aad1..a5b3a4484263 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3287,7 +3287,6 @@ void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
EXPORT_SYMBOL(kmem_cache_alloc_trace);
#endif

-#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
@@ -3314,7 +3313,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
}
EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
#endif
-#endif /* CONFIG_NUMA */

/*
* Slow path handling. This may still be called frequently since objects
@@ -4427,7 +4425,6 @@ void *__kmalloc(size_t size, gfp_t flags)
}
EXPORT_SYMBOL(__kmalloc);

-#ifdef CONFIG_NUMA
static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
{
struct page *page;
@@ -4474,7 +4471,6 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
return ret;
}
EXPORT_SYMBOL(__kmalloc_node);
-#endif /* CONFIG_NUMA */

#ifdef CONFIG_HARDENED_USERCOPY
/*
@@ -4930,7 +4926,6 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
}
EXPORT_SYMBOL(__kmalloc_track_caller);

-#ifdef CONFIG_NUMA
void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
int node, unsigned long caller)
{
@@ -4960,7 +4955,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
return ret;
}
EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif

#ifdef CONFIG_SYSFS
static int count_inuse(struct slab *slab)
--
2.34.1

2022-07-12 14:35:57

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc()

On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:

> @@ -3241,31 +3219,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
> }
>
> static __always_inline void *
> -__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
> +__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> {
> - void *objp;
> + void *objp = NULL;
> + int slab_node = numa_mem_id();
>
> - if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> - objp = alternate_node_alloc(cache, flags);
> - if (objp)
> - goto out;
> + if (nodeid == NUMA_NO_NODE) {
> + if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> + objp = alternate_node_alloc(cachep, flags);
> + if (objp)
> + goto out;
> + }
> + /*
> + * Use the locally cached objects if possible.
> + * However ____cache_alloc does not allow fallback
> + * to other nodes. It may fail while we still have
> + * objects on other nodes available.
> + */
> + objp = ____cache_alloc(cachep, flags);
> + nodeid = slab_node;
> + } else if (nodeid == slab_node) {
> + objp = ____cache_alloc(cachep, flags);
> + } else if (!get_node(cachep, nodeid)) {
> + /* Node not bootstrapped yet */
> + objp = fallback_alloc(cachep, flags);
> + goto out;
> }
> - objp = ____cache_alloc(cache, flags);
>
> /*
> * We may just have run out of memory on the local node.
> * ____cache_alloc_node() knows how to locate memory on other nodes
> */
> if (!objp)
> - objp = ____cache_alloc_node(cache, flags, numa_mem_id());
> -
> + objp = ____cache_alloc_node(cachep, flags, nodeid);


Does this preserve the original behavior? nodeid is the parameter passed
to __do_cache_alloc(). numa_mem_id() is the nearest memory node.

2022-07-12 15:57:31

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:

> __ksize() returns size of objects allocated from slab allocator.
> When invalid object is passed to __ksize(), returning zero
> prevents further memory corruption and makes caller be able to
> check if there is an error.
>
> If address of large object is not beginning of folio or size of
> the folio is too small, it must be invalid. Return zero in such cases.

Why return 0 if there is an error and why bother the callers with these
checks. BUG()?

> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1f8db7959366..0d6cbe9d7ad0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
>
> folio = virt_to_folio(object);
>
> - if (unlikely(!folio_test_slab(folio)))
> + if (unlikely(!folio_test_slab(folio))) {
> + if (WARN_ON(object != folio_address(folio) ||
> + folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))

Hmmm... This may change things a bit. Before this patch it was possible to
determine the storage size of order-0 pages using ksize(). Now this
returns 0?

I guess this is an error since the order-0 page cannot come from slab
allocations.

2022-07-13 09:47:09

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc()

On Tue, Jul 12, 2022 at 04:29:10PM +0200, Christoph Lameter wrote:
> On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:
>
> > @@ -3241,31 +3219,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
> > }
> >
> > static __always_inline void *
> > -__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
> > +__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> > {
> > - void *objp;
> > + void *objp = NULL;
> > + int slab_node = numa_mem_id();
> >
> > - if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> > - objp = alternate_node_alloc(cache, flags);
> > - if (objp)
> > - goto out;
> > + if (nodeid == NUMA_NO_NODE) {
> > + if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> > + objp = alternate_node_alloc(cachep, flags);
> > + if (objp)
> > + goto out;
> > + }
> > + /*
> > + * Use the locally cached objects if possible.
> > + * However ____cache_alloc does not allow fallback
> > + * to other nodes. It may fail while we still have
> > + * objects on other nodes available.
> > + */
> > + objp = ____cache_alloc(cachep, flags);
> > + nodeid = slab_node;
> > + } else if (nodeid == slab_node) {
> > + objp = ____cache_alloc(cachep, flags);
> > + } else if (!get_node(cachep, nodeid)) {
> > + /* Node not bootstrapped yet */
> > + objp = fallback_alloc(cachep, flags);
> > + goto out;
> > }
> > - objp = ____cache_alloc(cache, flags);
> >
> > /*
> > * We may just have run out of memory on the local node.
> > * ____cache_alloc_node() knows how to locate memory on other nodes
> > */
> > if (!objp)
> > - objp = ____cache_alloc_node(cache, flags, numa_mem_id());
> > -
> > + objp = ____cache_alloc_node(cachep, flags, nodeid);
>
>
> Does this preserve the original behavior? nodeid is the parameter passed
> to __do_cache_alloc(). numa_mem_id() is the nearest memory node.

Yes, it does preserve the original behavior.

nodeid equals the value of numa_mem_id() when nodeid was NUMA_NO_NODE and
____cache_alloc() failed to allocate.
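
Roughly, stripping the mempolicy and explicit-node branches, the merged
flow is (condensed sketch, not a quote from the patch):

	if (nodeid == NUMA_NO_NODE) {
		/* old __do_cache_alloc() behavior */
		objp = ____cache_alloc(cachep, flags);
		nodeid = slab_node;	/* i.e. numa_mem_id() */
	}

	/* fallback identical to the old numa_mem_id() call */
	if (!objp)
		objp = ____cache_alloc_node(cachep, flags, nodeid);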

2022-07-13 09:56:30

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Tue, Jul 12, 2022 at 05:13:44PM +0200, Christoph Lameter wrote:
> On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:
>
> > __ksize() returns size of objects allocated from slab allocator.
> > When invalid object is passed to __ksize(), returning zero
> > prevents further memory corruption and makes caller be able to
> > check if there is an error.
> >
> > If address of large object is not beginning of folio or size of
> > the folio is too small, it must be invalid. Return zero in such cases.
>
> Why return 0 if there is an error and why bother the callers with these
> checks. BUG()?

I thought BUG should be used when there is no other solution.

> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 1f8db7959366..0d6cbe9d7ad0 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
> >
> > folio = virt_to_folio(object);
> >
> > - if (unlikely(!folio_test_slab(folio)))
> > + if (unlikely(!folio_test_slab(folio))) {
> > + if (WARN_ON(object != folio_address(folio) ||
> > + folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
>
> Hmmm... This may change things a bit. Before this patch it was possible to
> determine the storage size of order-0 pages using ksize(). Now this
> returns 0?
>
> I guess this is an error since the order-0 page cannot come from slab
> allocations.

The comment in ksize() says:
"The caller must guarantee that objp points to a valid object
previously allocated with either kmalloc() or kmem_cache_alloc()."

It should not be used on an order-0 page that was not allocated from slab. No?

2022-07-13 10:46:41

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Wed, 13 Jul 2022 at 12:07, Christoph Lameter <[email protected]> wrote:
>
> On Wed, 13 Jul 2022, Hyeonggon Yoo wrote:
>
> > > Why return 0 if there is an error and why bother the callers with these
> > > checks. BUG()?
> >
> > I thought BUG should be used when there is no other solution.
>
> Spurios returns of 0 that the caller has to check for is a solution?

It's never really been about the caller checking for 0, see below.

> > > I guess this is an error since the order-0 page cannot come from slab
> > > allocations.
> >
> > comment in ksize() says:
> > "The caller must guarantee that objp points to a valid object
> > previously allocated with either kmalloc() or kmem_cache_alloc()."
> >
> > It should not be used on order-0 page that is not allocated from slab. No?
>
> I guess we would need to check. Code could exist that does this.
>
> Getting a 0 size would be surprising too here. BUG()? Or WARN() and return
> PAGE_SIZE.

We shouldn't crash, so it should be WARN(), but also returning
PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
the buggy code cause less harm to the rest of the kernel.

From [1]:

> Similarly, if you are able to tell if the passed pointer is not a
> valid object some other way, you can do something better - namely,
> return 0. The intuition here is that the caller has a pointer to an
> invalid object, and wants to use ksize() to determine its size, and
> most likely access all those bytes. Arguably, at that point the kernel
> is already in a degrading state. But we can try to not let things get
> worse by having ksize() return 0, in the hopes that it will stop
> corrupting more memory. It won't work in all cases, but should avoid
> things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> bounds the memory accessed corrupting random memory.

[1] https://lore.kernel.org/all/CANpmjNNYt9AG8RrGF0pq2dPbFc=vw2kaOnL2k5+8kfJeEMGuwg@mail.gmail.com/
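
To make the failure mode concrete, a (hypothetical) caller pattern that
returning 0 is meant to defuse looks like this:

	size_t s = ksize(obj);	/* 0 if obj is detected as invalid */

	memset(obj, 0, s);	/* size 0 turns the access into a no-op */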

Thanks,
-- Marco

2022-07-13 10:52:06

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Wed, 13 Jul 2022, Hyeonggon Yoo wrote:

> > Why return 0 if there is an error and why bother the callers with these
> > checks. BUG()?
>
> I thought BUG should be used when there is no other solution.

Spurious returns of 0 that the caller has to check for are a solution?

> > I guess this is an error since the order-0 page cannot come from slab
> > allocations.
>
> comment in ksize() says:
> "The caller must guarantee that objp points to a valid object
> previously allocated with either kmalloc() or kmem_cache_alloc()."
>
> It should not be used on order-0 page that is not allocated from slab. No?

I guess we would need to check. Code could exist that does this.

Getting a size of 0 would be surprising here too. BUG()? Or WARN() and return
PAGE_SIZE.


2022-07-14 09:27:45

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Wed, 13 Jul 2022, Marco Elver wrote:

> We shouldn't crash, so it should be WARN(), but also returning
> PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
> the buggy code cause less harm to the rest of the kernel.
>
> From [1]:
>
> > Similarly, if you are able to tell if the passed pointer is not a
> > valid object some other way, you can do something better - namely,
> > return 0. The intuition here is that the caller has a pointer to an
> > invalid object, and wants to use ksize() to determine its size, and
> > most likely access all those bytes. Arguably, at that point the kernel
> > is already in a degrading state. But we can try to not let things get
> > worse by having ksize() return 0, in the hopes that it will stop
> > corrupting more memory. It won't work in all cases, but should avoid
> > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > bounds the memory accessed corrupting random memory.

"in the hopes that it will stop corrupting memory"!!!???

Do a BUG() then and definitely stop all chances of memory corruption.

2022-07-14 10:39:26

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Thu, 14 Jul 2022 at 11:16, Christoph Lameter <[email protected]> wrote:
>
> On Wed, 13 Jul 2022, Marco Elver wrote:
>
> > We shouldn't crash, so it should be WARN(), but also returning
> > PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
> > the buggy code cause less harm to the rest of the kernel.
> >
> > From [1]:
> >
> > > Similarly, if you are able to tell if the passed pointer is not a
> > > valid object some other way, you can do something better - namely,
> > > return 0. The intuition here is that the caller has a pointer to an
> > > invalid object, and wants to use ksize() to determine its size, and
> > > most likely access all those bytes. Arguably, at that point the kernel
> > > is already in a degrading state. But we can try to not let things get
> > > worse by having ksize() return 0, in the hopes that it will stop
> > > corrupting more memory. It won't work in all cases, but should avoid
> > > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > > bounds the memory accessed corrupting random memory.
>
> "in the hopes that it will stop corrupting memory"!!!???
>
> Do a BUG() then and definitely stop all chances of memory corruption.

Fair enough.

Well, I'd also prefer to just kill the kernel. But some people don't
like that and want the option to continue running. So a WARN() gives
that option, and one just has to boot the kernel with panic_on_warn to
kill it. There are other warnings in the kernel where we'd better kill
the kernel as the chances of corrupting memory are pretty damn high if
we hit them. And I still don't quite see why the case here is any more
or less special.

If the situation here is exceedingly rare, let's try BUG() and see what breaks?

2022-07-20 10:20:54

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On Thu, Jul 14, 2022 at 12:30:09PM +0200, Marco Elver wrote:
> On Thu, 14 Jul 2022 at 11:16, Christoph Lameter <[email protected]> wrote:
> >
> > On Wed, 13 Jul 2022, Marco Elver wrote:
> >
> > > We shouldn't crash, so it should be WARN(), but also returning
> > > PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
> > > the buggy code cause less harm to the rest of the kernel.
> > >
> > > From [1]:
> > >
> > > > Similarly, if you are able to tell if the passed pointer is not a
> > > > valid object some other way, you can do something better - namely,
> > > > return 0. The intuition here is that the caller has a pointer to an
> > > > invalid object, and wants to use ksize() to determine its size, and
> > > > most likely access all those bytes. Arguably, at that point the kernel
> > > > is already in a degrading state. But we can try to not let things get
> > > > worse by having ksize() return 0, in the hopes that it will stop
> > > > corrupting more memory. It won't work in all cases, but should avoid
> > > > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > > > bounds the memory accessed corrupting random memory.
> >
> > "in the hopes that it will stop corrupting memory"!!!???
> >
> > Do a BUG() then and definitely stop all chances of memory corruption.
>
> Fair enough.
>
> Well, I'd also prefer to just kill the kernel. But some people don't
> like that and want the option to continue running. So a WARN() gives
> that option, and just have to boot the kernel with panic_on_warn to
> kill it. There are other warnings in the kernel where we'd better kill
> the kernel as the chances of corrupting memory are pretty damn high if
> we hit them. And I still don't quite see why the case here is any more
> or less special.
>
> If the situation here is exceedingly rare, let's try BUG() and see what breaks?

Let's try BUG() for both conditions and replace it with WARN() later
if the kernel hits those often.

I'll update this patch in the next version.

And I have no strong opinion on returning 0, but if the kernel hits it
a lot, I think returning 0 would be safer, as you said.
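
(For both conditions that presumably means something along these lines
in the next version; a sketch, not the actual v4 hunk:)

	if (unlikely(!folio_test_slab(folio))) {
		BUG_ON(object != folio_address(folio));
		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
		return folio_size(folio);
	}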

Thanks,
Hyeonggon

2022-07-28 14:55:38

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node()

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> __kmalloc(), __kmalloc_node(), __kmalloc_node_track_caller()
> mostly do same job. Factor out common code into __do_kmalloc_node().
>
> Note that this patch also fixes missing kasan_kmalloc() in SLUB's
> __kmalloc_node_track_caller().
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> mm/slab.c | 30 +----------------------
> mm/slub.c | 71 +++++++++++++++----------------------------------------
> 2 files changed, 20 insertions(+), 81 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index da2f6a5dd8fa..ab34727d61b2 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3631,37 +3631,9 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
> }
> #endif
>
> -/**
> - * __do_kmalloc - allocate memory
> - * @size: how many bytes of memory are required.
> - * @flags: the type of memory to allocate (see kmalloc).
> - * @caller: function caller for debug tracking of the caller
> - *
> - * Return: pointer to the allocated memory or %NULL in case of error
> - */
> -static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
> - unsigned long caller)
> -{
> - struct kmem_cache *cachep;
> - void *ret;
> -
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> - return NULL;
> - cachep = kmalloc_slab(size, flags);
> - if (unlikely(ZERO_OR_NULL_PTR(cachep)))
> - return cachep;
> - ret = slab_alloc(cachep, NULL, flags, size, caller);
> -
> - ret = kasan_kmalloc(cachep, ret, size, flags);
> - trace_kmalloc(caller, ret, cachep,
> - size, cachep->size, flags);
> -
> - return ret;
> -}
> -
> void *__kmalloc(size_t size, gfp_t flags)
> {
> - return __do_kmalloc(size, flags, _RET_IP_);
> + return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> }
> EXPORT_SYMBOL(__kmalloc);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 7c284535a62b..2ccc473e0ae7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4402,29 +4402,6 @@ static int __init setup_slub_min_objects(char *str)
>
> __setup("slub_min_objects=", setup_slub_min_objects);
>
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> - struct kmem_cache *s;
> - void *ret;
> -
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> - return kmalloc_large(size, flags);
> -
> - s = kmalloc_slab(size, flags);
> -
> - if (unlikely(ZERO_OR_NULL_PTR(s)))
> - return s;
> -
> - ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
> -
> - trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags);
> -
> - ret = kasan_kmalloc(s, ret, size, flags);
> -
> - return ret;
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
> static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> {
> struct page *page;
> @@ -4442,7 +4419,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> return kmalloc_large_node_hook(ptr, size, flags);
> }
>
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +static __always_inline
> +void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> {
> struct kmem_cache *s;
> void *ret;
> @@ -4450,7 +4428,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> ret = kmalloc_large_node(size, flags, node);
>
> - trace_kmalloc_node(_RET_IP_, ret, NULL,
> + trace_kmalloc_node(caller, ret, NULL,
> size, PAGE_SIZE << get_order(size),
> flags, node);
>
> @@ -4462,16 +4440,28 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
> if (unlikely(ZERO_OR_NULL_PTR(s)))
> return s;
>
> - ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
> + ret = slab_alloc_node(s, NULL, flags, node, caller, size);
>
> - trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, flags, node);
> + trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
>
> ret = kasan_kmalloc(s, ret, size, flags);
>
> return ret;
> }
> +
> +void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +{
> + return __do_kmalloc_node(size, flags, node, _RET_IP_);
> +}
> EXPORT_SYMBOL(__kmalloc_node);
>
> +void *__kmalloc(size_t size, gfp_t flags)
> +{
> + return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc);
> +
> +
> #ifdef CONFIG_HARDENED_USERCOPY
> /*
> * Rejects incorrectly sized objects and objects that are to be copied
> @@ -4905,32 +4895,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
> }
>
> void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> - int node, unsigned long caller)
> + int node, unsigned long caller)
> {
> - struct kmem_cache *s;
> - void *ret;
> -
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> - ret = kmalloc_large_node(size, gfpflags, node);
> -
> - trace_kmalloc_node(caller, ret, NULL,
> - size, PAGE_SIZE << get_order(size),
> - gfpflags, node);
> -
> - return ret;
> - }
> -
> - s = kmalloc_slab(size, gfpflags);
> -
> - if (unlikely(ZERO_OR_NULL_PTR(s)))
> - return s;
> -
> - ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
> -
> - /* Honor the call site pointer we received. */
> - trace_kmalloc_node(caller, ret, s, size, s->size, gfpflags, node);
> -
> - return ret;
> + return __do_kmalloc_node(size, gfpflags, node, caller);
> }
> EXPORT_SYMBOL(__kmalloc_node_track_caller);
>

2022-07-28 15:59:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> There is no caller of kmalloc_order_trace() except kmalloc_large().
> Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
>
> Also add tracepoint in kmalloc_large() that was previously
> in kmalloc_order_trace().
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> Reviewed-by: Vlastimil Babka <[email protected]>

Hmm, I noticed a small change in how we call trace_kmalloc(), which will
now include __GFP_COMP. I think we could just call alloc_pages() from
kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
__GFP_COMP; first. AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
will filter it out anyway.
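
I.e., something like this untested sketch (same body as the quoted hunk,
only the flags handling changed):

void *kmalloc_large(size_t size, gfp_t flags)
{
	void *ret = NULL;
	struct page *page;
	unsigned int order = get_order(size);

	if (unlikely(flags & GFP_SLAB_BUG_MASK))
		flags = kmalloc_fix_flags(flags);

	/* __GFP_COMP only goes to the page allocator ... */
	page = alloc_pages(flags | __GFP_COMP, order);
	if (likely(page)) {
		ret = page_address(page);
		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
				      PAGE_SIZE << order);
	}
	ret = kasan_kmalloc_large(ret, size, flags);
	/* As ret might get tagged, call kmemleak hook after KASAN. */
	kmemleak_alloc(ret, size, 1, flags);
	/* ... so the hooks and the tracepoint see the original flags. */
	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
	return ret;
}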

> ---
> include/linux/slab.h | 22 ++--------------------
> mm/slab_common.c | 14 +++-----------
> 2 files changed, 5 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index a0e57df3d5a4..15a4c59da59e 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
> }
> #endif /* CONFIG_TRACING */
>
> -extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
> - __alloc_size(1);
> -
> -#ifdef CONFIG_TRACING
> -extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> - __assume_page_alignment __alloc_size(1);
> -#else
> -static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
> - unsigned int order)
> -{
> - return kmalloc_order(size, flags, order);
> -}
> -#endif
> -
> -static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
> -{
> - unsigned int order = get_order(size);
> - return kmalloc_order_trace(size, flags, order);
> -}
> -
> +void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
> + __alloc_size(1);
> /**
> * kmalloc - allocate memory
> * @size: how many bytes of memory are required.
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 6c9aac5d8f4a..1f8af2106df0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -932,10 +932,11 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
> * directly to the page allocator. We use __GFP_COMP, because we will need to
> * know the allocation order to free the pages properly in kfree.
> */
> -void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> +void *kmalloc_large(size_t size, gfp_t flags)
> {
> void *ret = NULL;
> struct page *page;
> + unsigned int order = get_order(size);
>
> if (unlikely(flags & GFP_SLAB_BUG_MASK))
> flags = kmalloc_fix_flags(flags);
> @@ -950,19 +951,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> ret = kasan_kmalloc_large(ret, size, flags);
> /* As ret might get tagged, call kmemleak hook after KASAN. */
> kmemleak_alloc(ret, size, 1, flags);
> - return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_order);
> -
> -#ifdef CONFIG_TRACING
> -void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> -{
> - void *ret = kmalloc_order(size, flags, order);
> trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
> return ret;
> }
> -EXPORT_SYMBOL(kmalloc_order_trace);
> -#endif
> +EXPORT_SYMBOL(kmalloc_large);
>
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> /* Randomize a generic freelist */

2022-07-28 16:34:38

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large()

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Now that kmalloc_large() and kmalloc_large_node() do mostly same job,
> make kmalloc_large() wrapper of __kmalloc_large_node_notrace().
>
> In the meantime, add missing flag fix code in
> __kmalloc_large_node_notrace().
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> mm/slab_common.c | 36 +++++++++++++-----------------------
> 1 file changed, 13 insertions(+), 23 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index dc872e0ef0fc..9c46e2f9589f 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -932,29 +932,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
> * directly to the page allocator. We use __GFP_COMP, because we will need to
> * know the allocation order to free the pages properly in kfree.
> */
> -void *kmalloc_large(size_t size, gfp_t flags)
> -{
> - void *ret = NULL;
> - struct page *page;
> - unsigned int order = get_order(size);
> -
> - if (unlikely(flags & GFP_SLAB_BUG_MASK))
> - flags = kmalloc_fix_flags(flags);
> -
> - flags |= __GFP_COMP;
> - page = alloc_pages(flags, order);
> - if (likely(page)) {
> - ret = page_address(page);
> - mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
> - PAGE_SIZE << order);
> - }
> - ret = kasan_kmalloc_large(ret, size, flags);
> - /* As ret might get tagged, call kmemleak hook after KASAN. */
> - kmemleak_alloc(ret, size, 1, flags);
> - trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
> - return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_large);
>
> static __always_inline
> void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> @@ -963,6 +940,9 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> void *ptr = NULL;
> unsigned int order = get_order(size);
>
> + if (unlikely(flags & GFP_SLAB_BUG_MASK))
> + flags = kmalloc_fix_flags(flags);
> +
> flags |= __GFP_COMP;
> page = alloc_pages_node(node, flags, order);
> if (page) {
> @@ -978,6 +958,16 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> return ptr;
> }
>
> +void *kmalloc_large(size_t size, gfp_t flags)
> +{
> + void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
> +
> + trace_kmalloc(_RET_IP_, ret, NULL, size,
> + PAGE_SIZE << get_order(size), flags);
> + return ret;
> +}
> +EXPORT_SYMBOL(kmalloc_large);
> +
> void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> {
> return __kmalloc_large_node_notrace(size, flags, node);

2022-07-28 16:36:49

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> There is not much benefit for serving large objects in kmalloc().
> Let's pass large requests to page allocator like SLUB for better
> maintenance of common code.
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> v2->v3:
> This patch is basically same but it now depends on
> kmalloc_large_node_notrace().
>
> include/linux/slab.h | 23 ++++-------------
> mm/slab.c | 60 +++++++++++++++++++++++++++++++-------------
> mm/slab.h | 3 +++
> mm/slab_common.c | 25 ++++++++++++------
> mm/slub.c | 19 --------------
> 5 files changed, 68 insertions(+), 62 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index fd2e129fc813..4ee5b2fed164 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -243,27 +243,17 @@ static inline unsigned int arch_slab_minalign(void)
>
> #ifdef CONFIG_SLAB
> /*
> - * The largest kmalloc size supported by the SLAB allocators is
> - * 32 megabyte (2^25) or the maximum allocatable page order if that is
> - * less than 32 MB.
> - *
> - * WARNING: Its not easy to increase this value since the allocators have
> - * to do various tricks to work around compiler limitations in order to
> - * ensure proper constant folding.
> + * SLAB and SLUB directly allocates requests fitting in to an order-1 page
> + * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
> */
> -#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
> - (MAX_ORDER + PAGE_SHIFT - 1) : 25)
> -#define KMALLOC_SHIFT_MAX KMALLOC_SHIFT_HIGH
> +#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
> +#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
> #ifndef KMALLOC_SHIFT_LOW
> #define KMALLOC_SHIFT_LOW 5
> #endif
> #endif
>
> #ifdef CONFIG_SLUB
> -/*
> - * SLUB directly allocates requests fitting in to an order-1 page
> - * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
> - */
> #define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
> #define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
> #ifndef KMALLOC_SHIFT_LOW
> @@ -415,10 +405,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
> if (size <= 512 * 1024) return 19;
> if (size <= 1024 * 1024) return 20;
> if (size <= 2 * 1024 * 1024) return 21;
> - if (size <= 4 * 1024 * 1024) return 22;
> - if (size <= 8 * 1024 * 1024) return 23;
> - if (size <= 16 * 1024 * 1024) return 24;
> - if (size <= 32 * 1024 * 1024) return 25;
>
> if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
> @@ -428,6 +414,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
> /* Will never be reached. Needed because the compiler may complain */
> return -1;
> }
> +static_assert(PAGE_SHIFT <= 20);
> #define kmalloc_index(s) __kmalloc_index(s, true)
> #endif /* !CONFIG_SLOB */
>
> diff --git a/mm/slab.c b/mm/slab.c
> index ab34727d61b2..a2f43425a0ae 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3585,11 +3585,19 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> struct kmem_cache *cachep;
> void *ret;
>
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> - return NULL;
> + if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> + ret = kmalloc_large_node_notrace(size, flags, node);
> +
> + trace_kmalloc_node(caller, ret, NULL, size,
> + PAGE_SIZE << get_order(size),
> + flags, node);
> + return ret;
> + }
> +
> cachep = kmalloc_slab(size, flags);
> if (unlikely(ZERO_OR_NULL_PTR(cachep)))
> return cachep;
> +
> ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
> ret = kasan_kmalloc(cachep, ret, size, flags);
>
> @@ -3664,17 +3672,27 @@ EXPORT_SYMBOL(kmem_cache_free);
>
> void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
> {
> - struct kmem_cache *s;
> - size_t i;
>
> local_irq_disable();
> - for (i = 0; i < size; i++) {
> + for (int i = 0; i < size; i++) {
> void *objp = p[i];
> + struct kmem_cache *s;
>
> - if (!orig_s) /* called via kfree_bulk */
> - s = virt_to_cache(objp);
> - else
> + if (!orig_s) {
> + struct folio *folio = virt_to_folio(objp);
> +
> + /* called via kfree_bulk */
> + if (!folio_test_slab(folio)) {
> + local_irq_enable();
> + free_large_kmalloc(folio, objp);
> + local_irq_disable();
> + continue;
> + }
> + s = folio_slab(folio)->slab_cache;
> + } else {
> s = cache_from_obj(orig_s, objp);
> + }
> +
> if (!s)
> continue;
>
> @@ -3703,20 +3721,24 @@ void kfree(const void *objp)
> {
> struct kmem_cache *c;
> unsigned long flags;
> + struct folio *folio;
>
> trace_kfree(_RET_IP_, objp);
>
> if (unlikely(ZERO_OR_NULL_PTR(objp)))
> return;
> - local_irq_save(flags);
> - kfree_debugcheck(objp);
> - c = virt_to_cache(objp);
> - if (!c) {
> - local_irq_restore(flags);
> +
> + folio = virt_to_folio(objp);
> + if (!folio_test_slab(folio)) {
> + free_large_kmalloc(folio, (void *)objp);
> return;
> }
> - debug_check_no_locks_freed(objp, c->object_size);
>
> + c = folio_slab(folio)->slab_cache;
> +
> + local_irq_save(flags);
> + kfree_debugcheck(objp);
> + debug_check_no_locks_freed(objp, c->object_size);
> debug_check_no_obj_freed(objp, c->object_size);
> __cache_free(c, (void *)objp, _RET_IP_);
> local_irq_restore(flags);
> @@ -4138,15 +4160,17 @@ void __check_heap_object(const void *ptr, unsigned long n,
> size_t __ksize(const void *objp)
> {
> struct kmem_cache *c;
> - size_t size;
> + struct folio *folio;
>
> BUG_ON(!objp);
> if (unlikely(objp == ZERO_SIZE_PTR))
> return 0;
>
> - c = virt_to_cache(objp);
> - size = c ? c->object_size : 0;
> + folio = virt_to_folio(objp);
> + if (!folio_test_slab(folio))
> + return folio_size(folio);
>
> - return size;
> + c = folio_slab(folio)->slab_cache;
> + return c->object_size;
> }
> EXPORT_SYMBOL(__ksize);
> diff --git a/mm/slab.h b/mm/slab.h
> index 7cb51ff44f0c..c81c92d421f1 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -669,6 +669,9 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> print_tracking(cachep, x);
> return cachep;
> }
> +
> +void free_large_kmalloc(struct folio *folio, void *object);
> +
> #endif /* CONFIG_SLOB */
>
> static inline size_t slab_ksize(const struct kmem_cache *s)
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c46e2f9589f..0dff46fb7193 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -771,8 +771,8 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
>
> /*
> * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
> - * kmalloc_index() supports up to 2^25=32MB, so the final entry of the table is
> - * kmalloc-32M.
> + * kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is
> + * kmalloc-2M.
> */
> const struct kmalloc_info_struct kmalloc_info[] __initconst = {
> INIT_KMALLOC_INFO(0, 0),
> @@ -796,11 +796,7 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
> INIT_KMALLOC_INFO(262144, 256k),
> INIT_KMALLOC_INFO(524288, 512k),
> INIT_KMALLOC_INFO(1048576, 1M),
> - INIT_KMALLOC_INFO(2097152, 2M),
> - INIT_KMALLOC_INFO(4194304, 4M),
> - INIT_KMALLOC_INFO(8388608, 8M),
> - INIT_KMALLOC_INFO(16777216, 16M),
> - INIT_KMALLOC_INFO(33554432, 32M)
> + INIT_KMALLOC_INFO(2097152, 2M)
> };
>
> /*
> @@ -913,6 +909,21 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> /* Kmalloc array is now usable */
> slab_state = UP;
> }
> +
> +void free_large_kmalloc(struct folio *folio, void *object)
> +{
> + unsigned int order = folio_order(folio);
> +
> + if (WARN_ON_ONCE(order == 0))
> + pr_warn_once("object pointer: 0x%p\n", object);
> +
> + kmemleak_free(object);
> + kasan_kfree_large(object);
> +
> + mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
> + -(PAGE_SIZE << order));
> + __free_pages(folio_page(folio, 0), order);
> +}
> #endif /* !CONFIG_SLOB */
>
> gfp_t kmalloc_fix_flags(gfp_t flags)
> diff --git a/mm/slub.c b/mm/slub.c
> index 3d02cf44adf7..6cb7ca27f3b7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1704,12 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
> * Hooks for other subsystems that check memory allocations. In a typical
> * production configuration these hooks all should produce no code at all.
> */
> -static __always_inline void kfree_hook(void *x)
> -{
> - kmemleak_free(x);
> - kasan_kfree_large(x);
> -}
> -
> static __always_inline bool slab_free_hook(struct kmem_cache *s,
> void *x, bool init)
> {
> @@ -3550,19 +3544,6 @@ struct detached_freelist {
> struct kmem_cache *s;
> };
>
> -static inline void free_large_kmalloc(struct folio *folio, void *object)
> -{
> - unsigned int order = folio_order(folio);
> -
> - if (WARN_ON_ONCE(order == 0))
> - pr_warn_once("object pointer: 0x%p\n", object);
> -
> - kfree_hook(object);
> - mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
> - -(PAGE_SIZE << order));
> - __free_pages(folio_page(folio, 0), order);
> -}
> -
> /*
> * This function progressively scans the array with free objects (with
> * a limited look ahead) and extract objects belonging to the same

2022-07-29 10:06:10

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> To unify kmalloc functions in a later patch, introduce common alloc/free
> functions that do not have tracepoints.
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
>
> v3:
> Tried to avoid affecting existing functions.
>
> mm/slab.c | 36 +++++++++++++++++++++++++++++-------
> mm/slab.h | 4 ++++
> mm/slub.c | 13 +++++++++++++
> 3 files changed, 46 insertions(+), 7 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index a2f43425a0ae..375e35c14430 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3560,6 +3560,14 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> }
> EXPORT_SYMBOL(kmem_cache_alloc_node);
>
> +void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
> + int nodeid, size_t orig_size,
> + unsigned long caller)
> +{
> + return slab_alloc_node(cachep, NULL, flags, nodeid,
> + orig_size, caller);
> +}
> +
> #ifdef CONFIG_TRACING
> void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> gfp_t flags,
> @@ -3645,6 +3653,26 @@ void *__kmalloc(size_t size, gfp_t flags)
> }
> EXPORT_SYMBOL(__kmalloc);
>
> +static __always_inline
> +void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
> + unsigned long caller)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + debug_check_no_locks_freed(objp, cachep->object_size);
> + if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
> + debug_check_no_obj_freed(objp, cachep->object_size);
> + __cache_free(cachep, objp, caller);
> + local_irq_restore(flags);
> +}
> +
> +void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
> + unsigned long caller)
> +{
> + __do_kmem_cache_free(cachep, objp, caller);
> +}
> +
> /**
> * kmem_cache_free - Deallocate an object
> * @cachep: The cache the allocation was from.
> @@ -3655,18 +3683,12 @@ EXPORT_SYMBOL(__kmalloc);
> */
> void kmem_cache_free(struct kmem_cache *cachep, void *objp)
> {
> - unsigned long flags;
> cachep = cache_from_obj(cachep, objp);
> if (!cachep)
> return;
>
> trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
> - local_irq_save(flags);
> - debug_check_no_locks_freed(objp, cachep->object_size);
> - if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
> - debug_check_no_obj_freed(objp, cachep->object_size);
> - __cache_free(cachep, objp, _RET_IP_);
> - local_irq_restore(flags);
> + __do_kmem_cache_free(cachep, objp, _RET_IP_);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> diff --git a/mm/slab.h b/mm/slab.h
> index c81c92d421f1..9193e9c1f040 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -275,6 +275,10 @@ void create_kmalloc_caches(slab_flags_t);
> struct kmem_cache *kmalloc_slab(size_t, gfp_t);
>
> void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> + int node, size_t orig_size,
> + unsigned long caller);
> +void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
> #endif
>
> gfp_t kmalloc_fix_flags(gfp_t flags);
> diff --git a/mm/slub.c b/mm/slub.c
> index 6cb7ca27f3b7..74eb78678c98 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3262,6 +3262,14 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> }
> EXPORT_SYMBOL(kmem_cache_alloc_lru);
>
> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> + int node, size_t orig_size,
> + unsigned long caller)
> +{
> + return slab_alloc_node(s, NULL, gfpflags, node,
> + caller, orig_size);
> +}
> +
> #ifdef CONFIG_TRACING
> void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> {
> @@ -3526,6 +3534,11 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
> }
> #endif
>
> +void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
> +{
> + slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
> +}
> +
> void kmem_cache_free(struct kmem_cache *s, void *x)
> {
> s = cache_from_obj(s, x);

2022-07-29 10:27:48

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Now everything in the kmalloc subsystem can be generalized.
> Let's do it!
>
> Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
> kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
> to slab_common.c.
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> mm/slab.c | 108 -----------------------------------------------
> mm/slab_common.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
> mm/slub.c | 87 --------------------------------------
> 3 files changed, 102 insertions(+), 195 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 375e35c14430..6407dad13d5c 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3587,44 +3587,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
> #endif
>
> -static __always_inline void *
> -__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> -{
> - struct kmem_cache *cachep;
> - void *ret;
> -
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> - ret = kmalloc_large_node_notrace(size, flags, node);
> -
> - trace_kmalloc_node(caller, ret, NULL, size,
> - PAGE_SIZE << get_order(size),
> - flags, node);
> - return ret;
> - }
> -
> - cachep = kmalloc_slab(size, flags);
> - if (unlikely(ZERO_OR_NULL_PTR(cachep)))
> - return cachep;
> -
> - ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
> - ret = kasan_kmalloc(cachep, ret, size, flags);
> -
> - return ret;
> -}
> -
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> -{
> - return __do_kmalloc_node(size, flags, node, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc_node);
> -
> -void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> - int node, unsigned long caller)
> -{
> - return __do_kmalloc_node(size, flags, node, caller);
> -}
> -EXPORT_SYMBOL(__kmalloc_node_track_caller);
> -
> #ifdef CONFIG_PRINTK
> void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
> {
> @@ -3647,12 +3609,6 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
> }
> #endif
>
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> - return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
> static __always_inline
> void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
> unsigned long caller)
> @@ -3730,43 +3686,6 @@ void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
> }
> EXPORT_SYMBOL(kmem_cache_free_bulk);
>
> -/**
> - * kfree - free previously allocated memory
> - * @objp: pointer returned by kmalloc.
> - *
> - * If @objp is NULL, no operation is performed.
> - *
> - * Don't free memory not originally allocated by kmalloc()
> - * or you will run into trouble.
> - */
> -void kfree(const void *objp)
> -{
> - struct kmem_cache *c;
> - unsigned long flags;
> - struct folio *folio;
> -
> - trace_kfree(_RET_IP_, objp);
> -
> - if (unlikely(ZERO_OR_NULL_PTR(objp)))
> - return;
> -
> - folio = virt_to_folio(objp);
> - if (!folio_test_slab(folio)) {
> - free_large_kmalloc(folio, (void *)objp);
> - return;
> - }
> -
> - c = folio_slab(folio)->slab_cache;
> -
> - local_irq_save(flags);
> - kfree_debugcheck(objp);
> - debug_check_no_locks_freed(objp, c->object_size);
> - debug_check_no_obj_freed(objp, c->object_size);
> - __cache_free(c, (void *)objp, _RET_IP_);
> - local_irq_restore(flags);
> -}
> -EXPORT_SYMBOL(kfree);
> -
> /*
> * This initializes kmem_cache_node or resizes various caches for all nodes.
> */
> @@ -4169,30 +4088,3 @@ void __check_heap_object(const void *ptr, unsigned long n,
> usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
> }
> #endif /* CONFIG_HARDENED_USERCOPY */
> -
> -/**
> - * __ksize -- Uninstrumented ksize.
> - * @objp: pointer to the object
> - *
> - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> - * safety checks as ksize() with KASAN instrumentation enabled.
> - *
> - * Return: size of the actual memory used by @objp in bytes
> - */
> -size_t __ksize(const void *objp)
> -{
> - struct kmem_cache *c;
> - struct folio *folio;
> -
> - BUG_ON(!objp);
> - if (unlikely(objp == ZERO_SIZE_PTR))
> - return 0;
> -
> - folio = virt_to_folio(objp);
> - if (!folio_test_slab(folio))
> - return folio_size(folio);
> -
> - c = folio_slab(folio)->slab_cache;
> - return c->object_size;
> -}
> -EXPORT_SYMBOL(__ksize);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0dff46fb7193..1000e05c77df 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -924,6 +924,108 @@ void free_large_kmalloc(struct folio *folio, void *object)
> -(PAGE_SIZE << order));
> __free_pages(folio_page(folio, 0), order);
> }
> +
> +static __always_inline
> +void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> +{
> + struct kmem_cache *s;
> + void *ret;
> +
> + if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> + ret = kmalloc_large_node_notrace(size, flags, node);
> + trace_kmalloc_node(caller, ret, NULL,
> + size, PAGE_SIZE << get_order(size),
> + flags, node);
> + return ret;
> + }
> +
> + s = kmalloc_slab(size, flags);
> +
> + if (unlikely(ZERO_OR_NULL_PTR(s)))
> + return s;
> +
> + ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
> + ret = kasan_kmalloc(s, ret, size, flags);
> + trace_kmalloc_node(caller, ret, s, size,
> + s->size, flags, node);
> + return ret;
> +}
> +
> +void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +{
> + return __do_kmalloc_node(size, flags, node, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc_node);
> +
> +void *__kmalloc(size_t size, gfp_t flags)
> +{
> + return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc);
> +
> +void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> + int node, unsigned long caller)
> +{
> + return __do_kmalloc_node(size, flags, node, caller);
> +}
> +EXPORT_SYMBOL(__kmalloc_node_track_caller);
> +
> +/**
> + * kfree - free previously allocated memory
> + * @objp: pointer returned by kmalloc.
> + *
> + * If @objp is NULL, no operation is performed.
> + *
> + * Don't free memory not originally allocated by kmalloc()
> + * or you will run into trouble.
> + */
> +void kfree(const void *object)
> +{
> + struct folio *folio;
> + struct slab *slab;
> + struct kmem_cache *s;
> +
> + trace_kfree(_RET_IP_, object);
> +
> + if (unlikely(ZERO_OR_NULL_PTR(object)))
> + return;
> +
> + folio = virt_to_folio(object);
> + if (unlikely(!folio_test_slab(folio))) {
> + free_large_kmalloc(folio, (void *)object);
> + return;
> + }
> +
> + slab = folio_slab(folio);
> + s = slab->slab_cache;
> + __kmem_cache_free(s, (void *)object, _RET_IP_);
> +}
> +EXPORT_SYMBOL(kfree);
> +
> +/**
> + * __ksize -- Uninstrumented ksize.
> + * @objp: pointer to the object
> + *
> + * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> + * safety checks as ksize() with KASAN instrumentation enabled.
> + *
> + * Return: size of the actual memory used by @objp in bytes
> + */
> +size_t __ksize(const void *object)
> +{
> + struct folio *folio;
> +
> + if (unlikely(object == ZERO_SIZE_PTR))
> + return 0;
> +
> + folio = virt_to_folio(object);
> +
> + if (unlikely(!folio_test_slab(folio)))
> + return folio_size(folio);
> +
> + return slab_ksize(folio_slab(folio)->slab_cache);
> +}
> +EXPORT_SYMBOL(__ksize);
> #endif /* !CONFIG_SLOB */
>
> gfp_t kmalloc_fix_flags(gfp_t flags)
> diff --git a/mm/slub.c b/mm/slub.c
> index 74eb78678c98..836292c32e58 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4388,49 +4388,6 @@ static int __init setup_slub_min_objects(char *str)
>
> __setup("slub_min_objects=", setup_slub_min_objects);
>
> -static __always_inline
> -void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> -{
> - struct kmem_cache *s;
> - void *ret;
> -
> - if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> - ret = kmalloc_large_node_notrace(size, flags, node);
> -
> - trace_kmalloc_node(caller, ret, NULL,
> - size, PAGE_SIZE << get_order(size),
> - flags, node);
> -
> - return ret;
> - }
> -
> - s = kmalloc_slab(size, flags);
> -
> - if (unlikely(ZERO_OR_NULL_PTR(s)))
> - return s;
> -
> - ret = slab_alloc_node(s, NULL, flags, node, caller, size);
> -
> - trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
> -
> - ret = kasan_kmalloc(s, ret, size, flags);
> -
> - return ret;
> -}
> -
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> -{
> - return __do_kmalloc_node(size, flags, node, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc_node);
> -
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> - return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
> -
> #ifdef CONFIG_HARDENED_USERCOPY
> /*
> * Rejects incorrectly sized objects and objects that are to be copied
> @@ -4481,43 +4438,6 @@ void __check_heap_object(const void *ptr, unsigned long n,
> }
> #endif /* CONFIG_HARDENED_USERCOPY */
>
> -size_t __ksize(const void *object)
> -{
> - struct folio *folio;
> -
> - if (unlikely(object == ZERO_SIZE_PTR))
> - return 0;
> -
> - folio = virt_to_folio(object);
> -
> - if (unlikely(!folio_test_slab(folio)))
> - return folio_size(folio);
> -
> - return slab_ksize(folio_slab(folio)->slab_cache);
> -}
> -EXPORT_SYMBOL(__ksize);
> -
> -void kfree(const void *x)
> -{
> - struct folio *folio;
> - struct slab *slab;
> - void *object = (void *)x;
> -
> - trace_kfree(_RET_IP_, x);
> -
> - if (unlikely(ZERO_OR_NULL_PTR(x)))
> - return;
> -
> - folio = virt_to_folio(x);
> - if (unlikely(!folio_test_slab(folio))) {
> - free_large_kmalloc(folio, object);
> - return;
> - }
> - slab = folio_slab(folio);
> - slab_free(slab->slab_cache, slab, object, NULL, &object, 1, _RET_IP_);
> -}
> -EXPORT_SYMBOL(kfree);
> -
> #define SHRINK_PROMOTE_MAX 32
>
> /*
> @@ -4863,13 +4783,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
> return 0;
> }
>
> -void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> - int node, unsigned long caller)
> -{
> - return __do_kmalloc_node(size, gfpflags, node, caller);
> -}
> -EXPORT_SYMBOL(__kmalloc_node_track_caller);
> -
> #ifdef CONFIG_SYSFS
> static int count_inuse(struct slab *slab)
> {

2022-07-29 11:13:17

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
> remove _node postfix for NUMA version of tracepoints.
>
> This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
> but at this point maintaining both kmem_alloc and kmem_alloc_node
> event classes does not make sense at all.
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

2022-07-29 11:27:58

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
> using TRACE_EVENT() macro.
>
> And then this patch does:
> - Do not pass a pointer to struct kmem_cache to trace_kmalloc.
> The gfp flag is enough to know if it's accounted or not.
> - Avoid dereferencing s->object_size and s->size when not using the kmem_cache_alloc event.
> - Avoid dereferencing s->name when not using the kmem_cache_free event.
>
> Suggested-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

With some comments:

> ---
> include/trace/events/kmem.h | 64 +++++++++++++++++++++++++------------
> mm/slab.c | 16 ++++------
> mm/slab_common.c | 17 ++++++----
> mm/slob.c | 16 +++-------
> mm/slub.c | 13 +++-----
> 5 files changed, 70 insertions(+), 56 deletions(-)
>
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index e078ebcdc4b1..0ded2c351062 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -9,17 +9,15 @@
> #include <linux/tracepoint.h>
> #include <trace/events/mmflags.h>
>
> -DECLARE_EVENT_CLASS(kmem_alloc,
> +TRACE_EVENT(kmem_cache_alloc,
>
> TP_PROTO(unsigned long call_site,
> const void *ptr,
> struct kmem_cache *s,
> - size_t bytes_req,
> - size_t bytes_alloc,
> gfp_t gfp_flags,
> int node),
>
> - TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
> + TP_ARGS(call_site, ptr, s, gfp_flags, node),
>
> TP_STRUCT__entry(
> __field( unsigned long, call_site )
> @@ -34,8 +32,8 @@ DECLARE_EVENT_CLASS(kmem_alloc,
> TP_fast_assign(
> __entry->call_site = call_site;
> __entry->ptr = ptr;
> - __entry->bytes_req = bytes_req;
> - __entry->bytes_alloc = bytes_alloc;
> + __entry->bytes_req = s->object_size;
> + __entry->bytes_alloc = s->size;
> __entry->gfp_flags = (__force unsigned long)gfp_flags;
> __entry->node = node;
> __entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?

This assignment continues by ...
(s && s->flags & SLAB_ACCOUNT)) : false;

We could now remove the "s &&" as we assume it's non-NULL anyway.
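I.e. the whole assignment would then look something like this (untested,
written from memory of the current code, so just a sketch):

	__entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
			     ((gfp_flags & __GFP_ACCOUNT) ||
			      (s->flags & SLAB_ACCOUNT)) : false;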

> @@ -53,22 +51,46 @@ DECLARE_EVENT_CLASS(kmem_alloc,
> __entry->accounted ? "true" : "false")
> );
>
> -DEFINE_EVENT(kmem_alloc, kmalloc,
> +TRACE_EVENT(kmalloc,
>
> - TP_PROTO(unsigned long call_site, const void *ptr,
> - struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> - gfp_t gfp_flags, int node),
> + TP_PROTO(unsigned long call_site,
> + const void *ptr,
> + size_t bytes_req,
> + size_t bytes_alloc,
> + gfp_t gfp_flags,
> + int node),
>
> - TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> -);
> + TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),
>
> -DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
> + TP_STRUCT__entry(
> + __field( unsigned long, call_site )
> + __field( const void *, ptr )
> + __field( size_t, bytes_req )
> + __field( size_t, bytes_alloc )
> + __field( unsigned long, gfp_flags )
> + __field( int, node )
> + __field( bool, accounted )
> + ),
>
> - TP_PROTO(unsigned long call_site, const void *ptr,
> - struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> - gfp_t gfp_flags, int node),
> + TP_fast_assign(
> + __entry->call_site = call_site;
> + __entry->ptr = ptr;
> + __entry->bytes_req = bytes_req;
> + __entry->bytes_alloc = bytes_alloc;
> + __entry->gfp_flags = (__force unsigned long)gfp_flags;
> + __entry->node = node;
> + __entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
> + (gfp_flags & __GFP_ACCOUNT) : false;

This test could be perhaps moved to TP_printk and the accounted field
removed completely, but we need to make sure trace-cmd (which reads the
binary entries and does the printk in userspace) can interpret the tests
properly, including the IS_ENABLED(CONFIG_MEMCG_KMEM) part.
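Roughly this kind of thing (only a sketch, with the rest of the format string
and arguments elided; the trace-cmd question above still applies):

	TP_printk("... accounted=%s",
		  ...
		  (IS_ENABLED(CONFIG_MEMCG_KMEM) &&
		   (__entry->gfp_flags & __GFP_ACCOUNT)) ? "true" : "false")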

> + ),
>
> - TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> + TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
> + (void *)__entry->call_site,
> + __entry->ptr,
> + __entry->bytes_req,
> + __entry->bytes_alloc,
> + show_gfp_flags(__entry->gfp_flags),
> + __entry->node,
> + __entry->accounted ? "true" : "false")
> );
>
> TRACE_EVENT(kfree,
> @@ -93,20 +115,20 @@ TRACE_EVENT(kfree,
>
> TRACE_EVENT(kmem_cache_free,
>
> - TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
> + TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),
>
> - TP_ARGS(call_site, ptr, name),
> + TP_ARGS(call_site, ptr, s),
>
> TP_STRUCT__entry(
> __field( unsigned long, call_site )
> __field( const void *, ptr )
> - __string( name, name )
> + __string( name, s->name )
> ),
>
> TP_fast_assign(
> __entry->call_site = call_site;
> __entry->ptr = ptr;
> - __assign_str(name, name);
> + __assign_str(name, s->name);
> ),
>
> TP_printk("call_site=%pS ptr=%p name=%s",
> diff --git a/mm/slab.c b/mm/slab.c
> index f9b74831e3f4..1685e5507a59 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3447,8 +3447,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
> {
> void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
>
> - trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
> - cachep->size, flags, NUMA_NO_NODE);
> + trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
>
> return ret;
> }
> @@ -3537,8 +3536,9 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
> ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
>
> ret = kasan_kmalloc(cachep, ret, size, flags);
> - trace_kmalloc(_RET_IP_, ret, cachep,
> - size, cachep->size, flags, NUMA_NO_NODE);
> + trace_kmalloc(_RET_IP_, ret,
> + size, cachep->size,
> + flags, NUMA_NO_NODE);

This could now fit on 2, maybe even 1 line?
Other trace_* calls could be perhaps also reduced, please check.
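E.g. this still fits in 80 columns:

	trace_kmalloc(_RET_IP_, ret, size, cachep->size, flags, NUMA_NO_NODE);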

> return ret;
> }
> EXPORT_SYMBOL(kmem_cache_alloc_trace);
> @@ -3561,9 +3561,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> {
> void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
>
> - trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
> - cachep->object_size, cachep->size,
> - flags, nodeid);
> + trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
>
> return ret;
> }
> @@ -3588,7 +3586,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
>
> ret = kasan_kmalloc(cachep, ret, size, flags);
> - trace_kmalloc(_RET_IP_, ret, cachep,
> + trace_kmalloc(_RET_IP_, ret,
> size, cachep->size,
> flags, nodeid);
> return ret;
> @@ -3652,7 +3650,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
> if (!cachep)
> return;
>
> - trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
> + trace_kmem_cache_free(_RET_IP_, objp, cachep);
> __do_kmem_cache_free(cachep, objp, _RET_IP_);
> }
> EXPORT_SYMBOL(kmem_cache_free);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0e66b4911ebf..c01c6b8f0d34 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -933,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>
> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> ret = kmalloc_large_node_notrace(size, flags, node);
> - trace_kmalloc(_RET_IP_, ret, NULL,
> + trace_kmalloc(_RET_IP_, ret,
> size, PAGE_SIZE << get_order(size),
> flags, node);
> return ret;
> @@ -946,8 +946,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>
> ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
> ret = kasan_kmalloc(s, ret, size, flags);
> - trace_kmalloc(_RET_IP_, ret, s, size,
> - s->size, flags, node);
> + trace_kmalloc(_RET_IP_, ret,
> + size, s->size,
> + flags, node);
> return ret;
> }
>
> @@ -1075,8 +1076,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
> {
> void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
>
> - trace_kmalloc(_RET_IP_, ret, NULL, size,
> - PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
> + trace_kmalloc(_RET_IP_, ret,
> + size, PAGE_SIZE << get_order(size),
> + flags, NUMA_NO_NODE);
> return ret;
> }
> EXPORT_SYMBOL(kmalloc_large);
> @@ -1090,8 +1092,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> {
> void *ret = __kmalloc_large_node_notrace(size, flags, node);
>
> - trace_kmalloc(_RET_IP_, ret, NULL, size,
> - PAGE_SIZE << get_order(size), flags, node);
> + trace_kmalloc(_RET_IP_, ret,
> + size, PAGE_SIZE << get_order(size),
> + flags, node);
> return ret;
> }
> EXPORT_SYMBOL(kmalloc_large_node);
> diff --git a/mm/slob.c b/mm/slob.c
> index a4d50d957c25..97a4d2407f96 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
> *m = size;
> ret = (void *)m + minalign;
>
> - trace_kmalloc(caller, ret, NULL,
> - size, size + minalign, gfp, node);
> + trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
> } else {
> unsigned int order = get_order(size);
>
> @@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
> gfp |= __GFP_COMP;
> ret = slob_new_pages(gfp, order, node);
>
> - trace_kmalloc(caller, ret, NULL,
> - size, PAGE_SIZE << order, gfp, node);
> + trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
> }
>
> kmemleak_alloc(ret, size, 1, gfp);
> @@ -608,14 +606,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
>
> if (c->size < PAGE_SIZE) {
> b = slob_alloc(c->size, flags, c->align, node, 0);
> - trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> - SLOB_UNITS(c->size) * SLOB_UNIT,

Hm we lose the SLOB_UNITS() adjustment here. Maybe we don't care that much,
maybe simply c->size could be already stored with SLOB_UNITS() (but didn't
check if other stuff breaks with that).
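Completely untested idea, e.g. somewhere in slob's __kmem_cache_create():

	/* round up to what slob_alloc() will actually hand out */
	if (c->size < PAGE_SIZE)
		c->size = SLOB_UNITS(c->size) * SLOB_UNIT;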

Thanks!

> - flags, node);
> + trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
> } else {
> b = slob_new_pages(flags, get_order(c->size), node);
> - trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> - PAGE_SIZE << get_order(c->size),
> - flags, node);
> + trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
> }
>
> if (b && c->ctor) {
> @@ -671,7 +665,7 @@ static void kmem_rcu_free(struct rcu_head *head)
> void kmem_cache_free(struct kmem_cache *c, void *b)
> {
> kmemleak_free_recursive(b, c->flags);
> - trace_kmem_cache_free(_RET_IP_, b, c->name);
> + trace_kmem_cache_free(_RET_IP_, b, c);
> if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
> struct slob_rcu *slob_rcu;
> slob_rcu = b + (c->size - sizeof(struct slob_rcu));
> diff --git a/mm/slub.c b/mm/slub.c
> index f1aa51480dc4..a49c69469c64 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> {
> void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
>
> - trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
> - s->size, gfpflags, NUMA_NO_NODE);
> + trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>
> return ret;
> }
> @@ -3274,7 +3273,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> {
> void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
> - trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
> + trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> ret = kasan_kmalloc(s, ret, size, gfpflags);
> return ret;
> }
> @@ -3285,8 +3284,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> {
> void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
>
> - trace_kmem_cache_alloc(_RET_IP_, ret, s,
> - s->object_size, s->size, gfpflags, node);
> + trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
>
> return ret;
> }
> @@ -3299,8 +3297,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
> {
> void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
>
> - trace_kmalloc(_RET_IP_, ret, s,
> - size, s->size, gfpflags, node);
> + trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
>
> ret = kasan_kmalloc(s, ret, size, gfpflags);
> return ret;
> @@ -3544,7 +3541,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
> s = cache_from_obj(s, x);
> if (!s)
> return;
> - trace_kmem_cache_free(_RET_IP_, x, s->name);
> + trace_kmem_cache_free(_RET_IP_, x, s);
> slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
> }
> EXPORT_SYMBOL(kmem_cache_free);

2022-07-29 11:54:00

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> __ksize() returns the size of objects allocated from the slab allocator.
> When an invalid object is passed to __ksize(), returning zero
> prevents further memory corruption and allows the caller to
> check for an error.
>
> If the address of a large object is not the beginning of the folio, or the
> size of the folio is too small, the object must be invalid. Return zero in
> such cases.
>
> Suggested-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

If you want to change it to BUG() I won't object, no strong opinion.
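I.e. something like this instead of the WARN_ON() + return 0 (sketch only):

	if (unlikely(!folio_test_slab(folio))) {
		BUG_ON(object != folio_address(folio) ||
		       folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
		return folio_size(folio);
	}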

> ---
> mm/slab_common.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1f8db7959366..0d6cbe9d7ad0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
>
> folio = virt_to_folio(object);
>
> - if (unlikely(!folio_test_slab(folio)))
> + if (unlikely(!folio_test_slab(folio))) {
> + if (WARN_ON(object != folio_address(folio) ||
> + folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
> + return 0;
> return folio_size(folio);
> + }
>
> return slab_ksize(folio_slab(folio)->slab_cache);
> }

2022-07-29 12:00:14

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> __ksize() is only called by KASAN. Remove export symbol and move
> definition to mm/slab.h as we don't want to grow its callers.
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> Reviewed-by: Vlastimil Babka <[email protected]>

Nit: The subject and changelog should probably say declaration, not definition.

> ---
> include/linux/slab.h | 1 -
> mm/slab.h | 2 ++
> mm/slab_common.c | 11 +----------
> mm/slob.c | 1 -
> 4 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 4ee5b2fed164..701fe538650f 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
> void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
> void kfree(const void *objp);
> void kfree_sensitive(const void *objp);
> -size_t __ksize(const void *objp);
> size_t ksize(const void *objp);
> #ifdef CONFIG_PRINTK
> bool kmem_valid_obj(void *object);
> diff --git a/mm/slab.h b/mm/slab.h
> index 9193e9c1f040..ad634e02b3cb 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -678,6 +678,8 @@ void free_large_kmalloc(struct folio *folio, void *object);
>
> #endif /* CONFIG_SLOB */
>
> +size_t __ksize(const void *objp);
> +
> static inline size_t slab_ksize(const struct kmem_cache *s)
> {
> #ifndef CONFIG_SLUB
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index c01c6b8f0d34..1f8db7959366 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1003,15 +1003,7 @@ void kfree(const void *object)
> }
> EXPORT_SYMBOL(kfree);
>
> -/**
> - * __ksize -- Uninstrumented ksize.
> - * @objp: pointer to the object
> - *
> - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> - * safety checks as ksize() with KASAN instrumentation enabled.
> - *
> - * Return: size of the actual memory used by @objp in bytes
> - */
> +/* Uninstrumented ksize. Only called by KASAN. */
> size_t __ksize(const void *object)
> {
> struct folio *folio;
> @@ -1026,7 +1018,6 @@ size_t __ksize(const void *object)
>
> return slab_ksize(folio_slab(folio)->slab_cache);
> }
> -EXPORT_SYMBOL(__ksize);
> #endif /* !CONFIG_SLOB */
>
> gfp_t kmalloc_fix_flags(gfp_t flags)
> diff --git a/mm/slob.c b/mm/slob.c
> index 97a4d2407f96..91d6e2b19929 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -584,7 +584,6 @@ size_t __ksize(const void *block)
> m = (unsigned int *)(block - align);
> return SLOB_UNITS(*m) * SLOB_UNIT;
> }
> -EXPORT_SYMBOL(__ksize);
>
> int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
> {

2022-07-29 15:35:11

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] common kmalloc v3

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> This is v3 of common kmalloc series.
>
> This series generalize most of kmalloc functions and move its
> implementation into mm/slab_common.c.
>
> I believe this series give better maintainability of code for SLAB and SLUB.
> Please consider applying.
>
> This series is based on slab/for-next and also available at
> https://github.com/hygoni/linux/tree/slab-common-v3r0
>
> Any feedbacks/reviews will be appreciated.

Hi, thanks for all your efforts. It's shaping up nicely so I think the next
version will be ready to be added to -next after the 5.20 merge window.
As I've finished the individual reviews, I'm looking at the result and see a
bit more potential for cleanups, which could be perhaps incorporated into
existing patches, or additionally:

- kmalloc_large_node_notrace() has only one caller, so it can be removed and
the caller can call __kmalloc_large_node_notrace() directly, especially if
it's not __always_inline as I've IIRC suggested.

- kmem_cache_alloc_trace() and kmem_cache_alloc_node_trace() are weird ones:
they are in fact for kmalloc despite the name. They depend on
CONFIG_TRACING, yet if you look carefully, the !CONFIG_TRACING variant also
goes through a trace_* function. The actual difference seems to be that
slab_alloc(), and thus kasan_alloc() and kfence_alloc(), don't get the
orig_size that way, which is dubious. It might be worth trying to unify this
as well?
E.g.
- use only the CONFIG_TRACING variant, discard the other
- declare it in mm/slab.h, this is not for general usage
- single implementation in mm/slab_common.c that calls
__kmem_cache_alloc_node() from SLAB/SLUB and does the trace, roughly as in
the sketch below
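
Something like this (a rough, untested sketch; the name and exact signature
are of course up for discussion):

	/* mm/slab_common.c */
	void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
					  int node, size_t size)
	{
		void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size,
						    _RET_IP_);

		ret = kasan_kmalloc(s, ret, size, gfpflags);
		trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
		return ret;
	}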

2022-08-01 14:20:15

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()

On 8/1/22 15:26, Hyeonggon Yoo wrote:
> On Thu, Jul 28, 2022 at 05:23:29PM +0200, Vlastimil Babka wrote:
>> On 7/12/22 15:39, Hyeonggon Yoo wrote:
>> > There is no caller of kmalloc_order_trace() except kmalloc_large().
>> > Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
>> >
>> > Also add tracepoint in kmalloc_large() that was previously
>> > in kmalloc_order_trace().
>> >
>> > Signed-off-by: Hyeonggon Yoo <[email protected]>
>> > Reviewed-by: Vlastimil Babka <[email protected]>
>>
>
> Thank you for review.
>
>> Hmm noticed a small change in how we call trace_kmalloc() which will now
>> include the __GFP_COMP.
>
> Good catch.
>
>> I think we could just call alloc_pages() from
>> kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
>> __GFP_COMP; first.
>
> I'll consider that in next version.
>
>> AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
>> will filter it out anyway.
>
> AFAIK not passing __GFP_COMP to both kasan_kmalloc_large()
> and kmemleak_alloc() will affect their behavior.

You mean "will NOT affect...", right?

> - kasan_kmalloc_large() only checks if the flag has __GFP_DIRECT_RECLAIM,
> - kmemleak_alloc() only uses it to allocate a kmemleak_object from
> slab/pool, which does not require __GFP_COMP.
>
> simply calling
> alloc_pages() with flags | __GFP_COMP
> kasan_kmalloc_large() with flags
> kmemleak_alloc() with flags
> trace_kmalloc() with flags
> will be okay.
>
> BTW this is fixed after patch 9.

Ah, right. __kmalloc_large_node_notrace() does the same flags |=
__GFP_COMP; but the tracepoint is outside and unaffected.

2022-08-01 14:27:46

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()

On Thu, Jul 28, 2022 at 05:23:29PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > There is no caller of kmalloc_order_trace() except kmalloc_large().
> > Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
> >
> > Also add tracepoint in kmalloc_large() that was previously
> > in kmalloc_order_trace().
> >
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > Reviewed-by: Vlastimil Babka <[email protected]>
>

Thank you for review.

> Hmm noticed a small change in how we call trace_kmalloc() which will now
> include the __GFP_COMP.

Good catch.

> I think we could just call alloc_pages() from
> kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
> __GFP_COMP; first.

I'll consider that in next version.

> AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
> will filter it out anyway.

AFAIK not passing __GFP_COMP to both kasan_kmalloc_large()
and kmemleak_alloc() will affect their behavior.

- kasan_kmalloc_large() only checks if the flag has __GFP_DIRECT_RECLAIM,
- kmemleak_alloc() only uses it to allocate a kmemleak_object from
slab/pool, which does not require __GFP_COMP.

simply calling
alloc_pages() with flags | __GFP_COMP
kasan_kmalloc_large() with flags
kmemleak_alloc() with flags
trace_kmalloc() with flags
will be okay.
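Something like this in kmalloc_large() (sketch only, with the page allocation
error handling elided):

	page = alloc_pages(flags | __GFP_COMP, order);
	...
	ret = kasan_kmalloc_large(ret, size, flags);
	/* As ret might get tagged, call kmemleak hook after KASAN. */
	kmemleak_alloc(ret, size, 1, flags);
	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);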

BTW this is fixed after patch 9.

> > ---
> > include/linux/slab.h | 22 ++--------------------
> > mm/slab_common.c | 14 +++-----------
> > 2 files changed, 5 insertions(+), 31 deletions(-)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index a0e57df3d5a4..15a4c59da59e 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
> > }
> > #endif /* CONFIG_TRACING */
> >
> > -extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
> > - __alloc_size(1);
> > -
> > -#ifdef CONFIG_TRACING
> > -extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> > - __assume_page_alignment __alloc_size(1);
> > -#else
> > -static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
> > - unsigned int order)
> > -{
> > - return kmalloc_order(size, flags, order);
> > -}
> > -#endif
> > -
> > -static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
> > -{
> > - unsigned int order = get_order(size);
> > - return kmalloc_order_trace(size, flags, order);
> > -}
> > -
> > +void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
> > + __alloc_size(1);
> > /**
> > * kmalloc - allocate memory
> > * @size: how many bytes of memory are required.
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 6c9aac5d8f4a..1f8af2106df0 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -932,10 +932,11 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
> > * directly to the page allocator. We use __GFP_COMP, because we will need to
> > * know the allocation order to free the pages properly in kfree.
> > */
> > -void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> > +void *kmalloc_large(size_t size, gfp_t flags)
> > {
> > void *ret = NULL;
> > struct page *page;
> > + unsigned int order = get_order(size);
> >
> > if (unlikely(flags & GFP_SLAB_BUG_MASK))
> > flags = kmalloc_fix_flags(flags);
> > @@ -950,19 +951,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> > ret = kasan_kmalloc_large(ret, size, flags);
> > /* As ret might get tagged, call kmemleak hook after KASAN. */
> > kmemleak_alloc(ret, size, 1, flags);
> > - return ret;
> > -}
> > -EXPORT_SYMBOL(kmalloc_order);
> > -
> > -#ifdef CONFIG_TRACING
> > -void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> > -{
> > - void *ret = kmalloc_order(size, flags, order);
> > trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
> > return ret;
> > }
> > -EXPORT_SYMBOL(kmalloc_order_trace);
> > -#endif
> > +EXPORT_SYMBOL(kmalloc_large);
> >
> > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > /* Randomize a generic freelist */
>

--
Thanks,
Hyeonggon

2022-08-02 02:58:29

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()

On Mon, Aug 01, 2022 at 03:36:48PM +0200, Vlastimil Babka wrote:
> On 8/1/22 15:26, Hyeonggon Yoo wrote:
> > On Thu, Jul 28, 2022 at 05:23:29PM +0200, Vlastimil Babka wrote:
> >> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> >> > There is no caller of kmalloc_order_trace() except kmalloc_large().
> >> > Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
> >> >
> >> > Also add tracepoint in kmalloc_large() that was previously
> >> > in kmalloc_order_trace().
> >> >
> >> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> >> > Reviewed-by: Vlastimil Babka <[email protected]>
> >>
> >
> > Thank you for review.
> >
> >> Hmm noticed a small change in how we call trace_kmalloc() which will now
> >> include the __GFP_COMP.
> >
> > Good catch.
> >
> >> I think we could just call alloc_pages() from
> >> kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
> >> __GFP_COMP; first.
> >
> > I'll consider that in next version.
> >
> >> AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
> >> will filter it out anyway.
> >
> > AFAIK not passing __GFP_COMP to both kasan_kmalloc_large()
> > and kmemleak_alloc() will affect their behavior.
>
> You mean "will NOT affect...", right?

Ah, yes. mistake.

>
> > - kasan_kmalloc_large() only checks if the flag has __GFP_DIRECT_RECLAIM,
> > - kmemleak_alloc() only uses it to allocate a kmemleak_object from
> > slab/pool, which does not require __GFP_COMP.
> >
> > simply calling
> > alloc_pages() with flags | __GFP_COMP
> > kasan_kmalloc_large() with flags
> > kmemleak_alloc() with flags
> > trace_kmalloc() with flags
> > will be okay.
> >
> > BTW this is fixed after patch 9.
>
> Ah, right. __kmalloc_large_node_notrace() does the same flags |=
> __GFP_COMP; but the tracepoint is outside and unaffected.

I'll apply your suggestion anyway.

--
Thanks,
Hyeonggon

2022-08-02 09:35:25

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using

On Fri, Jul 29, 2022 at 01:23:38PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
> > using TRACE_EVENT() macro.
> >
> > And then this patch does:
> > - Do not pass a pointer to struct kmem_cache to trace_kmalloc.
> > The gfp flag is enough to know if it's accounted or not.
> > - Avoid dereferencing s->object_size and s->size when not using the kmem_cache_alloc event.
> > - Avoid dereferencing s->name when not using the kmem_cache_free event.
> >
> > Suggested-by: Vlastimil Babka <[email protected]>
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>
> With some comments:
>
> > ---
> > include/trace/events/kmem.h | 64 +++++++++++++++++++++++++------------
> > mm/slab.c | 16 ++++------
> > mm/slab_common.c | 17 ++++++----
> > mm/slob.c | 16 +++-------
> > mm/slub.c | 13 +++-----
> > 5 files changed, 70 insertions(+), 56 deletions(-)
> >
> > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> > index e078ebcdc4b1..0ded2c351062 100644
> > --- a/include/trace/events/kmem.h
> > +++ b/include/trace/events/kmem.h
> > @@ -9,17 +9,15 @@
> > #include <linux/tracepoint.h>
> > #include <trace/events/mmflags.h>
> >
> > -DECLARE_EVENT_CLASS(kmem_alloc,
> > +TRACE_EVENT(kmem_cache_alloc,
> >
> > TP_PROTO(unsigned long call_site,
> > const void *ptr,
> > struct kmem_cache *s,
> > - size_t bytes_req,
> > - size_t bytes_alloc,
> > gfp_t gfp_flags,
> > int node),
> >
> > - TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
> > + TP_ARGS(call_site, ptr, s, gfp_flags, node),
> >
> > TP_STRUCT__entry(
> > __field( unsigned long, call_site )
> > @@ -34,8 +32,8 @@ DECLARE_EVENT_CLASS(kmem_alloc,
> > TP_fast_assign(
> > __entry->call_site = call_site;
> > __entry->ptr = ptr;
> > - __entry->bytes_req = bytes_req;
> > - __entry->bytes_alloc = bytes_alloc;
> > + __entry->bytes_req = s->object_size;
> > + __entry->bytes_alloc = s->size;
> > __entry->gfp_flags = (__force unsigned long)gfp_flags;
> > __entry->node = node;
> > __entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
>
> This assignment continues by ...
> (s && s->flags & SLAB_ACCOUNT)) : false;
>
> We could now remove the "s &&" as we assume it's non-NULL anyway.

Right. will adjust.

>
> > @@ -53,22 +51,46 @@ DECLARE_EVENT_CLASS(kmem_alloc,
> > __entry->accounted ? "true" : "false")
> > );
> >
> > -DEFINE_EVENT(kmem_alloc, kmalloc,
> > +TRACE_EVENT(kmalloc,
> >
> > - TP_PROTO(unsigned long call_site, const void *ptr,
> > - struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> > - gfp_t gfp_flags, int node),
> > + TP_PROTO(unsigned long call_site,
> > + const void *ptr,
> > + size_t bytes_req,
> > + size_t bytes_alloc,
> > + gfp_t gfp_flags,
> > + int node),
> >
> > - TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> > -);
> > + TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),
> >
> > -DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
> > + TP_STRUCT__entry(
> > + __field( unsigned long, call_site )
> > + __field( const void *, ptr )
> > + __field( size_t, bytes_req )
> > + __field( size_t, bytes_alloc )
> > + __field( unsigned long, gfp_flags )
> > + __field( int, node )
> > + __field( bool, accounted )
> > + ),
> >
> > - TP_PROTO(unsigned long call_site, const void *ptr,
> > - struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> > - gfp_t gfp_flags, int node),
> > + TP_fast_assign(
> > + __entry->call_site = call_site;
> > + __entry->ptr = ptr;
> > + __entry->bytes_req = bytes_req;
> > + __entry->bytes_alloc = bytes_alloc;
> > + __entry->gfp_flags = (__force unsigned long)gfp_flags;
> > + __entry->node = node;
> > + __entry->accounted = IS_ENABLED(CONFIG_MEMCG_KMEM) ?
> > + (gfp_flags & __GFP_ACCOUNT) : false;
>
> This test could be perhaps moved to TP_printk and the accounted field
> removed completely, but we need to make sure trace-cmd (which reads the
> binary entries and does the printk in userspace) can interpret the tests
> properly, including the IS_ENABLED(CONFIG_MEMCG_KMEM) part.

Okay, I'll try moving it and check if trace-cmd could understand it.

>
> > + ),
> >
> > - TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> > + TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
> > + (void *)__entry->call_site,
> > + __entry->ptr,
> > + __entry->bytes_req,
> > + __entry->bytes_alloc,
> > + show_gfp_flags(__entry->gfp_flags),
> > + __entry->node,
> > + __entry->accounted ? "true" : "false")
> > );
> >
> > TRACE_EVENT(kfree,
> > @@ -93,20 +115,20 @@ TRACE_EVENT(kfree,
> >
> > TRACE_EVENT(kmem_cache_free,
> >
> > - TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
> > + TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),
> >
> > - TP_ARGS(call_site, ptr, name),
> > + TP_ARGS(call_site, ptr, s),
> >
> > TP_STRUCT__entry(
> > __field( unsigned long, call_site )
> > __field( const void *, ptr )
> > - __string( name, name )
> > + __string( name, s->name )
> > ),
> >
> > TP_fast_assign(
> > __entry->call_site = call_site;
> > __entry->ptr = ptr;
> > - __assign_str(name, name);
> > + __assign_str(name, s->name);
> > ),
> >
> > TP_printk("call_site=%pS ptr=%p name=%s",
> > diff --git a/mm/slab.c b/mm/slab.c
> > index f9b74831e3f4..1685e5507a59 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3447,8 +3447,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
> > {
> > void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
> >
> > - trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
> > - cachep->size, flags, NUMA_NO_NODE);
> > + trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
> >
> > return ret;
> > }
> > @@ -3537,8 +3536,9 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
> > ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
> >
> > ret = kasan_kmalloc(cachep, ret, size, flags);
> > - trace_kmalloc(_RET_IP_, ret, cachep,
> > - size, cachep->size, flags, NUMA_NO_NODE);
> > + trace_kmalloc(_RET_IP_, ret,
> > + size, cachep->size,
> > + flags, NUMA_NO_NODE);
>
> This could now fit on 2, maybe even 1 line?
> Other trace_* calls could be perhaps also reduced, please check.
>

Okay, will check.

> > return ret;
> > }
> > EXPORT_SYMBOL(kmem_cache_alloc_trace);
> > @@ -3561,9 +3561,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> > {
> > void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
> >
> > - trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
> > - cachep->object_size, cachep->size,
> > - flags, nodeid);
> > + trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
> >
> > return ret;
> > }
> > @@ -3588,7 +3586,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> > ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
> >
> > ret = kasan_kmalloc(cachep, ret, size, flags);
> > - trace_kmalloc(_RET_IP_, ret, cachep,
> > + trace_kmalloc(_RET_IP_, ret,
> > size, cachep->size,
> > flags, nodeid);
> > return ret;
> > @@ -3652,7 +3650,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
> > if (!cachep)
> > return;
> >
> > - trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
> > + trace_kmem_cache_free(_RET_IP_, objp, cachep);
> > __do_kmem_cache_free(cachep, objp, _RET_IP_);
> > }
> > EXPORT_SYMBOL(kmem_cache_free);
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 0e66b4911ebf..c01c6b8f0d34 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -933,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
> >
> > if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> > ret = kmalloc_large_node_notrace(size, flags, node);
> > - trace_kmalloc(_RET_IP_, ret, NULL,
> > + trace_kmalloc(_RET_IP_, ret,
> > size, PAGE_SIZE << get_order(size),
> > flags, node);
> > return ret;
> > @@ -946,8 +946,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
> >
> > ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
> > ret = kasan_kmalloc(s, ret, size, flags);
> > - trace_kmalloc(_RET_IP_, ret, s, size,
> > - s->size, flags, node);
> > + trace_kmalloc(_RET_IP_, ret,
> > + size, s->size,
> > + flags, node);
> > return ret;
> > }
> >
> > @@ -1075,8 +1076,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
> > {
> > void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
> >
> > - trace_kmalloc(_RET_IP_, ret, NULL, size,
> > - PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
> > + trace_kmalloc(_RET_IP_, ret,
> > + size, PAGE_SIZE << get_order(size),
> > + flags, NUMA_NO_NODE);
> > return ret;
> > }
> > EXPORT_SYMBOL(kmalloc_large);
> > @@ -1090,8 +1092,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> > {
> > void *ret = __kmalloc_large_node_notrace(size, flags, node);
> >
> > - trace_kmalloc(_RET_IP_, ret, NULL, size,
> > - PAGE_SIZE << get_order(size), flags, node);
> > + trace_kmalloc(_RET_IP_, ret,
> > + size, PAGE_SIZE << get_order(size),
> > + flags, node);
> > return ret;
> > }
> > EXPORT_SYMBOL(kmalloc_large_node);
> > diff --git a/mm/slob.c b/mm/slob.c
> > index a4d50d957c25..97a4d2407f96 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
> > *m = size;
> > ret = (void *)m + minalign;
> >
> > - trace_kmalloc(caller, ret, NULL,
> > - size, size + minalign, gfp, node);
> > + trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
> > } else {
> > unsigned int order = get_order(size);
> >
> > @@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
> > gfp |= __GFP_COMP;
> > ret = slob_new_pages(gfp, order, node);
> >
> > - trace_kmalloc(caller, ret, NULL,
> > - size, PAGE_SIZE << order, gfp, node);
> > + trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
> > }
> >
> > kmemleak_alloc(ret, size, 1, gfp);
> > @@ -608,14 +606,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
> >
> > if (c->size < PAGE_SIZE) {
> > b = slob_alloc(c->size, flags, c->align, node, 0);
> > - trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> > - SLOB_UNITS(c->size) * SLOB_UNIT,
>
> Hm we lose the SLOB_UNITS() adjustment here.

Thank you for catching this.

> Maybe we don't care that much,
> maybe simply c->size could be already stored with SLOB_UNITS() (but didn't
> check if other stuff breaks with that).

Hmm will try that way!

Thanks!

>
> Thanks!
>
> > - flags, node);
> > + trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
> > } else {
> > b = slob_new_pages(flags, get_order(c->size), node);
> > - trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> > - PAGE_SIZE << get_order(c->size),
> > - flags, node);
> > + trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
> > }
> >
> > if (b && c->ctor) {
> > @@ -671,7 +665,7 @@ static void kmem_rcu_free(struct rcu_head *head)
> > void kmem_cache_free(struct kmem_cache *c, void *b)
> > {
> > kmemleak_free_recursive(b, c->flags);
> > - trace_kmem_cache_free(_RET_IP_, b, c->name);
> > + trace_kmem_cache_free(_RET_IP_, b, c);
> > if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
> > struct slob_rcu *slob_rcu;
> > slob_rcu = b + (c->size - sizeof(struct slob_rcu));
> > diff --git a/mm/slub.c b/mm/slub.c
> > index f1aa51480dc4..a49c69469c64 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> > {
> > void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> >
> > - trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
> > - s->size, gfpflags, NUMA_NO_NODE);
> > + trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> >
> > return ret;
> > }
> > @@ -3274,7 +3273,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> > void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> > {
> > void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
> > - trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
> > + trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> > ret = kasan_kmalloc(s, ret, size, gfpflags);
> > return ret;
> > }
> > @@ -3285,8 +3284,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> > {
> > void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
> >
> > - trace_kmem_cache_alloc(_RET_IP_, ret, s,
> > - s->object_size, s->size, gfpflags, node);
> > + trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
> >
> > return ret;
> > }
> > @@ -3299,8 +3297,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
> > {
> > void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
> >
> > - trace_kmalloc(_RET_IP_, ret, s,
> > - size, s->size, gfpflags, node);
> > + trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
> >
> > ret = kasan_kmalloc(s, ret, size, gfpflags);
> > return ret;
> > @@ -3544,7 +3541,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
> > s = cache_from_obj(s, x);
> > if (!s)
> > return;
> > - trace_kmem_cache_free(_RET_IP_, x, s->name);
> > + trace_kmem_cache_free(_RET_IP_, x, s);
> > slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
> > }
> > EXPORT_SYMBOL(kmem_cache_free);
>

--
Thanks,
Hyeonggon

2022-08-02 09:36:21

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h

On Fri, Jul 29, 2022 at 01:47:23PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > __ksize() is only called by KASAN. Remove export symbol and move
> > definition to mm/slab.h as we don't want to grow its callers.
> >
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > Reviewed-by: Vlastimil Babka <[email protected]>
>
> Nit: The subject and changelog should probably say declaration, not definition.

Ah, right. Will do.
Thanks!

>
> > ---
> > include/linux/slab.h | 1 -
> > mm/slab.h | 2 ++
> > mm/slab_common.c | 11 +----------
> > mm/slob.c | 1 -
> > 4 files changed, 3 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 4ee5b2fed164..701fe538650f 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
> > void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
> > void kfree(const void *objp);
> > void kfree_sensitive(const void *objp);
> > -size_t __ksize(const void *objp);
> > size_t ksize(const void *objp);
> > #ifdef CONFIG_PRINTK
> > bool kmem_valid_obj(void *object);
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 9193e9c1f040..ad634e02b3cb 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -678,6 +678,8 @@ void free_large_kmalloc(struct folio *folio, void *object);
> >
> > #endif /* CONFIG_SLOB */
> >
> > +size_t __ksize(const void *objp);
> > +
> > static inline size_t slab_ksize(const struct kmem_cache *s)
> > {
> > #ifndef CONFIG_SLUB
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index c01c6b8f0d34..1f8db7959366 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1003,15 +1003,7 @@ void kfree(const void *object)
> > }
> > EXPORT_SYMBOL(kfree);
> >
> > -/**
> > - * __ksize -- Uninstrumented ksize.
> > - * @objp: pointer to the object
> > - *
> > - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> > - * safety checks as ksize() with KASAN instrumentation enabled.
> > - *
> > - * Return: size of the actual memory used by @objp in bytes
> > - */
> > +/* Uninstrumented ksize. Only called by KASAN. */
> > size_t __ksize(const void *object)
> > {
> > struct folio *folio;
> > @@ -1026,7 +1018,6 @@ size_t __ksize(const void *object)
> >
> > return slab_ksize(folio_slab(folio)->slab_cache);
> > }
> > -EXPORT_SYMBOL(__ksize);
> > #endif /* !CONFIG_SLOB */
> >
> > gfp_t kmalloc_fix_flags(gfp_t flags)
> > diff --git a/mm/slob.c b/mm/slob.c
> > index 97a4d2407f96..91d6e2b19929 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -584,7 +584,6 @@ size_t __ksize(const void *block)
> > m = (unsigned int *)(block - align);
> > return SLOB_UNITS(*m) * SLOB_UNIT;
> > }
> > -EXPORT_SYMBOL(__ksize);
> >
> > int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
> > {
>

--
Thanks,
Hyeonggon

2022-08-14 10:13:09

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] common kmalloc v3

On Fri, Jul 29, 2022 at 05:08:31PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > This is v3 of common kmalloc series.
> >
> > This series generalize most of kmalloc functions and move its
> > implementation into mm/slab_common.c.
> >
> > I believe this series give better maintainability of code for SLAB and SLUB.
> > Please consider applying.
> >
> > This series is based on slab/for-next and also available at
> > https://github.com/hygoni/linux/tree/slab-common-v3r0
> >
> > Any feedbacks/reviews will be appreciated.
>
> Hi, thanks for all your efforts. It's shaping up nicely so I think the next
> version will be ready to be added to -next after the 5.20 merge window.
> As I've finished the individual reviews, I'm looking at the result and see a
> bit more potential for cleanups, which could be perhaps incorporated to
> existing patches, or additionally:

Thank you for the reviews, and I too would like to add it to -next soon!

>
> - kmalloc_large_node_notrace() has only one caller, can be removed and the
> caller can call __kmalloc_large_node_notrace() directly, especially if it's
> not __always_inline as I've IIRC suggested.

Will adjust in next version.

> - kmem_cache_alloc_trace() and kmem_cache_alloc_node_trace() are weird ones,
> they are in fact for kmalloc despite the name.

Yeah, I'm the one who would like to rename them to kmalloc_trace() and
kmalloc_node_trace().

> They depend on
> CONFIG_TRACING, yet if you look carefully, the !CONFIG_TRACING variant also
> goes through a trace_* function. The actual difference seems to be that
> slab_alloc(), and thus kasan_alloc() and kfence_alloc(), don't get the orig_size
> that way, which is dubious. It might be worth trying to unify this as well?
> E.g.
> - use only the CONFIG_TRACING variant, discard the other

Sounds okay.
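
(IIRC the !CONFIG_TRACING fallback in include/linux/slab.h is just a thin
wrapper roughly like the below -- quoting from memory -- so it indeed still
goes through trace_kmem_cache_alloc() inside kmem_cache_alloc(), and
slab_alloc() never sees the original kmalloc size:)

static __always_inline void *kmem_cache_alloc_trace(struct kmem_cache *s,
		gfp_t flags, size_t size)
{
	/* slab_alloc() inside gets s->object_size, not the kmalloc size */
	void *ret = kmem_cache_alloc(s, flags);

	ret = kasan_kmalloc(s, ret, size, flags);
	return ret;
}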

> - declare it in mm/slab.h, this is not for general usage

We can't completely remove it because it's needed in include/linux/slab.h
for inlined kmalloc.

> - single implementation in mm/slab_common.c that calls
> __kmem_cache_alloc_node() from SLAB/SLUB and does the trace

While I love the idea of a single implementation in mm/slab_common.c,
making use of __kmem_cache_alloc_node() and __kmem_cache_free() adds
a bit of overhead:
it adds the overhead of a function call and can't benefit from inlining
(including removing unnecessary parts of the function code).
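
E.g. the common version would look roughly like this (just an untested
sketch, reusing the __kmem_cache_alloc_node() signature from the diff below):

void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
				  int node, size_t size)
{
	/* extra out-of-line call into SLAB/SLUB, today inlined via slab_alloc_node() */
	void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);

	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
	ret = kasan_kmalloc(s, ret, size, gfpflags);
	return ret;
}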

So... what about including slab_common.c in sl?b.c,
so that the compiler can treat sl?b.c and slab_common.c as a single translation unit?
(Or split the kmalloc implementation into kmalloc.c and do a similar thing?)

Thanks!

---
Hyeonggon

diff --git a/mm/Makefile b/mm/Makefile
index 9a564f836403..bcee8495b531 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -3,7 +3,6 @@
# Makefile for the linux memory manager.
#

-KASAN_SANITIZE_slab_common.o := n
KASAN_SANITIZE_slab.o := n
KASAN_SANITIZE_slub.o := n
KCSAN_SANITIZE_kmemleak.o := n
@@ -11,7 +10,6 @@ KCSAN_SANITIZE_kmemleak.o := n
# These produce frequent data race reports: most of them are due to races on
# the same word but accesses to different bits of that word. Re-enable KCSAN
# for these when we have more consensus on what to do about them.
-KCSAN_SANITIZE_slab_common.o := n
KCSAN_SANITIZE_slab.o := n
KCSAN_SANITIZE_slub.o := n
KCSAN_SANITIZE_page_alloc.o := n
@@ -21,7 +19,6 @@ KCSAN_INSTRUMENT_BARRIERS := y
# These files are disabled because they produce non-interesting and/or
# flaky coverage that is not a function of syscall inputs. E.g. slab is out of
# free pages, or a task is migrated between nodes.
-KCOV_INSTRUMENT_slab_common.o := n
KCOV_INSTRUMENT_slob.o := n
KCOV_INSTRUMENT_slab.o := n
KCOV_INSTRUMENT_slub.o := n
@@ -51,8 +48,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page-writeback.o folio-compat.o \
readahead.o swap.o truncate.o vmscan.o shmem.o \
util.o mmzone.o vmstat.o backing-dev.o \
- mm_init.o percpu.o slab_common.o \
- compaction.o vmacache.o \
+ mm_init.o percpu.o compaction.o vmacache.o \
interval_tree.o list_lru.o workingset.o \
debug.o gup.o mmap_lock.o $(mmu-y)

diff --git a/mm/slab.c b/mm/slab.c
index a5486ff8362a..a302a7c17b40 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -131,6 +131,8 @@

#include "slab.h"

+#include "slab_common.c"
+
/*
* DEBUG - 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & SLAB_POISON.
* 0 for faster, smaller code (especially in the critical paths).
@@ -3541,6 +3543,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
}
EXPORT_SYMBOL(kmem_cache_alloc_node);

+static __always_inline
void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
int nodeid, size_t orig_size,
unsigned long caller)
@@ -3585,6 +3588,7 @@ void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
local_irq_restore(flags);
}

+static __always_inline
void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
diff --git a/mm/slab.h b/mm/slab.h
index 65023f000d42..4e43ab717d99 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -273,11 +273,6 @@ void create_kmalloc_caches(slab_flags_t);

/* Find the kmalloc slab corresponding for a certain size */
struct kmem_cache *kmalloc_slab(size_t, gfp_t);
-
-void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
- int node, size_t orig_size,
- unsigned long caller);
-void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
#endif

gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 83cfe91b6ab6..70d7393d31ae 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -898,6 +898,12 @@ void free_large_kmalloc(struct folio *folio, void *object)
__free_pages(folio_page(folio, 0), order);
}

+static __always_inline
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
+ size_t orig_size, unsigned long caller);
+static __always_inline
+void __kmem_cache_free(struct kmem_cache *s, void *objp, unsigned long caller);
+
static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
diff --git a/mm/slob.c b/mm/slob.c
index 45a061b8ba38..656f4f8b77d9 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -74,6 +74,9 @@
#include <linux/atomic.h>

#include "slab.h"
+
+#include "slab_common.c"
+
/*
* slob_block has a field 'units', which indicates size of block if +ve,
* or offset of next block if -ve (in SLOB_UNITs).
diff --git a/mm/slub.c b/mm/slub.c
index 8083a6ee5f15..bd47a30049ae 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -45,6 +45,8 @@

#include "internal.h"

+#include "slab_common.c"
+
/*
* Lock order:
* 1. slab_mutex (Global Mutex)
@@ -3261,6 +3263,7 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
}
EXPORT_SYMBOL(kmem_cache_alloc_lru);

+static __always_inline
void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t orig_size,
unsigned long caller)
@@ -3505,6 +3508,7 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
}
#endif

+static __always_inline
void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
{
slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);

2022-08-15 13:09:09

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] common kmalloc v3

On 8/14/22 12:06, Hyeonggon Yoo wrote:
> On Fri, Jul 29, 2022 at 05:08:31PM +0200, Vlastimil Babka wrote:
>> On 7/12/22 15:39, Hyeonggon Yoo wrote:
>>> This is v3 of common kmalloc series.
>>>
>>> This series generalize most of kmalloc functions and move its
>>> implementation into mm/slab_common.c.
>>>
>>> I believe this series give better maintainability of code for SLAB and SLUB.
>>> Please consider applying.
>>>
>>> This series is based on slab/for-next and also available at
>>> https://github.com/hygoni/linux/tree/slab-common-v3r0
>>>
>>> Any feedbacks/reviews will be appreciated.
>>
>> Hi, thanks for all your efforts. It's shaping up nicely so I think the next
>> version will be ready to be added to -next after the 5.20 merge window.
>> As I've finished the individual reviews, I'm looking at the result and see a
>> bit more potential for cleanups, which could be perhaps incorporated to
>> existing patches, or additionally:
>
> Thank you for the reviews, and I too would like to add it to -next soon!
>
>>
>> - kmalloc_large_node_notrace() has only one caller, can be removed and the
>> caller can call __kmalloc_large_node_notrace() directly, especially if it's
>> not __always_inline as I've IIRC suggested.
>
> Will adjust in next version.
>
>> - kmem_cache_alloc_trace() and kmem_cache_alloc_node_trace() are weird ones,
>> they are in fact for kmalloc despite the name.
>
> Yeah, I'm the one who would like to rename them to kmalloc_trace() and
> kmalloc_node_trace().
>
>> They depend on
>> CONFIG_TRACING, yet if you look carefully, the !CONFIG_TRACING variant also
>> goes through a trace_* function. The actual difference seems to be that
>> slab_alloc(), and thus kasan_alloc() and kfence_alloc(), don't get the orig_size
>> that way, which is dubious. It might be worth trying to unify this as well?
>> E.g.
>> - use only the CONFIG_TRACING variant, discard the other
>
> Sounds okay.
>
>> - declare it in mm/slab.h, this is not for general usage
>
> We can't completely remove it because it's needed in include/linux/slab.h
> for inlined kmalloc.

Ah, ok.

>> - single implementation in mm/slab_common.c that calls
>> __kmem_cache_alloc_node() from SLAB/SLUB and does the trace
>
> While I love the idea of a single implementation in mm/slab_common.c,
> making use of __kmem_cache_alloc_node() and __kmem_cache_free() adds
> a bit of overhead:
> it adds the overhead of a function call and can't benefit from inlining
> (including removing unnecessary parts of the function code).

Hm, right.

> So... what about including slab_common.c in sl?b.c,
> so that the compiler can treat sl?b.c and slab_common.c as a single translation unit?
> (Or split the kmalloc implementation into kmalloc.c and do a similar thing?)

I don't know if that has a good precedent in the kernel. Maybe we can
postpone these more radical attempts to a later series.