2022-03-04 13:04:10

by Hyeonggon Yoo

Subject: [PATCH v2 0/5] slab cleanups

Changes from v1:
Now SLAB passes requests larger than order-1 page
to page allocator.

Addressed comments from Matthew, Vlastimil, and Rientjes.
Thank you for the feedback!

BTW, I have no idea what __ksize() should return when an object that
was not allocated from slab is passed. Both 0 and folio_size()
seem wrong to me.

Hello, these are cleanup patches for slab.
Please consider them for slab-next :)

Any comments will be appreciated.
Thanks.

Hyeonggon Yoo (5):
mm/slab: kmalloc: pass requests larger than order-1 page to page
allocator
mm/sl[au]b: unify __ksize()
mm/sl[auo]b: move definition of __ksize() to mm/slab.h
mm/slub: limit number of node partial slabs only in cache creation
mm/slub: refactor deactivate_slab()

include/linux/slab.h | 36 ++++++------
mm/slab.c | 51 ++++++++---------
mm/slab.h | 21 +++++++
mm/slab_common.c | 20 +++++++
mm/slob.c | 1 -
mm/slub.c | 130 ++++++++++++-------------------------------
6 files changed, 114 insertions(+), 145 deletions(-)

--
2.33.1


2022-03-04 14:10:32

by Marco Elver

Subject: Re: [PATCH v2 0/5] slab cleanups

On Fri, 4 Mar 2022 at 13:02, Hyeonggon Yoo <[email protected]> wrote:
>
> On Fri, Mar 04, 2022 at 12:50:21PM +0100, Marco Elver wrote:
> > On Fri, 4 Mar 2022 at 07:34, Hyeonggon Yoo <[email protected]> wrote:
> > >
> > > Changes from v1:
> > > Now SLAB passes requests larger than order-1 page
> > > to page allocator.
> > >
> > > Adjusted comments from Matthew, Vlastimil, Rientjes.
> > > Thank you for feedback!
> > >
> > > BTW, I have no idea what __ksize() should return when an object that
> > > is not allocated from slab is passed. both 0 and folio_size()
> > > seems wrong to me.
> >
> > Didn't we say 0 would be the safer of the two options?
> > https://lkml.kernel.org/r/[email protected]
> >
>
> Oh sorry, I didn't understand why 0 was safer when I was reading it.
>
> Reading again, 0 is safer because kasan does not unpoison for
> wrongly passed object, right?

Not quite. KASAN can tell if something is wrong, i.e. invalid object.
Similarly, if you are able to tell if the passed pointer is not a
valid object some other way, you can do something better - namely,
return 0. The intuition here is that the caller has a pointer to an
invalid object, and wants to use ksize() to determine its size, and
most likely access all those bytes. Arguably, at that point the kernel
is already in a degrading state. But we can try to not let things get
worse by having ksize() return 0, in the hopes that it will stop
corrupting more memory. It won't work in all cases, but should avoid
things like "s = ksize(obj); touch_all_bytes(obj, s)", where the size
bounds the memory accessed, corrupting random memory.

The other reason is that a caller could actually check the size, and
if 0, do something else. Few callers will do so, because nobody
expects that their code has a bug. :-)

2022-03-04 15:11:33

by Hyeonggon Yoo

Subject: Re: [PATCH v2 0/5] slab cleanups

On Fri, Mar 04, 2022 at 12:50:21PM +0100, Marco Elver wrote:
> On Fri, 4 Mar 2022 at 07:34, Hyeonggon Yoo <[email protected]> wrote:
> >
> > Changes from v1:
> > Now SLAB passes requests larger than order-1 page
> > to page allocator.
> >
> > Adjusted comments from Matthew, Vlastimil, Rientjes.
> > Thank you for feedback!
> >
> > BTW, I have no idea what __ksize() should return when an object that
> > is not allocated from slab is passed. both 0 and folio_size()
> > seems wrong to me.
>
> Didn't we say 0 would be the safer of the two options?
> https://lkml.kernel.org/r/[email protected]
>

Oh sorry, I didn't understand why 0 was safer when I first read it.

Reading it again, 0 is safer because KASAN does not unpoison a
wrongly passed object, right?

> > Hello, these are cleanup patches for slab.
> > Please consider them for slab-next :)
> >
> > Any comments will be appreciated.
> > Thanks.
> >
> > Hyeonggon Yoo (5):
> > mm/slab: kmalloc: pass requests larger than order-1 page to page
> > allocator
> > mm/sl[au]b: unify __ksize()
> > mm/sl[auo]b: move definition of __ksize() to mm/slab.h
> > mm/slub: limit number of node partial slabs only in cache creation
> > mm/slub: refactor deactivate_slab()
> >
> > include/linux/slab.h | 36 ++++++------
> > mm/slab.c | 51 ++++++++---------
> > mm/slab.h | 21 +++++++
> > mm/slab_common.c | 20 +++++++
> > mm/slob.c | 1 -
> > mm/slub.c | 130 ++++++++++++-------------------------------
> > 6 files changed, 114 insertions(+), 145 deletions(-)
> >
> > --
> > 2.33.1
> >

--
Thank you, You are awesome!
Hyeonggon :-)

2022-03-04 16:14:50

by Marco Elver

Subject: Re: [PATCH v2 0/5] slab cleanups

On Fri, 4 Mar 2022 at 07:34, Hyeonggon Yoo <[email protected]> wrote:
>
> Changes from v1:
> Now SLAB passes requests larger than order-1 page
> to page allocator.
>
> Adjusted comments from Matthew, Vlastimil, Rientjes.
> Thank you for feedback!
>
> BTW, I have no idea what __ksize() should return when an object that
> is not allocated from slab is passed. both 0 and folio_size()
> seems wrong to me.

Didn't we say 0 would be the safer of the two options?
https://lkml.kernel.org/r/[email protected]

> Hello, these are cleanup patches for slab.
> Please consider them for slab-next :)
>
> Any comments will be appreciated.
> Thanks.
>
> Hyeonggon Yoo (5):
> mm/slab: kmalloc: pass requests larger than order-1 page to page
> allocator
> mm/sl[au]b: unify __ksize()
> mm/sl[auo]b: move definition of __ksize() to mm/slab.h
> mm/slub: limit number of node partial slabs only in cache creation
> mm/slub: refactor deactivate_slab()
>
> include/linux/slab.h | 36 ++++++------
> mm/slab.c | 51 ++++++++---------
> mm/slab.h | 21 +++++++
> mm/slab_common.c | 20 +++++++
> mm/slob.c | 1 -
> mm/slub.c | 130 ++++++++++++-------------------------------
> 6 files changed, 114 insertions(+), 145 deletions(-)
>
> --
> 2.33.1
>

2022-03-04 17:36:41

by Marco Elver

Subject: Re: [PATCH v2 0/5] slab cleanups

On Fri, 4 Mar 2022 at 17:42, Vlastimil Babka <[email protected]> wrote:
>
> On 3/4/22 14:11, Marco Elver wrote:
> > On Fri, 4 Mar 2022 at 13:02, Hyeonggon Yoo <[email protected]> wrote:
> >>
> >> On Fri, Mar 04, 2022 at 12:50:21PM +0100, Marco Elver wrote:
> >> > On Fri, 4 Mar 2022 at 07:34, Hyeonggon Yoo <[email protected]> wrote:
> >> > >
> >> > > Changes from v1:
> >> > > Now SLAB passes requests larger than order-1 page
> >> > > to page allocator.
> >> > >
> >> > > Adjusted comments from Matthew, Vlastimil, Rientjes.
> >> > > Thank you for feedback!
> >> > >
> >> > > BTW, I have no idea what __ksize() should return when an object that
> >> > > is not allocated from slab is passed. both 0 and folio_size()
> >> > > seems wrong to me.
> >> >
> >> > Didn't we say 0 would be the safer of the two options?
> >> > https://lkml.kernel.org/r/[email protected]
> >> >
> >>
> >> Oh sorry, I didn't understand why 0 was safer when I was reading it.
> >>
> >> Reading again, 0 is safer because kasan does not unpoison for
> >> wrongly passed object, right?
> >
> > Not quite. KASAN can tell if something is wrong, i.e. invalid object.
> > Similarly, if you are able to tell if the passed pointer is not a
> > valid object some other way, you can do something better - namely,
> > return 0.
>
> Hmm, but how paranoid do we have to be? Patch 1 converts SLAB to use
> kmalloc_large(). So it's now legitimate to have objects allocated by SLAB's
> kmalloc() that don't have a slab folio flag set, and their size is
> folio_size(). It would be more common than getting a bogus pointer, so
> should we return 0 just because a bogus pointer is possible?

No, of course not, which is why I asked in the earlier email whether
it's a "definitive failure case".

> If we do that,
> then KASAN will fail to unpoison legitimate kmalloc_large() objects, no?
> What I suggested earlier is we could make the checks more precise - if
> folio_size() is smaller or equal order-1 page, then it's bogus because we
> only do kmalloc_large() for >order-1. If the object pointer is not to the
> beginning of the folio, then it's bogus, because kmalloc_large() returns the
> beginning of the folio. Then in these case we return 0, but otherwise we
> should return folio_size()?
>
> > The intuition here is that the caller has a pointer to an
> > invalid object, and wants to use ksize() to determine its size, and
> > most likely access all those bytes. Arguably, at that point the kernel
> > is already in a degrading state. But we can try to not let things get
> > worse by having ksize() return 0, in the hopes that it will stop
> > corrupting more memory. It won't work in all cases, but should avoid
> > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > bounds the memory accessed corrupting random memory.
> >
> > The other reason is that a caller could actually check the size, and
> > if 0, do something else. Few callers will do so, because nobody
> > expects that their code has a bug. :-)
>

2022-03-04 19:02:28

by Hyeonggon Yoo

Subject: [PATCH v2 2/5] mm/sl[au]b: unify __ksize()

Now that SLAB passes large requests to the page allocator like SLUB does,
unify __ksize(). Only SLOB needs to implement its own version of __ksize(),
because it stores the size in the object header for kmalloc objects.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/slab.c | 30 ------------------------------
mm/slab_common.c | 27 +++++++++++++++++++++++++++
mm/slub.c | 16 ----------------
3 files changed, 27 insertions(+), 46 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 570af6dc3478..3ddf2181d8e4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4216,33 +4216,3 @@ void __check_heap_object(const void *ptr, unsigned long n,
usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
}
#endif /* CONFIG_HARDENED_USERCOPY */
-
-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
-size_t __ksize(const void *objp)
-{
- struct kmem_cache *c;
- struct folio *folio;
- size_t size;
-
- BUG_ON(!objp);
- if (unlikely(objp == ZERO_SIZE_PTR))
- return 0;
-
- folio = virt_to_folio(objp);
- if (!folio_test_slab(folio))
- return folio_size(folio);
-
- c = virt_to_cache(objp);
- size = c ? c->object_size : 0;
-
- return size;
-}
-EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23f2ab0713b7..1d2f92e871d2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1245,6 +1245,33 @@ void kfree_sensitive(const void *p)
}
EXPORT_SYMBOL(kfree_sensitive);

+#ifndef CONFIG_SLOB
+/**
+ * __ksize -- Uninstrumented ksize.
+ * @objp: pointer to the object
+ *
+ * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
+ * safety checks as ksize() with KASAN instrumentation enabled.
+ *
+ * Return: size of the actual memory used by @objp in bytes
+ */
+size_t __ksize(const void *object)
+{
+ struct folio *folio;
+
+ if (unlikely(object == ZERO_SIZE_PTR))
+ return 0;
+
+ folio = virt_to_folio(object);
+
+ if (unlikely(!folio_test_slab(folio)))
+ return folio_size(folio);
+
+ return slab_ksize(folio_slab(folio)->slab_cache);
+}
+EXPORT_SYMBOL(__ksize);
+#endif
+
/**
* ksize - get the actual amount of memory allocated for a given object
* @objp: Pointer to the object
diff --git a/mm/slub.c b/mm/slub.c
index 04fd084f4709..6f0ebadd8f30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4507,22 +4507,6 @@ void __check_heap_object(const void *ptr, unsigned long n,
}
#endif /* CONFIG_HARDENED_USERCOPY */

-size_t __ksize(const void *object)
-{
- struct folio *folio;
-
- if (unlikely(object == ZERO_SIZE_PTR))
- return 0;
-
- folio = virt_to_folio(object);
-
- if (unlikely(!folio_test_slab(folio)))
- return folio_size(folio);
-
- return slab_ksize(folio_slab(folio)->slab_cache);
-}
-EXPORT_SYMBOL(__ksize);
-
void kfree(const void *x)
{
struct folio *folio;
--
2.33.1

2022-03-04 19:21:51

by Vlastimil Babka

Subject: Re: [PATCH v2 0/5] slab cleanups

On 3/4/22 14:11, Marco Elver wrote:
> On Fri, 4 Mar 2022 at 13:02, Hyeonggon Yoo <[email protected]> wrote:
>>
>> On Fri, Mar 04, 2022 at 12:50:21PM +0100, Marco Elver wrote:
>> > On Fri, 4 Mar 2022 at 07:34, Hyeonggon Yoo <[email protected]> wrote:
>> > >
>> > > Changes from v1:
>> > > Now SLAB passes requests larger than order-1 page
>> > > to page allocator.
>> > >
>> > > Adjusted comments from Matthew, Vlastimil, Rientjes.
>> > > Thank you for feedback!
>> > >
>> > > BTW, I have no idea what __ksize() should return when an object that
>> > > is not allocated from slab is passed. both 0 and folio_size()
>> > > seems wrong to me.
>> >
>> > Didn't we say 0 would be the safer of the two options?
>> > https://lkml.kernel.org/r/[email protected]
>> >
>>
>> Oh sorry, I didn't understand why 0 was safer when I was reading it.
>>
>> Reading again, 0 is safer because kasan does not unpoison for
>> wrongly passed object, right?
>
> Not quite. KASAN can tell if something is wrong, i.e. invalid object.
> Similarly, if you are able to tell if the passed pointer is not a
> valid object some other way, you can do something better - namely,
> return 0.

Hmm, but how paranoid do we have to be? Patch 1 converts SLAB to use
kmalloc_large(). So it's now legitimate to have objects allocated by SLAB's
kmalloc() that don't have a slab folio flag set, and their size is
folio_size(). It would be more common than getting a bogus pointer, so
should we return 0 just because a bogus pointer is possible? If we do that,
then KASAN will fail to unpoison legitimate kmalloc_large() objects, no?
What I suggested earlier is that we could make the checks more precise - if
folio_size() is smaller than or equal to an order-1 page, then it's bogus,
because we only do kmalloc_large() for >order-1. If the object pointer is not
to the beginning of the folio, then it's bogus, because kmalloc_large()
returns the beginning of the folio. In these cases we return 0, but
otherwise we should return folio_size()?

> The intuition here is that the caller has a pointer to an
> invalid object, and wants to use ksize() to determine its size, and
> most likely access all those bytes. Arguably, at that point the kernel
> is already in a degrading state. But we can try to not let things get
> worse by having ksize() return 0, in the hopes that it will stop
> corrupting more memory. It won't work in all cases, but should avoid
> things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> bounds the memory accessed corrupting random memory.
>
> The other reason is that a caller could actually check the size, and
> if 0, do something else. Few callers will do so, because nobody
> expects that their code has a bug. :-)

2022-03-04 20:42:15

by Vlastimil Babka

Subject: Re: [PATCH v2 2/5] mm/sl[au]b: unify __ksize()

On 3/4/22 07:34, Hyeonggon Yoo wrote:
> Now that SLAB passes large requests to page allocator like SLUB,
> Unify __ksize(). Only SLOB need to implement own version of __ksize()
> because it stores size in object header for kmalloc objects.
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

As discussed, we can be more specific about the !folio_test_slab() case, but
that can be done on top.

2022-03-05 05:12:56

by Hyeonggon Yoo

Subject: Re: [PATCH v2 0/5] slab cleanups

On Fri, Mar 04, 2022 at 02:11:50PM +0100, Marco Elver wrote:
> On Fri, 4 Mar 2022 at 13:02, Hyeonggon Yoo <[email protected]> wrote:
> >
> > On Fri, Mar 04, 2022 at 12:50:21PM +0100, Marco Elver wrote:
> > > On Fri, 4 Mar 2022 at 07:34, Hyeonggon Yoo <[email protected]> wrote:
> > > >
> > > > Changes from v1:
> > > > Now SLAB passes requests larger than order-1 page
> > > > to page allocator.
> > > >
> > > > Adjusted comments from Matthew, Vlastimil, Rientjes.
> > > > Thank you for feedback!
> > > >
> > > > BTW, I have no idea what __ksize() should return when an object that
> > > > is not allocated from slab is passed. both 0 and folio_size()
> > > > seems wrong to me.
> > >
> > > Didn't we say 0 would be the safer of the two options?
> > > https://lkml.kernel.org/r/[email protected]
> > >
> >
> > Oh sorry, I didn't understand why 0 was safer when I was reading it.
> >
> > Reading again, 0 is safer because kasan does not unpoison for
> > wrongly passed object, right?
>
> Not quite. KASAN can tell if something is wrong, i.e. invalid object.
> Similarly, if you are able to tell if the passed pointer is not a
> valid object some other way, you can do something better - namely,
> return 0.
>
> The intuition here is that the caller has a pointer to an
> invalid object, and wants to use ksize() to determine its size, and
> most likely access all those bytes. Arguably, at that point the kernel
> is already in a degrading state. But we can try to not let things get
> worse by having ksize() return 0, in the hopes that it will stop
> corrupting more memory. It won't work in all cases, but should avoid
> things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> bounds the memory accessed corrupting random memory.

Oh, it's to prevent corrupting memory further in the failure case,
like memset(obj, 0, s);

> The other reason is that a caller could actually check the size, and
> if 0, do something else. Few callers will do so, because nobody
> expects that their code has a bug. :-)

and to let the caller check for errors.
Thank you so much for the kind explanation.

I'll add what Vlastimil suggested in next series. Thanks!

--
Thank you, You are awesome!
Hyeonggon :-)

2022-03-05 09:02:51

by Hyeonggon Yoo

Subject: Re: [PATCH v2 2/5] mm/sl[au]b: unify __ksize()

On Fri, Mar 04, 2022 at 07:25:47PM +0100, Vlastimil Babka wrote:
> On 3/4/22 07:34, Hyeonggon Yoo wrote:
> > Now that SLAB passes large requests to page allocator like SLUB,
> > Unify __ksize(). Only SLOB need to implement own version of __ksize()
> > because it stores size in object header for kmalloc objects.
> >
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>
> As discussed, we can be more specific about the !folio_test_slab() case, but
> that can be done on top.

I'll add that in v3. Thanks!

--
Thank you, You are awesome!
Hyeonggon :-)