2022-08-28 15:51:27

by Dawei Li

Subject: [PATCH] mm: simplify size2index conversion of __kmalloc_index

The current size2index conversion is a hardcoded one-to-one mapping,
which can be simplified with order_base_2().
Care must be taken not to break compile-time evaluation for constant sizes.

Generated code for a caller of kmalloc():
48 8b 3d 9f 0b 6b 01 mov 0x16b0b9f(%rip),%rdi
# ffffffff826d1568 <kmalloc_caches+0x48>
ba 08 01 00 00 mov $0x108,%edx
be c0 0d 00 00 mov $0xdc0,%esi
e8 98 d7 2e 00 callq ffffffff8130e170 <kmem_cache_alloc_trace>

Signed-off-by: Dawei Li <[email protected]>
---
include/linux/slab.h | 34 +++++++++-------------------------
1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..66452a4357c6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -17,7 +17,7 @@
#include <linux/types.h>
#include <linux/workqueue.h>
#include <linux/percpu-refcount.h>
-
+#include <linux/log2.h>

/*
* Flags to pass to kmem_cache_create().
@@ -394,31 +394,16 @@ static __always_inline unsigned int __kmalloc_index(size_t size,

if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
return 1;
+
if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
return 2;
- if (size <= 8) return 3;
- if (size <= 16) return 4;
- if (size <= 32) return 5;
- if (size <= 64) return 6;
- if (size <= 128) return 7;
- if (size <= 256) return 8;
- if (size <= 512) return 9;
- if (size <= 1024) return 10;
- if (size <= 2 * 1024) return 11;
- if (size <= 4 * 1024) return 12;
- if (size <= 8 * 1024) return 13;
- if (size <= 16 * 1024) return 14;
- if (size <= 32 * 1024) return 15;
- if (size <= 64 * 1024) return 16;
- if (size <= 128 * 1024) return 17;
- if (size <= 256 * 1024) return 18;
- if (size <= 512 * 1024) return 19;
- if (size <= 1024 * 1024) return 20;
- if (size <= 2 * 1024 * 1024) return 21;
- if (size <= 4 * 1024 * 1024) return 22;
- if (size <= 8 * 1024 * 1024) return 23;
- if (size <= 16 * 1024 * 1024) return 24;
- if (size <= 32 * 1024 * 1024) return 25;
+
+ if (size <= 8)
+ return 3;
+
+ /* The following compile-time optimization rule is mandatory. */
+ if (size <= 32 * 1024 * 1024)
+ return order_base_2(size);

if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
@@ -700,7 +685,6 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
}

-
#ifdef CONFIG_NUMA
extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
unsigned long caller) __alloc_size(1);
--
2.25.1
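A userspace sketch (not part of the patch) of the claim in the commit message: for every size in (8, 32 MiB] the removed chain and order_base_2() should give the same index. order_base_2_sketch() is a hypothetical stand-in following the kernel's definition (0 for n <= 1, otherwise ilog2(n - 1) + 1), implemented here with the GCC/Clang builtin __builtin_clzll(); the 96- and 192-byte cases run before both paths and are left out. The same arithmetic explains the disassembly in the commit message: the size argument in %edx is 0x108 = 264 bytes, order_base_2(264) = 9 because 256 < 264 <= 512, and on a 64-bit build index 9 sits 9 * 8 = 0x48 bytes into kmalloc_caches (presumably the first, KMALLOC_NORMAL, row).

```c
#include <assert.h>
#include <stddef.h>

/* The hardcoded mapping removed by the patch, for sizes above 8 bytes. */
static unsigned int index_table(size_t size)
{
	if (size <= 16) return 4;
	if (size <= 32) return 5;
	if (size <= 64) return 6;
	if (size <= 128) return 7;
	if (size <= 256) return 8;
	if (size <= 512) return 9;
	if (size <= 1024) return 10;
	if (size <= 2 * 1024) return 11;
	if (size <= 4 * 1024) return 12;
	if (size <= 8 * 1024) return 13;
	if (size <= 16 * 1024) return 14;
	if (size <= 32 * 1024) return 15;
	if (size <= 64 * 1024) return 16;
	if (size <= 128 * 1024) return 17;
	if (size <= 256 * 1024) return 18;
	if (size <= 512 * 1024) return 19;
	if (size <= 1024 * 1024) return 20;
	if (size <= 2 * 1024 * 1024) return 21;
	if (size <= 4 * 1024 * 1024) return 22;
	if (size <= 8 * 1024 * 1024) return 23;
	if (size <= 16 * 1024 * 1024) return 24;
	if (size <= 32 * 1024 * 1024) return 25;
	return 0; /* outside the range this sketch compares */
}

/* Hypothetical stand-in for order_base_2(): ceil(log2(n)), 0 for n <= 1. */
static unsigned int order_base_2_sketch(size_t n)
{
	if (n <= 1)
		return 0;
	/* ilog2(n - 1) + 1 is the bit length of (n - 1) */
	return (unsigned int)(8 * sizeof(unsigned long long) -
			      __builtin_clzll((unsigned long long)(n - 1)));
}

/* 0 if both mappings agree on every size in (8, limit], else the first mismatch. */
static size_t first_mismatch(size_t limit)
{
	for (size_t size = 9; size <= limit; size++)
		if (index_table(size) != order_base_2_sketch(size))
			return size;
	return 0;
}

/* Compares 2^k - 1, 2^k and 2^k + 1 for every cache size up to 32 MiB. */
static int boundaries_match(void)
{
	for (unsigned int k = 4; k <= 25; k++) {
		size_t p = (size_t)1 << k;

		if (index_table(p - 1) != order_base_2_sketch(p - 1) ||
		    index_table(p) != order_base_2_sketch(p))
			return 0;
		if (k < 25 && index_table(p + 1) != order_base_2_sketch(p + 1))
			return 0;
	}
	return 1;
}
```

Since both sides compute ceil(log2(size)) on this range, first_mismatch(32u << 20) should also return 0; it is just a longer sweep. This only compares values: whether the compiler folds order_base_2() for a constant size is a separate question, and the one raised in the replies.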


2022-08-29 03:41:53

by Matthew Wilcox

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index

On Sun, Aug 28, 2022 at 11:14:48PM +0800, Dawei Li wrote:
> The current size2index conversion is a hardcoded one-to-one mapping,
> which can be simplified with order_base_2().
> Care must be taken not to break compile-time evaluation for constant sizes.

This patch has been NACKed before (when submitted by other people).

2022-08-29 04:02:22

by Hyeonggon Yoo

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index

On Sun, Aug 28, 2022 at 11:14:48PM +0800, Dawei Li wrote:
> The current size2index conversion is a hardcoded one-to-one mapping,
> which can be simplified with order_base_2().
> Care must be taken not to break compile-time evaluation for constant sizes.

> Generated code for a caller of kmalloc():
> 48 8b 3d 9f 0b 6b 01 mov 0x16b0b9f(%rip),%rdi
> # ffffffff826d1568 <kmalloc_caches+0x48>
> ba 08 01 00 00 mov $0x108,%edx
> be c0 0d 00 00 mov $0xdc0,%esi
> e8 98 d7 2e 00 callq ffffffff8130e170 <kmem_cache_alloc_trace>
>
> Signed-off-by: Dawei Li <[email protected]>
> ---
> include/linux/slab.h | 34 +++++++++-------------------------
> 1 file changed, 9 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 0fefdf528e0d..66452a4357c6 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -17,7 +17,7 @@
> #include <linux/types.h>
> #include <linux/workqueue.h>
> #include <linux/percpu-refcount.h>
> -
> +#include <linux/log2.h>
>
> /*
> * Flags to pass to kmem_cache_create().
> @@ -394,31 +394,16 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>
> if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
> return 1;
> +
> if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
> return 2;
> - if (size <= 8) return 3;
> - if (size <= 16) return 4;
> - if (size <= 32) return 5;
> - if (size <= 64) return 6;
> - if (size <= 128) return 7;
> - if (size <= 256) return 8;
> - if (size <= 512) return 9;
> - if (size <= 1024) return 10;
> - if (size <= 2 * 1024) return 11;
> - if (size <= 4 * 1024) return 12;
> - if (size <= 8 * 1024) return 13;
> - if (size <= 16 * 1024) return 14;
> - if (size <= 32 * 1024) return 15;
> - if (size <= 64 * 1024) return 16;
> - if (size <= 128 * 1024) return 17;
> - if (size <= 256 * 1024) return 18;
> - if (size <= 512 * 1024) return 19;
> - if (size <= 1024 * 1024) return 20;
> - if (size <= 2 * 1024 * 1024) return 21;
> - if (size <= 4 * 1024 * 1024) return 22;
> - if (size <= 8 * 1024 * 1024) return 23;
> - if (size <= 16 * 1024 * 1024) return 24;
> - if (size <= 32 * 1024 * 1024) return 25;

It does not apply. Better to rebase it on Vlastimil's slab tree (for-next branch):
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/

> +
> + if (size <= 8)
> + return 3;
> +
> + /* The following compile-time optimization rule is mandatory. */
> + if (size <= 32 * 1024 * 1024)
> + return order_base_2(size);

Oh, it seems order_base_2() is also evaluated at compile time for constant arguments.

With order_base_2(), what about using KMALLOC_MAX_CACHE_SIZE instead of 32 * 1024 * 1024?
I think that would be more robust.

Hmm, it would also be good to check that it works with the kfence tests (they pass a non-constant size).
Let's check whether it breaks after the rebase.

Thanks!

>
> if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
> @@ -700,7 +685,6 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
> return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
> }
>
> -
> #ifdef CONFIG_NUMA
> extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
> unsigned long caller) __alloc_size(1);
> --
> 2.25.1
>

--
Thanks,
Hyeonggon
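A userspace sketch of Hyeonggon's suggestion: bound the order_base_2() path by the largest kmalloc cache instead of a literal 32 MiB, and call it with a size the compiler cannot see, as the kfence tests reportedly do. SKETCH_SHIFT_HIGH and SKETCH_MAX_CACHE_SIZE are hypothetical stand-ins for KMALLOC_SHIFT_HIGH and KMALLOC_MAX_CACHE_SIZE, whose real values depend on the allocator and PAGE_SHIFT (13 here assumes, for example, SLUB with 4 KiB pages). The KMALLOC_MIN_SIZE conditions are assumed to hold, and where the kernel would hit BUILD_BUG_ON_MSG() or BUG() for an oversized request, the sketch returns a made-up sentinel instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for KMALLOC_SHIFT_HIGH / KMALLOC_MAX_CACHE_SIZE. */
#define SKETCH_SHIFT_HIGH	13
#define SKETCH_MAX_CACHE_SIZE	((size_t)1 << SKETCH_SHIFT_HIGH)
#define SKETCH_NO_CACHE		((unsigned int)-1)	/* sentinel, not kernel API */

/* ceil(log2(n)) for n >= 2, matching what order_base_2() returns. */
static unsigned int ob2(size_t n)
{
	return (unsigned int)(8 * sizeof(unsigned long long) -
			      __builtin_clzll((unsigned long long)(n - 1)));
}

/* Patched lookup, with the upper bound tied to the largest cache. */
static unsigned int kmalloc_index_sketch(size_t size)
{
	if (!size)
		return 0;
	if (size > 64 && size <= 96)
		return 1;
	if (size > 128 && size <= 192)
		return 2;
	if (size <= 8)
		return 3;
	if (size <= SKETCH_MAX_CACHE_SIZE)
		return ob2(size);
	return SKETCH_NO_CACHE;
}

/* Non-constant input: volatile keeps the compiler from folding the size. */
static unsigned int index_of_runtime_size(size_t size)
{
	volatile size_t hidden = size;

	return kmalloc_index_sketch(hidden);
}
```

Tying the bound to the cache limit keeps it in sync with the configured cache sizes rather than with a fixed 32 MiB, and the runtime path simply falls through to the bit-length computation.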

2022-08-29 05:26:00

by Hyeonggon Yoo

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index

On Mon, Aug 29, 2022 at 04:11:04AM +0100, Matthew Wilcox wrote:
> On Sun, Aug 28, 2022 at 11:14:48PM +0800, Dawei Li wrote:
> > The current size2index conversion is a hardcoded one-to-one mapping,
> > which can be simplified with order_base_2().
> > Care must be taken not to break compile-time evaluation for constant sizes.
>
> This patch has been NACKed before (when submitted by other people).


Hmm right.
https://lkml.iu.edu/hypermail/linux/kernel/1606.2/05402.html

Christoph Lameter wrote:
> On Wed, 22 Jun 2016, Yury Norov wrote:
> > There will be no fls() for constant at runtime because ilog2() calculates
> > constant values at compile-time as well. From this point of view,
> > this patch removes code duplication, as we already have compile-time
> > log() calculation in kernel, and should re-use it whenever possible.

> The reason not to use ilog there was that the constant folding did not
> work correctly with one or the other architectures/compilers. If you want
> to do this then please verify that all arches reliably do produce a
> constant there.

Can we re-evaluate this?

--
Thanks,
Hyeonggon
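For context on Christoph's concern, a sketch of why the hardcoded chain folds with any compiler: a chain of comparisons and ?: over constants is an integer constant expression in C, so it can size an array or appear in _Static_assert without relying on the optimizer at all. As far as I know, the kernel's ilog2() uses the same idea, dispatching constant arguments via __builtin_constant_p() to the const_ilog2() ternary chain in include/linux/log2.h. CONST_ORDER_BASE_2 and OB2_LE below are hypothetical macros covering 0 to 32 MiB only.

```c
#include <assert.h>

/*
 * Hypothetical CONST_ORDER_BASE_2(n): ceil(log2(n)) for 0 <= n <= 32 MiB,
 * built only from comparisons and ?: so the result is an integer constant
 * expression whenever n is one. Out-of-range values yield -1.
 */
#define OB2_LE(n, k)	((n) <= (1ULL << (k)))
#define CONST_ORDER_BASE_2(n) (		\
	(n) <= 1      ? 0  :		\
	OB2_LE(n, 1)  ? 1  :		\
	OB2_LE(n, 2)  ? 2  :		\
	OB2_LE(n, 3)  ? 3  :		\
	OB2_LE(n, 4)  ? 4  :		\
	OB2_LE(n, 5)  ? 5  :		\
	OB2_LE(n, 6)  ? 6  :		\
	OB2_LE(n, 7)  ? 7  :		\
	OB2_LE(n, 8)  ? 8  :		\
	OB2_LE(n, 9)  ? 9  :		\
	OB2_LE(n, 10) ? 10 :		\
	OB2_LE(n, 11) ? 11 :		\
	OB2_LE(n, 12) ? 12 :		\
	OB2_LE(n, 13) ? 13 :		\
	OB2_LE(n, 14) ? 14 :		\
	OB2_LE(n, 15) ? 15 :		\
	OB2_LE(n, 16) ? 16 :		\
	OB2_LE(n, 17) ? 17 :		\
	OB2_LE(n, 18) ? 18 :		\
	OB2_LE(n, 19) ? 19 :		\
	OB2_LE(n, 20) ? 20 :		\
	OB2_LE(n, 21) ? 21 :		\
	OB2_LE(n, 22) ? 22 :		\
	OB2_LE(n, 23) ? 23 :		\
	OB2_LE(n, 24) ? 24 :		\
	OB2_LE(n, 25) ? 25 : -1)

/* Both of these require integer constant expressions; no optimizer involved. */
_Static_assert(CONST_ORDER_BASE_2(264) == 9, "264 bytes -> 512-byte cache");
_Static_assert(CONST_ORDER_BASE_2(32ULL << 20) == 25, "32 MiB is the last entry");

/* An array bound is another context that only accepts a constant. */
static char one_slot_per_index[CONST_ORDER_BASE_2(32ULL << 20) + 1];

/* The same macro still works for a size known only at run time. */
static int runtime_order(unsigned long long n)
{
	return CONST_ORDER_BASE_2(n);
}
```

The trade-off is the one the thread is weighing: the chain is verbose but its constness is guaranteed by the language, while a builtin-based order_base_2() is compact but relies on each compiler folding it.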

2022-08-29 14:34:40

by Vlastimil Babka

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index

On 8/29/22 05:36, Hyeonggon Yoo wrote:
> On Mon, Aug 29, 2022 at 04:11:04AM +0100, Matthew Wilcox wrote:
>> On Sun, Aug 28, 2022 at 11:14:48PM +0800, Dawei Li wrote:
>> > The current size2index conversion is a hardcoded one-to-one mapping,
>> > which can be simplified with order_base_2().
>> > Care must be taken not to break compile-time evaluation for constant sizes.
>>
>> This patch has been NACKed before (when submitted by other people).
>
>
> Hmm right.
> https://lkml.iu.edu/hypermail/linux/kernel/1606.2/05402.html
>
> Christoph Lameter wrote:
>> On Wed, 22 Jun 2016, Yury Norov wrote:
>> > There will be no fls() for constant at runtime because ilog2() calculates
>> > constant values at compile-time as well. From this point of view,
>> > this patch removes code duplication, as we already have compile-time
>> > log() calculation in kernel, and should re-use it whenever possible.
>
>> The reason not to use ilog there was that the constant folding did not
>> work correctly with one or the other architectures/compilers. If you want
>> to do this then please verify that all arches reliably do produce a
>> constant there.
>
> Can we re-evaluate this?

Is there a way to turn the inability to calculate it at compile time into a
compile-time error (when size_is_constant=true etc.)? Then we could try it
and see if anything breaks in -next.

2022-08-29 16:09:34

by Dawei Li

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index

Interesting, I will see what I can do about it.
Just curious: could for-next testing cover all architectures and compilers?
Thanks for all the insightful comments from you all; they are very helpful.

________________________________________
From: Vlastimil Babka <[email protected]>
Sent: 2022-08-29 22:21
To: Hyeonggon Yoo; Matthew Wilcox
Cc: Dawei Li; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index


2022-08-30 06:00:35

by Christophe Leroy

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index



On 29/08/2022 at 16:21, Vlastimil Babka wrote:
> On 8/29/22 05:36, Hyeonggon Yoo wrote:
>> On Mon, Aug 29, 2022 at 04:11:04AM +0100, Matthew Wilcox wrote:
>>> On Sun, Aug 28, 2022 at 11:14:48PM +0800, Dawei Li wrote:
>>>> The current size2index conversion is a hardcoded one-to-one mapping,
>>>> which can be simplified with order_base_2().
>>>> Care must be taken not to break compile-time evaluation for constant sizes.
>>>
>>> This patch has been NACKed before (when submitted by other people).
>>
>>
>> Hmm right.
>> https://lkml.iu.edu/hypermail/linux/kernel/1606.2/05402.html
>>
>> Christoph Lameter wrote:
>>> On Wed, 22 Jun 2016, Yury Norov wrote:
>>>> There will be no fls() for constant at runtime because ilog2() calculates
>>>> constant values at compile-time as well. From this point of view,
>>>> this patch removes code duplication, as we already have compile-time
>>>> log() calculation in kernel, and should re-use it whenever possible.
>>
>>> The reason not to use ilog there was that the constant folding did not
>>> work correctly with one or the other architectures/compilers. If you want
>>> to do this then please verify that all arches reliably do produce a
>>> constant there.
>>
>> Can we re-evaluate this?
>
> Is there a way to turn the inability to calculate it at compile time into a
> compile-time error (when size_is_constant=true etc.)? Then we could try it
> and see if anything breaks in -next.
>
>

The following will generate a build error if the function
constant_check() is not called with a build-time constant argument.

static void __always_inline constant_check(unsigned long val)
{
	BUILD_BUG_ON(!__builtin_constant_p(val));
}

Is that what you are looking for?

2022-08-30 13:25:26

by Vlastimil Babka

Subject: Re: [PATCH] mm: simplify size2index conversion of __kmalloc_index

On 8/30/22 07:51, Christophe Leroy wrote:
>
>
> On 29/08/2022 at 16:21, Vlastimil Babka wrote:
>> On 8/29/22 05:36, Hyeonggon Yoo wrote:
>>> On Mon, Aug 29, 2022 at 04:11:04AM +0100, Matthew Wilcox wrote:
>>>> On Sun, Aug 28, 2022 at 11:14:48PM +0800, Dawei Li wrote:
>>>>> The current size2index conversion is a hardcoded one-to-one mapping,
>>>>> which can be simplified with order_base_2().
>>>>> Care must be taken not to break compile-time evaluation for constant sizes.
>>>>
>>>> This patch has been NACKed before (when submitted by other people).
>>>
>>>
>>> Hmm right.
>>> https://lkml.iu.edu/hypermail/linux/kernel/1606.2/05402.html
>>>
>>> Christoph Lameter wrote:
>>>> On Wed, 22 Jun 2016, Yury Norov wrote:
>>>>> There will be no fls() for constant at runtime because ilog2() calculates
>>>>> constant values at compile-time as well. From this point of view,
>>>>> this patch removes code duplication, as we already have compile-time
>>>>> log() calculation in kernel, and should re-use it whenever possible.
>>>
>>>> The reason not to use ilog there was that the constant folding did not
>>>> work correctly with one or the other architectures/compilers. If you want
>>>> to do this then please verify that all arches reliably do produce a
>>>> constant there.
>>>
>>> Can we re-evaluate this?
>>
>> Is there a way to turn the inability to calculate it at compile time into a
>> compile-time error (when size_is_constant=true etc.)? Then we could try it
>> and see if anything breaks in -next.
>>
>>
>
> The following will generate a build error if the function
> constant_check() is not called with a build-time constant argument.
>
> static void __always_inline constant_check(unsigned long val)
> {
> 	BUILD_BUG_ON(!__builtin_constant_p(val));
> }
>
> Is that what you are looking for?
Maybe, if we can rely on these two being equivalent:
- __kmalloc_index(x) is evaluated compile-time
- __builtin_constant_p(__kmalloc_index(x)) is true

Logically, such an equivalence should be expected, and a quick local attempt
with a recent gcc seems to work fine, but I guess we'll have to try it in
-next for a while and see if anything comes up.
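A userspace sketch of the check Vlastimil describes, with hypothetical names: inline a size-to-index helper for a literal size and ask __builtin_constant_p() whether the result folded. It tests a local variable holding the result, the usual kernel pattern, rather than the call expression itself, to avoid depending on how each compiler treats a call inside __builtin_constant_p(). The answer depends on the compiler and optimization level (with -O0 we would expect 0), which is why a run through -next across architectures is needed rather than a single local test. In the kernel, the flag would feed a BUILD_BUG_ON(!__builtin_constant_p(...)) guarded by size_is_constant, as Christophe suggested; here it is simply returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __kmalloc_index() after the patch (no bound check). */
static inline __attribute__((always_inline)) unsigned int index_sketch(size_t size)
{
	if (size <= 8)
		return 3;
	/* order_base_2(size) for size >= 9: bit length of (size - 1) */
	return (unsigned int)(8 * sizeof(unsigned long long) -
			      __builtin_clzll((unsigned long long)(size - 1)));
}

/*
 * 1 if the compiler folded index_sketch(4096) to a constant, else 0.
 * Typically requires optimization; in the kernel this would become
 * BUILD_BUG_ON(!__builtin_constant_p(idx)) when size_is_constant is true.
 */
static int index_4096_folded(void)
{
	unsigned int idx = index_sketch(4096);

	return __builtin_constant_p(idx);
}
```

Building this with each compiler and flag combination of interest and inspecting index_4096_folded() is a cheap local approximation; the BUILD_BUG_ON form turns the same question into a hard build failure.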