2024-03-31 02:43:56

by Song, Xiongwei

Subject: [PATCH 0/4] SLUB: improve filling cpu partial a bit in get_partial_node()

From: Xiongwei Song <[email protected]>

This series removes an unnecessary check when filling the cpu partial list
and improves readability.

Patch 2 introduces slub_get_cpu_partial() and a dummy counterpart so that
the code still builds with CONFIG_SLUB_CPU_PARTIAL disabled. Patches 3 and 4
use the helper.

No functional change intended.

This series is an improved version of the patch below:
https://lore.kernel.org/lkml/[email protected]/T/

Regards,
Xiongwei

Xiongwei Song (4):
mm/slub: remove the check of !kmem_cache_has_cpu_partial()
mm/slub: add slub_get_cpu_partial() helper
mm/slub: simplify get_partial_node()
mm/slub: don't read slab->cpu_partial_slabs directly

mm/slub.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)

--
2.27.0



2024-03-31 02:51:32

by Song, Xiongwei

Subject: [PATCH 3/4] mm/slub: simplify get_partial_node()

From: Xiongwei Song <[email protected]>

The break conditions can be more readable and simple.

We can check if we need to fill cpu partial after getting the first
partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
cpu partial from next iteration, or break out of the loop.

Then we can remove the preprocessor condition of
CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
compiler silent.

Signed-off-by: Xiongwei Song <[email protected]>
---
mm/slub.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 590cc953895d..ec91c7435d4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
if (!partial) {
partial = slab;
stat(s, ALLOC_FROM_PARTIAL);
- } else {
- put_cpu_partial(s, slab, 0);
- stat(s, CPU_PARTIAL_NODE);
- partial_slabs++;
+
+ /* Fill cpu partial if needed from next iteration, or break */
+ if (kmem_cache_has_cpu_partial(s))
+ continue;
+ else
+ break;
}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
- if (partial_slabs > s->cpu_partial_slabs / 2)
- break;
-#else
- break;
-#endif

+ put_cpu_partial(s, slab, 0);
+ stat(s, CPU_PARTIAL_NODE);
+ partial_slabs++;
+
+ if (partial_slabs > slub_get_cpu_partial(s) / 2)
+ break;
}
spin_unlock_irqrestore(&n->list_lock, flags);
return partial;
--
2.27.0


2024-03-31 03:01:24

by Song, Xiongwei

Subject: [PATCH 1/4] mm/slub: remove the check of !kmem_cache_has_cpu_partial()

From: Xiongwei Song <[email protected]>

The check of !kmem_cache_has_cpu_partial(s) with
CONFIG_SLUB_CPU_PARTIAL enabled here is always false. We have known the
result by calling kmem_cache_debug(). Here we can remove it.

Signed-off-by: Xiongwei Song <[email protected]>
---
mm/slub.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1bb2a93cf7b6..059922044a4f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2610,8 +2610,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
partial_slabs++;
}
#ifdef CONFIG_SLUB_CPU_PARTIAL
- if (!kmem_cache_has_cpu_partial(s)
- || partial_slabs > s->cpu_partial_slabs / 2)
+ if (partial_slabs > s->cpu_partial_slabs / 2)
break;
#else
break;
--
2.27.0


2024-03-31 03:05:13

by Song, Xiongwei

Subject: [PATCH 2/4] mm/slub: add slub_get_cpu_partial() helper

From: Xiongwei Song <[email protected]>

Add slub_get_cpu_partial() and a dummy counterpart to help simplify
get_partial_node(). The helper avoids a compile error when reading
cpu_partial_slabs with CONFIG_SLUB_CPU_PARTIAL disabled.

Signed-off-by: Xiongwei Song <[email protected]>
---
mm/slub.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 059922044a4f..590cc953895d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -604,11 +604,21 @@ static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
nr_slabs = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
s->cpu_partial_slabs = nr_slabs;
}
+
+static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
+{
+ return s->cpu_partial_slabs;
+}
#else
static inline void
slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
{
}
+
+static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
+{
+ return 0;
+}
#endif /* CONFIG_SLUB_CPU_PARTIAL */

/*
--
2.27.0
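
The helper-plus-stub pattern in this patch can be sketched outside the kernel. This is only a minimal model under stated assumptions: MODEL_CPU_PARTIAL, struct cache_model, model_get_cpu_partial() and model_should_stop() are hypothetical names standing in for CONFIG_SLUB_CPU_PARTIAL, struct kmem_cache, slub_get_cpu_partial() and the break test in get_partial_node(). It shows why a caller needs no #ifdef once the accessor exists in both configurations.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for CONFIG_SLUB_CPU_PARTIAL. Leave it undefined to
 * model the "disabled" configuration, or define it to model "enabled".
 */
/* #define MODEL_CPU_PARTIAL 1 */

struct cache_model {
#ifdef MODEL_CPU_PARTIAL
	unsigned int cpu_partial_slabs;	/* field only exists when enabled */
#endif
	unsigned int object_size;
};

#ifdef MODEL_CPU_PARTIAL
static inline unsigned int model_get_cpu_partial(const struct cache_model *s)
{
	return s->cpu_partial_slabs;
}
#else
/* Dummy version: the field does not exist, so report a limit of zero. */
static inline unsigned int model_get_cpu_partial(const struct cache_model *s)
{
	(void)s;
	return 0;
}
#endif

/* The caller compiles unchanged in both configurations. */
static bool model_should_stop(const struct cache_model *s,
			      unsigned int partial_slabs)
{
	return partial_slabs > model_get_cpu_partial(s) / 2;
}
```

With the macro undefined, model_get_cpu_partial() returns 0, so model_should_stop() is false for a count of 0 and true for any count of 1 or more.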


2024-03-31 03:05:22

by Song, Xiongwei

Subject: [PATCH 4/4] mm/slub: don't read slab->cpu_partial_slabs directly

From: Xiongwei Song <[email protected]>

We can use slub_get_cpu_partial() to read cpu_partial_slabs.

Signed-off-by: Xiongwei Song <[email protected]>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index ec91c7435d4e..47ea06d6feae 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2966,7 +2966,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
oldslab = this_cpu_read(s->cpu_slab->partial);

if (oldslab) {
- if (drain && oldslab->slabs >= s->cpu_partial_slabs) {
+ if (drain && oldslab->slabs >= slub_get_cpu_partial(s)) {
/*
* Partial array is full. Move the existing set to the
* per node partial list. Postpone the actual unfreezing
--
2.27.0


2024-04-02 09:42:09

by Vlastimil Babka

Subject: Re: [PATCH 3/4] mm/slub: simplify get_partial_node()

On 3/31/24 4:19 AM, [email protected] wrote:
> From: Xiongwei Song <[email protected]>
>
> The break conditions can be more readable and simple.
>
> We can check if we need to fill cpu partial after getting the first
> partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
> cpu partial from next iteration, or break out of the loop.
>
> Then we can remove the preprocessor condition of
> CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
> compiler silent.
>
> Signed-off-by: Xiongwei Song <[email protected]>
> ---
> mm/slub.c | 22 ++++++++++++----------
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 590cc953895d..ec91c7435d4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> if (!partial) {
> partial = slab;
> stat(s, ALLOC_FROM_PARTIAL);
> - } else {
> - put_cpu_partial(s, slab, 0);
> - stat(s, CPU_PARTIAL_NODE);
> - partial_slabs++;
> +
> + /* Fill cpu partial if needed from next iteration, or break */
> + if (kmem_cache_has_cpu_partial(s))

That kinda puts back the check removed in patch 1, although only in the
first iteration. Still not ideal.

> + continue;
> + else
> + break;
> }
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> - if (partial_slabs > s->cpu_partial_slabs / 2)
> - break;
> -#else
> - break;
> -#endif

I'd suggest instead of the changes done in this patch, only change this part
above to:

if ((slub_get_cpu_partial(s) == 0) ||
(partial_slabs > slub_get_cpu_partial(s) / 2))
break;

That gets rid of the #ifdef and also fixes a weird corner case that if we
set cpu_partial_slabs to 0 from sysfs, we still allocate at least one here.

It could be tempting to use >= instead of > to achieve the same effect but
that would have unintended performance effects that would best be evaluated
separately.
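
The corner case and the difference between > and >= can be sketched with a small stand-alone model. This is not the kernel code: enum break_policy, model_should_break() and model_cpu_partial_fill() are hypothetical names, and the model assumes every slab on the node list is usable (it ignores debug caches, pfmemalloc mismatches and the CONFIG_SLUB_CPU_PARTIAL=n build). It only counts how many slabs would be handed to put_cpu_partial() for a given cpu_partial_slabs limit.

```c
#include <assert.h>

/* Hypothetical break policies for the loop in get_partial_node(). */
enum break_policy {
	BREAK_GT,		/* partial_slabs > limit / 2 (current code) */
	BREAK_ZERO_OR_GT,	/* limit == 0 || partial_slabs > limit / 2 */
	BREAK_GE,		/* partial_slabs >= limit / 2 */
};

static int model_should_break(enum break_policy policy,
			      unsigned int partial_slabs, unsigned int limit)
{
	switch (policy) {
	case BREAK_GT:
		return partial_slabs > limit / 2;
	case BREAK_ZERO_OR_GT:
		return limit == 0 || partial_slabs > limit / 2;
	case BREAK_GE:
		return partial_slabs >= limit / 2;
	}
	assert(0 && "unknown break policy");
	return 1;
}

/*
 * Walk nr_node_slabs slabs: the first one becomes the returned partial slab,
 * every later one models a put_cpu_partial() call. Returns the number of
 * slabs that ended up on the cpu partial list.
 */
static unsigned int model_cpu_partial_fill(unsigned int nr_node_slabs,
					   unsigned int limit,
					   enum break_policy policy)
{
	unsigned int partial_slabs = 0;
	int have_partial = 0;
	unsigned int i;

	for (i = 0; i < nr_node_slabs; i++) {
		if (!have_partial)
			have_partial = 1;	/* partial = slab */
		else
			partial_slabs++;	/* put_cpu_partial(s, slab, 0) */

		if (model_should_break(policy, partial_slabs, limit))
			break;
	}
	return partial_slabs;
}
```

In this model, with a limit of 0 and ten slabs available, BREAK_GT still moves one slab to the cpu partial list while the other two policies move none. With a limit of 4, BREAK_GT and BREAK_ZERO_OR_GT both move three slabs but BREAK_GE moves only two, which is the behavioural difference that would need separate evaluation.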

>
> + put_cpu_partial(s, slab, 0);
> + stat(s, CPU_PARTIAL_NODE);
> + partial_slabs++;
> +
> + if (partial_slabs > slub_get_cpu_partial(s) / 2)
> + break;
> }
> spin_unlock_irqrestore(&n->list_lock, flags);
> return partial;


2024-04-02 09:42:56

by Vlastimil Babka

Subject: Re: [PATCH 4/4] mm/slub: don't read slab->cpu_partial_slabs directly

On 3/31/24 4:19 AM, [email protected] wrote:
> From: Xiongwei Song <[email protected]>
>
> We can use slub_get_cpu_partial() to read cpu_partial_slabs.

This code is under the #ifdef so it's not necessary to use the wrapper, only
makes it harder to read imho.

> Signed-off-by: Xiongwei Song <[email protected]>
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index ec91c7435d4e..47ea06d6feae 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2966,7 +2966,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
> oldslab = this_cpu_read(s->cpu_slab->partial);
>
> if (oldslab) {
> - if (drain && oldslab->slabs >= s->cpu_partial_slabs) {
> + if (drain && oldslab->slabs >= slub_get_cpu_partial(s)) {
> /*
> * Partial array is full. Move the existing set to the
> * per node partial list. Postpone the actual unfreezing


2024-04-02 09:45:56

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm/slub: remove the check of !kmem_cache_has_cpu_partial()

On 3/31/24 4:19 AM, [email protected] wrote:
> From: Xiongwei Song <[email protected]>
>
> The check of !kmem_cache_has_cpu_partial(s) with
> CONFIG_SLUB_CPU_PARTIAL enabled here is always false. We have known the
> result by calling kmem_cache_debug(). Here we can remove it.

Could we be more obvious? We have already checked kmem_cache_debug() earlier
and if it was true, then we either continued or broke from the loop so we
can't reach this code in that case and don't need to check
kmem_cache_debug() as part of kmem_cache_has_cpu_partial() again.
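
The reasoning can be sketched as a tiny stand-alone model. It assumes the CONFIG_SLUB_CPU_PARTIAL=y case, where kmem_cache_has_cpu_partial() reduces to !kmem_cache_debug(); struct cache_model and the model_*() functions are hypothetical names, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kmem_cache: only the debug state matters. */
struct cache_model {
	bool debug;
};

static bool model_cache_debug(const struct cache_model *s)
{
	return s->debug;
}

/* Assumed shape with CONFIG_SLUB_CPU_PARTIAL enabled. */
static bool model_has_cpu_partial(const struct cache_model *s)
{
	return !model_cache_debug(s);
}

/*
 * Models one pass through the loop body. Returns true only if the removed
 * !kmem_cache_has_cpu_partial() test could evaluate to true where it stood.
 */
static bool model_removed_check_fires(const struct cache_model *s)
{
	if (model_cache_debug(s)) {
		/* The real loop does "break" or "continue" here. */
		return false;
	}

	/* Only reached with debug == false, so this is always false. */
	return !model_has_cpu_partial(s);
}
```

For both debug and non-debug caches the function returns false, which is why the extra test was dead code in that configuration.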

> Signed-off-by: Xiongwei Song <[email protected]>
> ---
> mm/slub.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 1bb2a93cf7b6..059922044a4f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2610,8 +2610,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> partial_slabs++;
> }
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> - if (!kmem_cache_has_cpu_partial(s)
> - || partial_slabs > s->cpu_partial_slabs / 2)
> + if (partial_slabs > s->cpu_partial_slabs / 2)
> break;
> #else
> break;


2024-04-03 00:11:35

by Song, Xiongwei

Subject: RE: [PATCH 1/4] mm/slub: remove the check of !kmem_cache_has_cpu_partial()

>
> On 3/31/24 4:19 AM, [email protected] wrote:
> > From: Xiongwei Song <[email protected]>
> >
> > The check of !kmem_cache_has_cpu_partial(s) with
> > CONFIG_SLUB_CPU_PARTIAL enabled here is always false. We have known the
> > result by calling kmem_cache_debug(). Here we can remove it.
>
> Could we be more obvious? We have already checked kmem_cache_debug() earlier
> and if it was true, then we either continued or broke from the loop so we
> can't reach this code in that case and don't need to check
> kmem_cache_debug() as part of kmem_cache_has_cpu_partial() again.

Ok, looks better. Will update.

Thanks,
Xiongwei

>
> > Signed-off-by: Xiongwei Song <[email protected]>
> > ---
> > mm/slub.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 1bb2a93cf7b6..059922044a4f 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2610,8 +2610,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> > partial_slabs++;
> > }
> > #ifdef CONFIG_SLUB_CPU_PARTIAL
> > - if (!kmem_cache_has_cpu_partial(s)
> > - || partial_slabs > s->cpu_partial_slabs / 2)
> > + if (partial_slabs > s->cpu_partial_slabs / 2)
> > break;
> > #else
> > break;

2024-04-03 00:12:01

by Song, Xiongwei

Subject: RE: [PATCH 4/4] mm/slub: don't read slab->cpu_partial_slabs directly

>
> On 3/31/24 4:19 AM, [email protected] wrote:
> > From: Xiongwei Song <[email protected]>
> >
> > We can use slub_get_cpu_partial() to read cpu_partial_slabs.
>
> This code is under the #ifdef so it's not necessary to use the wrapper, only
> makes it harder to read imho.

Ok, got it. Will drop this one.

Thanks.

>
> > Signed-off-by: Xiongwei Song <[email protected]>
> > ---
> > mm/slub.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index ec91c7435d4e..47ea06d6feae 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2966,7 +2966,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab
> *slab, int drain)
> > oldslab = this_cpu_read(s->cpu_slab->partial);
> >
> > if (oldslab) {
> > - if (drain && oldslab->slabs >= s->cpu_partial_slabs) {
> > + if (drain && oldslab->slabs >= slub_get_cpu_partial(s)) {
> > /*
> > * Partial array is full. Move the existing set to the
> > * per node partial list. Postpone the actual unfreezing
>

2024-04-03 00:48:26

by Song, Xiongwei

Subject: RE: [PATCH 3/4] mm/slub: simplify get_partial_node()

>
> On 3/31/24 4:19 AM, [email protected] wrote:
> > From: Xiongwei Song <[email protected]>
> >
> > The break conditions can be more readable and simple.
> >
> > We can check if we need to fill cpu partial after getting the first
> > partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
> > cpu partial from next iteration, or break out of the loop.
> >
> > Then we can remove the preprocessor condition of
> > CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
> > compiler silent.
> >
> > Signed-off-by: Xiongwei Song <[email protected]>
> > ---
> > mm/slub.c | 22 ++++++++++++----------
> > 1 file changed, 12 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..ec91c7435d4e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> > if (!partial) {
> > partial = slab;
> > stat(s, ALLOC_FROM_PARTIAL);
> > - } else {
> > - put_cpu_partial(s, slab, 0);
> > - stat(s, CPU_PARTIAL_NODE);
> > - partial_slabs++;
> > +
> > + /* Fill cpu partial if needed from next iteration, or break */
> > + if (kmem_cache_has_cpu_partial(s))
>
> That kinda puts back the check removed in patch 1, although only in the
> first iteration. Still not ideal.
>
> > + continue;
> > + else
> > + break;
> > }
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > - if (partial_slabs > s->cpu_partial_slabs / 2)
> > - break;
> > -#else
> > - break;
> > -#endif
>
> I'd suggest instead of the changes done in this patch, only change this part
> above to:
>
> if ((slub_get_cpu_partial(s) == 0) ||
> (partial_slabs > slub_get_cpu_partial(s) / 2))
> break;
>
> That gets rid of the #ifdef and also fixes a weird corner case that if we
> set cpu_partial_slabs to 0 from sysfs, we still allocate at least one here.

Oh, yes. Will update.

>
> It could be tempting to use >= instead of > to achieve the same effect but
> that would have unintended performance effects that would best be evaluated
> separately.

I can run a test to measure Amean changes. But in terms of x86 assembly, there
should not be extra instructions with ">=".

Did a simple test, for ">=" it uses "jle" instruction, while "jl" instruction is used for ">".
No more instructions involved. So there should not be performance effects on x86.

Thanks,
Xiongwei

>
> >
> > + put_cpu_partial(s, slab, 0);
> > + stat(s, CPU_PARTIAL_NODE);
> > + partial_slabs++;
> > +
> > + if (partial_slabs > slub_get_cpu_partial(s) / 2)
> > + break;
> > }
> > spin_unlock_irqrestore(&n->list_lock, flags);
> > return partial;

2024-04-03 07:25:45

by Vlastimil Babka

Subject: Re: [PATCH 3/4] mm/slub: simplify get_partial_node()

On 4/3/24 2:37 AM, Song, Xiongwei wrote:
>>
>>
>> It could be tempting to use >= instead of > to achieve the same effect but
>> that would have unintended performance effects that would best be evaluated
>> separately.
>
> I can run a test to measure Amean changes. But in terms of x86 assembly, there
> should not be extra instructions with ">=".
>
> Did a simple test, for ">=" it uses "jle" instruction, while "jl" instruction is used for ">".
> No more instructions involved. So there should not be performance effects on x86.

Right, I didn't mean the code of the test, but how the difference of the
comparison affects how many cpu partial slabs would be put on the cpu
partial list here.

> Thanks,
> Xiongwei
>
>>
>> >
>> > + put_cpu_partial(s, slab, 0);
>> > + stat(s, CPU_PARTIAL_NODE);
>> > + partial_slabs++;
>> > +
>> > + if (partial_slabs > slub_get_cpu_partial(s) / 2)
>> > + break;
>> > }
>> > spin_unlock_irqrestore(&n->list_lock, flags);
>> > return partial;
>


2024-04-03 11:16:24

by Song, Xiongwei

Subject: RE: [PATCH 3/4] mm/slub: simplify get_partial_node()

>
> On 4/3/24 2:37 AM, Song, Xiongwei wrote:
> >>
> >>
> >> It could be tempting to use >= instead of > to achieve the same effect but
> >> that would have unintended performance effects that would best be evaluated
> >> separately.
> >
> > I can run a test to measure Amean changes. But in terms of x86 assembly, there
> > should not be extra instructions with ">=".
> >
> > Did a simple test, for ">=" it uses "jle" instruction, while "jl" instruction is used for ">".
> > No more instructions involved. So there should not be performance effects on x86.
>
> Right, I didn't mean the code of the test, but how the difference of the
> comparison affects how many cpu partial slabs would be put on the cpu
> partial list here.

Got it. Will do measurement for it.

Thanks,
Xiongwei