2021-03-09 15:29:49

by Xunlei Pang

Subject: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

count_partial() can hold the n->list_lock spinlock for quite a long time,
which causes a lot of trouble for the system. This series eliminates this
problem.
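
For reference, count_partial() at this point is roughly the following
loop in mm/slub.c (paraphrased; exact field names vary a bit between
kernel versions). It walks the whole per-node partial list with the
list_lock held and interrupts disabled, so the hold time grows with the
length of the partial list:

        static unsigned long count_partial(struct kmem_cache_node *n,
                                           int (*get_count)(struct page *))
        {
                unsigned long flags;
                unsigned long x = 0;
                struct page *page;

                /* /proc/slabinfo readers end up here for every cache. */
                spin_lock_irqsave(&n->list_lock, flags);
                list_for_each_entry(page, &n->partial, slab_list)
                        x += get_count(page);
                spin_unlock_irqrestore(&n->list_lock, flags);
                return x;
        }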

v1->v2:
- Improved changelog and variable naming for PATCH 1~2.
- PATCH3 adds per-cpu counter to avoid performance regression
in concurrent __slab_free().

v2->v3:
- Changed "page->inuse" to the safe "new.inuse", etc.
- Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
- atomic_long_t -> unsigned long

[Testing]
According to my tests, there may be a small performance impact under
extremely concurrent __slab_free() calls.

On my 32-cpu 2-socket physical machine:
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

1) perf stat --null --repeat 10 -- hackbench 20 thread 20000

== original, not patched
Performance counter stats for 'hackbench 20 thread 20000' (10 runs):

24.536050899 seconds time elapsed ( +- 0.24% )


Performance counter stats for 'hackbench 20 thread 20000' (10 runs):

24.588049142 seconds time elapsed ( +- 0.35% )


== patched with patches 1~4
Performance counter stats for 'hackbench 20 thread 20000' (10 runs):

24.670892273 seconds time elapsed ( +- 0.29% )


Performance counter stats for 'hackbench 20 thread 20000' (10 runs):

24.746755689 seconds time elapsed ( +- 0.21% )


2) perf stat --null --repeat 10 -- hackbench 32 thread 20000

== original, not patched
Performance counter stats for 'hackbench 32 thread 20000' (10 runs):

39.784911855 seconds time elapsed ( +- 0.14% )

Performance counter stats for 'hackbench 32 thread 20000' (10 runs):

39.868687608 seconds time elapsed ( +- 0.19% )

== patched with patches 1~4
Performance counter stats for 'hackbench 32 thread 20000' (10 runs):

39.681273015 seconds time elapsed ( +- 0.21% )

Performance counter stats for 'hackbench 32 thread 20000' (10 runs):

39.681238459 seconds time elapsed ( +- 0.09% )


Xunlei Pang (4):
mm/slub: Introduce two counters for partial objects
mm/slub: Get rid of count_partial()
percpu: Export per_cpu_sum()
mm/slub: Use percpu partial free counter

include/linux/percpu-defs.h | 10 ++++
kernel/locking/percpu-rwsem.c | 10 ----
mm/slab.h | 4 ++
mm/slub.c | 120 +++++++++++++++++++++++++++++-------------
4 files changed, 97 insertions(+), 47 deletions(-)
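
Patch 3 just moves the per_cpu_sum() helper out of
kernel/locking/percpu-rwsem.c into percpu-defs.h so that slub can reuse
it; for reference, that helper is roughly the following macro
(reproduced from memory, so treat it as a sketch):

        #define per_cpu_sum(var)                                        \
        ({                                                              \
                typeof(var) __sum = 0;                                  \
                int cpu;                                                \
                compiletime_assert_atomic_type(__sum);                  \
                for_each_possible_cpu(cpu)                              \
                        __sum += per_cpu(var, cpu);                     \
                __sum;                                                  \
        })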

--
1.8.3.1


2021-03-15 21:06:55

by Vlastimil Babka

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On 3/9/21 4:25 PM, Xunlei Pang wrote:
> count_partial() can hold n->list_lock spinlock for quite long, which
> makes much trouble to the system. This series eliminate this problem.

Before I check the details, I have two high-level comments:

- patch 1 introduces some counting scheme that patch 4 then changes; could we
do this in one step to avoid the churn?

- the series addresses the concern that the spinlock is being held, but doesn't
address the fact that counting partial per-node slabs is not nearly enough if we
want an accurate <active_objs> in /proc/slabinfo, because there are also percpu
slabs and per-cpu partial slabs, where we don't track the free objects at all.
So after this series, while readers of /proc/slabinfo won't tie up the
spinlock, they will get the same garbage data as before. So Christoph is not
wrong to say that we can just report active_objs == num_objs and it won't
actually break any ABI.
At the same time somebody might actually want accurate object statistics at the
expense of peak performance, and it would be nice to give them such an option in
SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
although that option provides many additional tuning stats, with additional
overhead.
So my proposal would be a new config option for "accurate active objects" (or
just tie it to CONFIG_SLUB_DEBUG?) that would extend the approach of the percpu
counters in patch 4 to all allocs/frees, so that it also covers percpu slabs.
Without this config enabled, let's just report active_objs == num_objs.
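
To illustrate (just a sketch with made-up names, not a concrete patch):
a pair of plain per-cpu counters bumped in the fast paths and only
summed when somebody actually reads slabinfo, e.g.

        /* hypothetical fields hanging off struct kmem_cache */
        struct kmem_cache_counters {
                unsigned long allocated;
                unsigned long freed;
        };
        struct kmem_cache_counters __percpu *ctrs;

        /* alloc/free fast paths: no locks, no atomics */
        this_cpu_inc(s->ctrs->allocated);
        this_cpu_inc(s->ctrs->freed);

        /* slabinfo read side: a fuzzy but cheap sum */
        unsigned long active = 0;
        int cpu;

        for_each_possible_cpu(cpu) {
                struct kmem_cache_counters *c = per_cpu_ptr(s->ctrs, cpu);

                active += READ_ONCE(c->allocated) - READ_ONCE(c->freed);
        }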

Vlastimil

> v1->v2:
> - Improved changelog and variable naming for PATCH 1~2.
> - PATCH3 adds per-cpu counter to avoid performance regression
> in concurrent __slab_free().
>
> v2->v3:
> - Changed "page->inuse" to the safe "new.inuse", etc.
> - Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
> - atomic_long_t -> unsigned long
>
> [Testing]
> There seems might be a little performance impact under extreme
> __slab_free() concurrent calls according to my tests.
>
> On my 32-cpu 2-socket physical machine:
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>
> 1) perf stat --null --repeat 10 -- hackbench 20 thread 20000
>
> == original, no patched
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
> 24.536050899 seconds time elapsed ( +- 0.24% )
>
>
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
> 24.588049142 seconds time elapsed ( +- 0.35% )
>
>
> == patched with patch1~4
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
> 24.670892273 seconds time elapsed ( +- 0.29% )
>
>
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
> 24.746755689 seconds time elapsed ( +- 0.21% )
>
>
> 2) perf stat --null --repeat 10 -- hackbench 32 thread 20000
>
> == original, no patched
> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
> 39.784911855 seconds time elapsed ( +- 0.14% )
>
> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
> 39.868687608 seconds time elapsed ( +- 0.19% )
>
> == patched with patch1~4
> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
> 39.681273015 seconds time elapsed ( +- 0.21% )
>
> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
> 39.681238459 seconds time elapsed ( +- 0.09% )
>
>
> Xunlei Pang (4):
> mm/slub: Introduce two counters for partial objects
> mm/slub: Get rid of count_partial()
> percpu: Export per_cpu_sum()
> mm/slub: Use percpu partial free counter
>
> include/linux/percpu-defs.h | 10 ++++
> kernel/locking/percpu-rwsem.c | 10 ----
> mm/slab.h | 4 ++
> mm/slub.c | 120 +++++++++++++++++++++++++++++-------------
> 4 files changed, 97 insertions(+), 47 deletions(-)
>

2021-03-15 23:53:57

by Roman Gushchin

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem


On Mon, Mar 15, 2021 at 07:49:57PM +0100, Vlastimil Babka wrote:
> On 3/9/21 4:25 PM, Xunlei Pang wrote:
> > count_partial() can hold n->list_lock spinlock for quite long, which
> > makes much trouble to the system. This series eliminate this problem.
>
> Before I check the details, I have two high-level comments:
>
> - patch 1 introduces some counting scheme that patch 4 then changes, could we do
> this in one step to avoid the churn?
>
> - the series addresses the concern that spinlock is being held, but doesn't
> address the fact that counting partial per-node slabs is not nearly enough if we
> want accurate <active_objs> in /proc/slabinfo because there are also percpu
> slabs and per-cpu partial slabs, where we don't track the free objects at all.
> So after this series while the readers of /proc/slabinfo won't block the
> spinlock, they will get the same garbage data as before. So Christoph is not
> wrong to say that we can just report active_objs == num_objs and it won't
> actually break any ABI.
> At the same time somebody might actually want accurate object statistics at the
> expense of peak performance, and it would be nice to give them such option in
> SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
> although that option provides many additional tuning stats, with additional
> overhead.
> So my proposal would be a new config for "accurate active objects" (or just tie
> it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
> patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
> enabled, let's just report active_objs == num_objs.

It sounds really good to me! The only thing is that I'd avoid introducing a new
option and use CONFIG_SLUB_STATS instead.

It seems like CONFIG_SLUB_DEBUG is a more popular option than CONFIG_SLUB_STATS.
CONFIG_SLUB_DEBUG is enabled on my Fedora workstation, CONFIG_SLUB_STATS is off.
I doubt an average user needs this data, so I'd go with CONFIG_SLUB_STATS.

Thanks!

>
> Vlastimil
>
> > v1->v2:
> > - Improved changelog and variable naming for PATCH 1~2.
> > - PATCH3 adds per-cpu counter to avoid performance regression
> > in concurrent __slab_free().
> >
> > v2->v3:
> > - Changed "page->inuse" to the safe "new.inuse", etc.
> > - Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
> > - atomic_long_t -> unsigned long
> >
> > [Testing]
> > There seems might be a little performance impact under extreme
> > __slab_free() concurrent calls according to my tests.
> >
> > On my 32-cpu 2-socket physical machine:
> > Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> >
> > 1) perf stat --null --repeat 10 -- hackbench 20 thread 20000
> >
> > == original, no patched
> > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> >
> > 24.536050899 seconds time elapsed ( +- 0.24% )
> >
> >
> > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> >
> > 24.588049142 seconds time elapsed ( +- 0.35% )
> >
> >
> > == patched with patch1~4
> > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> >
> > 24.670892273 seconds time elapsed ( +- 0.29% )
> >
> >
> > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> >
> > 24.746755689 seconds time elapsed ( +- 0.21% )
> >
> >
> > 2) perf stat --null --repeat 10 -- hackbench 32 thread 20000
> >
> > == original, no patched
> > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> >
> > 39.784911855 seconds time elapsed ( +- 0.14% )
> >
> > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> >
> > 39.868687608 seconds time elapsed ( +- 0.19% )
> >
> > == patched with patch1~4
> > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> >
> > 39.681273015 seconds time elapsed ( +- 0.21% )
> >
> > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> >
> > 39.681238459 seconds time elapsed ( +- 0.09% )
> >
> >
> > Xunlei Pang (4):
> > mm/slub: Introduce two counters for partial objects
> > mm/slub: Get rid of count_partial()
> > percpu: Export per_cpu_sum()
> > mm/slub: Use percpu partial free counter
> >
> > include/linux/percpu-defs.h | 10 ++++
> > kernel/locking/percpu-rwsem.c | 10 ----
> > mm/slab.h | 4 ++
> > mm/slub.c | 120 +++++++++++++++++++++++++++++-------------
> > 4 files changed, 97 insertions(+), 47 deletions(-)
> >
>

2021-03-16 02:32:47

by Yang Shi

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On Mon, Mar 15, 2021 at 12:15 PM Roman Gushchin <[email protected]> wrote:
>
>
> On Mon, Mar 15, 2021 at 07:49:57PM +0100, Vlastimil Babka wrote:
> > On 3/9/21 4:25 PM, Xunlei Pang wrote:
> > > count_partial() can hold n->list_lock spinlock for quite long, which
> > > makes much trouble to the system. This series eliminate this problem.
> >
> > Before I check the details, I have two high-level comments:
> >
> > - patch 1 introduces some counting scheme that patch 4 then changes, could we do
> > this in one step to avoid the churn?
> >
> > - the series addresses the concern that spinlock is being held, but doesn't
> > address the fact that counting partial per-node slabs is not nearly enough if we
> > want accurate <active_objs> in /proc/slabinfo because there are also percpu
> > slabs and per-cpu partial slabs, where we don't track the free objects at all.
> > So after this series while the readers of /proc/slabinfo won't block the
> > spinlock, they will get the same garbage data as before. So Christoph is not
> > wrong to say that we can just report active_objs == num_objs and it won't
> > actually break any ABI.
> > At the same time somebody might actually want accurate object statistics at the
> > expense of peak performance, and it would be nice to give them such option in
> > SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
> > although that option provides many additional tuning stats, with additional
> > overhead.
> > So my proposal would be a new config for "accurate active objects" (or just tie
> > it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
> > patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
> > enabled, let's just report active_objs == num_objs.
>
> It sounds really good to me! The only thing, I'd avoid introducing a new option
> and use CONFIG_SLUB_STATS instead.
>
> It seems like CONFIG_SLUB_DEBUG is a more popular option than CONFIG_SLUB_STATS.
> CONFIG_SLUB_DEBUG is enabled on my Fedora workstation, CONFIG_SLUB_STATS is off.
> I doubt an average user needs this data, so I'd go with CONFIG_SLUB_STATS.

I think CONFIG_SLUB_DEBUG is enabled by default on most distros since it is
not supposed to incur too much overhead unless specific debugging (e.g.
red_zone) is turned on on demand.

>
> Thanks!
>
> >
> > Vlastimil
> >
> > > v1->v2:
> > > - Improved changelog and variable naming for PATCH 1~2.
> > > - PATCH3 adds per-cpu counter to avoid performance regression
> > > in concurrent __slab_free().
> > >
> > > v2->v3:
> > > - Changed "page->inuse" to the safe "new.inuse", etc.
> > > - Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
> > > - atomic_long_t -> unsigned long
> > >
> > > [Testing]
> > > There seems might be a little performance impact under extreme
> > > __slab_free() concurrent calls according to my tests.
> > >
> > > On my 32-cpu 2-socket physical machine:
> > > Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> > >
> > > 1) perf stat --null --repeat 10 -- hackbench 20 thread 20000
> > >
> > > == original, no patched
> > > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> > >
> > > 24.536050899 seconds time elapsed ( +- 0.24% )
> > >
> > >
> > > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> > >
> > > 24.588049142 seconds time elapsed ( +- 0.35% )
> > >
> > >
> > > == patched with patch1~4
> > > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> > >
> > > 24.670892273 seconds time elapsed ( +- 0.29% )
> > >
> > >
> > > Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> > >
> > > 24.746755689 seconds time elapsed ( +- 0.21% )
> > >
> > >
> > > 2) perf stat --null --repeat 10 -- hackbench 32 thread 20000
> > >
> > > == original, no patched
> > > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> > >
> > > 39.784911855 seconds time elapsed ( +- 0.14% )
> > >
> > > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> > >
> > > 39.868687608 seconds time elapsed ( +- 0.19% )
> > >
> > > == patched with patch1~4
> > > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> > >
> > > 39.681273015 seconds time elapsed ( +- 0.21% )
> > >
> > > Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> > >
> > > 39.681238459 seconds time elapsed ( +- 0.09% )
> > >
> > >
> > > Xunlei Pang (4):
> > > mm/slub: Introduce two counters for partial objects
> > > mm/slub: Get rid of count_partial()
> > > percpu: Export per_cpu_sum()
> > > mm/slub: Use percpu partial free counter
> > >
> > > include/linux/percpu-defs.h | 10 ++++
> > > kernel/locking/percpu-rwsem.c | 10 ----
> > > mm/slab.h | 4 ++
> > > mm/slub.c | 120 +++++++++++++++++++++++++++++-------------
> > > 4 files changed, 97 insertions(+), 47 deletions(-)
> > >
> >
>

2021-03-16 10:24:55

by Christoph Lameter

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On Mon, 15 Mar 2021, Yang Shi wrote:

> > It seems like CONFIG_SLUB_DEBUG is a more popular option than CONFIG_SLUB_STATS.
> > CONFIG_SLUB_DEBUG is enabled on my Fedora workstation, CONFIG_SLUB_STATS is off.
> > I doubt an average user needs this data, so I'd go with CONFIG_SLUB_STATS.
>
> I think CONFIG_SLUB_DEBUG is enabled by default on most distros since
> it is supposed not incur too much overhead unless specific debug (i.e.
> red_zone) is turned on on demand.

Correct. CONFIG_SLUB_DEBUG includes the code so that debugging can be
enabled on distro kernels with a kernel command line option. So you don't
have to recompile the kernel to find weird memory corruption issues from
strange device drivers.
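
For example (from memory, the exact flag letters are documented in
Documentation/vm/slub.rst), booting a stock distro kernel with
something like

        slub_debug=FZP,kmalloc-64

enables sanity checks, red zoning and poisoning for just the kmalloc-64
cache, no rebuild needed.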

Somehow my email address dropped off this thread.

2021-03-16 10:35:33

by Vlastimil Babka

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On 3/16/21 11:07 AM, Christoph Lameter wrote:
> On Mon, 15 Mar 2021, Yang Shi wrote:
>
>> > It seems like CONFIG_SLUB_DEBUG is a more popular option than CONFIG_SLUB_STATS.
>> > CONFIG_SLUB_DEBUG is enabled on my Fedora workstation, CONFIG_SLUB_STATS is off.
>> > I doubt an average user needs this data, so I'd go with CONFIG_SLUB_STATS.

Hm I can imagine that (after due performance testing) we would consider having
accurate slabinfo in our distro kernel, just as we have CONFIG_SLUB_DEBUG but
not the full stats.

>> I think CONFIG_SLUB_DEBUG is enabled by default on most distros since
>> it is supposed not incur too much overhead unless specific debug (i.e.
>> red_zone) is turned on on demand.
>
> Correct. CONFIG_SLUB_DEBUG includes the code so the debugging can be
> enabled on Distro kernels with a kernel command line option. So you dont
> have to recompile the kernel to find weird memory corruption issues from
> strange device drivers.
>
> Somehow my email address dropped off this thread.

Hm, I see [email protected] in all e-mails of the thread, but now you replaced it with
[email protected]?

2021-03-16 15:21:54

by Xunlei Pang

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On 3/16/21 2:49 AM, Vlastimil Babka wrote:
> On 3/9/21 4:25 PM, Xunlei Pang wrote:
>> count_partial() can hold n->list_lock spinlock for quite long, which
>> makes much trouble to the system. This series eliminate this problem.
>
> Before I check the details, I have two high-level comments:
>
> - patch 1 introduces some counting scheme that patch 4 then changes, could we do
> this in one step to avoid the churn?
>
> - the series addresses the concern that spinlock is being held, but doesn't
> address the fact that counting partial per-node slabs is not nearly enough if we
> want accurate <active_objs> in /proc/slabinfo because there are also percpu
> slabs and per-cpu partial slabs, where we don't track the free objects at all.
> So after this series while the readers of /proc/slabinfo won't block the
> spinlock, they will get the same garbage data as before. So Christoph is not
> wrong to say that we can just report active_objs == num_objs and it won't
> actually break any ABI.

If the maintainers don't mind this inaccuracy, whose importance I also doubt,
then it becomes easy. In case some people really do care, introducing an
extra config option (default off) for it would be a good choice.

> At the same time somebody might actually want accurate object statistics at the
> expense of peak performance, and it would be nice to give them such option in
> SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
> although that option provides many additional tuning stats, with additional
> overhead.
> So my proposal would be a new config for "accurate active objects" (or just tie
> it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
> patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
> enabled, let's just report active_objs == num_objs.
For percpu partial slabs, the numbers can be retrieved from the existing
slub_percpu_partial()->pobjects; it looks like no extra work is needed.
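
That is roughly what the sysfs slabs_cpu_partial handler already does
today (paraphrasing mm/slub.c):

        int objects = 0, cpu;

        for_each_online_cpu(cpu) {
                struct page *page;

                page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
                if (page)
                        objects += page->pobjects;      /* cached estimate */
        }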

>
> Vlastimil
>
>> v1->v2:
>> - Improved changelog and variable naming for PATCH 1~2.
>> - PATCH3 adds per-cpu counter to avoid performance regression
>> in concurrent __slab_free().
>>
>> v2->v3:
>> - Changed "page->inuse" to the safe "new.inuse", etc.
>> - Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
>> - atomic_long_t -> unsigned long
>>
>> [Testing]
>> There seems might be a little performance impact under extreme
>> __slab_free() concurrent calls according to my tests.
>>
>> On my 32-cpu 2-socket physical machine:
>> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>
>> 1) perf stat --null --repeat 10 -- hackbench 20 thread 20000
>>
>> == original, no patched
>> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>>
>> 24.536050899 seconds time elapsed ( +- 0.24% )
>>
>>
>> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>>
>> 24.588049142 seconds time elapsed ( +- 0.35% )
>>
>>
>> == patched with patch1~4
>> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>>
>> 24.670892273 seconds time elapsed ( +- 0.29% )
>>
>>
>> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>>
>> 24.746755689 seconds time elapsed ( +- 0.21% )
>>
>>
>> 2) perf stat --null --repeat 10 -- hackbench 32 thread 20000
>>
>> == original, no patched
>> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>>
>> 39.784911855 seconds time elapsed ( +- 0.14% )
>>
>> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>>
>> 39.868687608 seconds time elapsed ( +- 0.19% )
>>
>> == patched with patch1~4
>> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>>
>> 39.681273015 seconds time elapsed ( +- 0.21% )
>>
>> Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>>
>> 39.681238459 seconds time elapsed ( +- 0.09% )
>>
>>
>> Xunlei Pang (4):
>> mm/slub: Introduce two counters for partial objects
>> mm/slub: Get rid of count_partial()
>> percpu: Export per_cpu_sum()
>> mm/slub: Use percpu partial free counter
>>
>> include/linux/percpu-defs.h | 10 ++++
>> kernel/locking/percpu-rwsem.c | 10 ----
>> mm/slab.h | 4 ++
>> mm/slub.c | 120 +++++++++++++++++++++++++++++-------------
>> 4 files changed, 97 insertions(+), 47 deletions(-)
>>

2021-03-16 15:30:00

by Vlastimil Babka

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On 3/16/21 11:42 AM, Xunlei Pang wrote:
> On 3/16/21 2:49 AM, Vlastimil Babka wrote:
>> On 3/9/21 4:25 PM, Xunlei Pang wrote:
>>> count_partial() can hold n->list_lock spinlock for quite long, which
>>> makes much trouble to the system. This series eliminate this problem.
>>
>> Before I check the details, I have two high-level comments:
>>
>> - patch 1 introduces some counting scheme that patch 4 then changes, could we do
>> this in one step to avoid the churn?
>>
>> - the series addresses the concern that spinlock is being held, but doesn't
>> address the fact that counting partial per-node slabs is not nearly enough if we
>> want accurate <active_objs> in /proc/slabinfo because there are also percpu
>> slabs and per-cpu partial slabs, where we don't track the free objects at all.
>> So after this series while the readers of /proc/slabinfo won't block the
>> spinlock, they will get the same garbage data as before. So Christoph is not
>> wrong to say that we can just report active_objs == num_objs and it won't
>> actually break any ABI.
>
> If maintainers don't mind this inaccuracy which I also doubt its
> importance, then it becomes easy. For fear that some people who really
> cares, introducing an extra config(default-off) for it would be a good
> option.

Great.

>> At the same time somebody might actually want accurate object statistics at the
>> expense of peak performance, and it would be nice to give them such option in
>> SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
>> although that option provides many additional tuning stats, with additional
>> overhead.
>> So my proposal would be a new config for "accurate active objects" (or just tie
>> it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
>> patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
>> enabled, let's just report active_objs == num_objs.
> For percpu slabs, the numbers can be retrieved from the existing
> slub_percpu_partial()->pobjects, looks no need extra work.

Hm, unfortunately it's not that simple: the number there is a snapshot that can
become wildly inaccurate afterwards.
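
The pobjects number is only computed at the moment a page gets parked
on the cpu partial list, roughly (paraphrasing put_cpu_partial()):

        pages++;
        pobjects += page->objects - page->inuse;

        page->pages = pages;
        page->pobjects = pobjects;

and later frees into those pages via __slab_free() don't update it, so
the stored count just drifts.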

2021-03-16 15:40:54

by Xunlei Pang

Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On 3/16/21 7:02 PM, Vlastimil Babka wrote:
> On 3/16/21 11:42 AM, Xunlei Pang wrote:
>> On 3/16/21 2:49 AM, Vlastimil Babka wrote:
>>> On 3/9/21 4:25 PM, Xunlei Pang wrote:
>>>> count_partial() can hold n->list_lock spinlock for quite long, which
>>>> makes much trouble to the system. This series eliminate this problem.
>>>
>>> Before I check the details, I have two high-level comments:
>>>
>>> - patch 1 introduces some counting scheme that patch 4 then changes, could we do
>>> this in one step to avoid the churn?
>>>
>>> - the series addresses the concern that spinlock is being held, but doesn't
>>> address the fact that counting partial per-node slabs is not nearly enough if we
>>> want accurate <active_objs> in /proc/slabinfo because there are also percpu
>>> slabs and per-cpu partial slabs, where we don't track the free objects at all.
>>> So after this series while the readers of /proc/slabinfo won't block the
>>> spinlock, they will get the same garbage data as before. So Christoph is not
>>> wrong to say that we can just report active_objs == num_objs and it won't
>>> actually break any ABI.
>>
>> If maintainers don't mind this inaccuracy which I also doubt its
>> importance, then it becomes easy. For fear that some people who really
>> cares, introducing an extra config(default-off) for it would be a good
>> option.
>
> Great.
>
>>> At the same time somebody might actually want accurate object statistics at the
>>> expense of peak performance, and it would be nice to give them such option in
>>> SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
>>> although that option provides many additional tuning stats, with additional
>>> overhead.
>>> So my proposal would be a new config for "accurate active objects" (or just tie
>>> it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
>>> patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
>>> enabled, let's just report active_objs == num_objs.
>> For percpu slabs, the numbers can be retrieved from the existing
>> slub_percpu_partial()->pobjects, looks no need extra work.
>
> Hm, unfortunately it's not that simple, the number there is a snapshot that can
> become wildly inacurate afterwards.
>

It's hard to make it absolutely accurate using percpu counters: the data can
change while you are iterating over all the CPUs and summing total_objects,
not to mention the percpu freelist cache. I can't imagine a real-world use for
that level of accuracy; I think the sysfs slabs_cpu_partial file should be
good enough for common debugging purposes.
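
E.g. per cache:

        cat /sys/kernel/slab/<cache name>/slabs_cpu_partial

already reports the cached per-cpu partial pages/objects estimate.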