2023-04-14 16:31:15

by Thomas Gleixner

Subject: [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes

Hi!

The cpu_dying_mask is not only undocumented but also to some extent a
misnomer. Its purpose is to capture the last direction of a cpu_up() or
cpu_down() operation, taking possible rollback operations into account.

cpu_dying_mask is not really useful for general consumption. The
cpu_dying_mask bits are sticky even after cpu_up() or cpu_down() completes.
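
To illustrate the sticky behaviour described above, here is a minimal
user-space sketch. It is not kernel code: every name in it (model_dying_mask,
model_set_direction() and friends) is hypothetical, and it only assumes that
the bit is updated when an operation starts or reverses direction and is
never touched when the operation completes.

```c
/* User-space model only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* One bit per CPU, standing in for cpu_dying_mask. */
static unsigned long model_dying_mask;

/* Record the direction of the hotplug operation that is now in progress. */
static void model_set_direction(unsigned int cpu, bool bringup)
{
	assert(cpu < MODEL_NR_CPUS);
	if (bringup)
		model_dying_mask &= ~(1UL << cpu);
	else
		model_dying_mask |= 1UL << cpu;
}

/* Query the last recorded direction: true means "going (or gone) down". */
static bool model_cpu_dying(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return (model_dying_mask & (1UL << cpu)) != 0;
}

/* cpu_down(): the bit is set at the start; nothing clears it at the end. */
static void model_cpu_down(unsigned int cpu)
{
	model_set_direction(cpu, false);
	/* ... teardown callbacks would run here ... */
}

/* cpu_up(): the bit is cleared at the start of the bringup. */
static void model_cpu_up(unsigned int cpu)
{
	model_set_direction(cpu, true);
	/* ... bringup callbacks would run here ... */
}

/* A cpu_down() that fails midway and rolls back reverses the direction. */
static void model_cpu_down_rollback(unsigned int cpu)
{
	model_set_direction(cpu, false);
	/* ... a teardown callback fails here ... */
	model_set_direction(cpu, true);
}
```

In this model, model_cpu_dying(3) still returns true long after
model_cpu_down(3) has returned, and only flips back once model_cpu_up(3)
starts. A rolled-back model_cpu_down_rollback(5) leaves the bit clear. So the
bit says "last direction was down", not "this CPU is currently dying".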

A recent fix to plug a race in the per CPU counter code picked
cpu_dying_mask to cure it. Unfortunately this does not work as the author
probably expected and the behaviour of cpu_dying_mask is not easy to change
without breaking the only other and initial user, the scheduler.

This series addresses this by:

  1) Reworking the per CPU counter hotplug mechanism so the race is fully
     plugged without using cpu_dying_mask

  2) Replacing the cpu_dying_mask logic with hotplug core internal state
     which is exposed to the scheduler with a properly documented
     function.

The series is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git smp/dying_mask

Thanks

tglx
---
 include/linux/cpuhotplug.h |    2 -
 include/linux/cpumask.h    |   21 ----------------
 kernel/cpu.c               |   45 +++++++++++++++++++++++++++++------
 kernel/sched/core.c        |    4 +--
 kernel/smpboot.h           |    2 +
 lib/percpu_counter.c       |   57 +++++++++++++++++++--------------------------
 6 files changed, 67 insertions(+), 64 deletions(-)


2023-05-03 12:06:33

by Valentin Schneider

Subject: Re: [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes

On 14/04/23 18:30, Thomas Gleixner wrote:
> Hi!
>
> The cpu_dying_mask is not only undocumented but also to some extent a
> misnomer. Its purpose is to capture the last direction of a cpu_up() or
> cpu_down() operation, taking possible rollback operations into account.
>
> cpu_dying_mask is not really useful for general consumption. The
> cpu_dying_mask bits are sticky even after cpu_up() or cpu_down() completes.
>
> A recent fix to plug a race in the per CPU counter code picked
> cpu_dying_mask to cure it. Unfortunately this does not work as the author
> probably expected and the behaviour of cpu_dying_mask is not easy to change
> without breaking the only other and initial user, the scheduler.
>
> This series addresses this by:
>
> 1) Reworking the per CPU counter hotplug mechanism so the race is fully
> plugged without using cpu_dying_mask
>
> 2) Replacing the cpu_dying_mask logic with hotplug core internal state
> which is exposed to the scheduler with a properly documented
> function.
>

For patches 2-3:

Reviewed-by: Valentin Schneider <[email protected]>

2023-12-30 22:39:32

by Dennis Zhou

Subject: Re: [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes

Hello,

On Fri, Apr 14, 2023 at 06:30:42PM +0200, Thomas Gleixner wrote:
> Hi!
>
> The cpu_dying_mask is not only undocumented but also to some extent a
> misnomer. Its purpose is to capture the last direction of a cpu_up() or
> cpu_down() operation, taking possible rollback operations into account.
>
> cpu_dying_mask is not really useful for general consumption. The
> cpu_dying_mask bits are sticky even after cpu_up() or cpu_down() completes.
>
> A recent fix to plug a race in the per CPU counter code picked
> cpu_dying_mask to cure it. Unfortunately this does not work as the author
> probably expected and the behaviour of cpu_dying_mask is not easy to change
> without breaking the only other and initial user, the scheduler.
>
> This series addresses this by:
>
> 1) Reworking the per CPU counter hotplug mechanism so the race is fully
> plugged without using cpu_dying_mask
>
> 2) Replacing the cpu_dying_mask logic with hotplug core internal state
> which is exposed to the scheduler with a properly documented
> function.
>
> The series is also available from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git smp/dying_mask
>
> Thanks
>
> tglx
> ---
> include/linux/cpuhotplug.h | 2 -
> include/linux/cpumask.h | 21 ----------------
> kernel/cpu.c | 45 +++++++++++++++++++++++++++++------
> kernel/sched/core.c | 4 +--
> kernel/smpboot.h | 2 +
> lib/percpu_counter.c | 57 +++++++++++++++++++--------------------------
> 6 files changed, 67 insertions(+), 64 deletions(-)

This has been on my mind, and regrettably it's been a busy year for me.

I know the merge window is around the corner, but I rebased this series
onto percpu#for-6.8 [1]. I had to massage percpu_counter slightly due
to some changes, but other than that it is largely intact. I need to do
a somewhat more thorough pass and re-send it, but I think it remains
correct to merge. I can then pull it, give it a few days to soak in
for-next, and then send it to Linus either in a follow-up PR or in the
2nd week of the merge window.

Thomas, how does this sound to you?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git/log/?h=percpu-hotplug

Thanks,
Dennis