2024-05-22 15:19:27

by Oleg Nesterov

[permalink] [raw]
Subject: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

After the recent comment 5097cbcb38e6 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.

In this case tick_setup_device() does tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers WARN_ON_ONCE(irqs_disabled())
in smp_call_function_single().

I don't understand this code even remotely, I failed to find the fix.

Perhaps we can use smp_call_function_single_async() as a workaround ?

But I don't even understand why exactly we need smp_call_function()...
Lets suppose we change

--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -231,9 +231,9 @@ static void tick_setup_device(struct tick_device *td,

} else if (tick_do_timer_boot_cpu != -1 &&
!tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
+// tick_take_do_timer_from_boot();
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+ WRITE_ONCE(tick_do_timer_cpu), cpu);
#endif
}

Can anyone explain what exactly can go wrong?

Race with tick_nohz_stop_tick() on boot CPU which can set
tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?

Something else?

Please help, I don't think I can find a sane solution :/

Oleg.



2024-05-23 13:26:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On 05/22, Oleg Nesterov wrote:
>
> After the recent comment 5097cbcb38e6 ("sched/isolation: Prevent boot crash
> when the boot CPU is nohz_full") the kernel no longer crashes, but there is
> another problem.
>
> In this case tick_setup_device() does tick_take_do_timer_from_boot() to
> update tick_do_timer_cpu and this triggers WARN_ON_ONCE(irqs_disabled())
> in smp_call_function_single().
>
> I don't understand this code even remotely, I failed to find the fix.
>
> Perhaps we can use smp_call_function_single_async() as a workaround ?
>
> But I don't even understand why exactly we need smp_call_function()...

..

> Race with tick_nohz_stop_tick() on boot CPU which can set
> tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?

And is it supposed to happen if tick_nohz_full_running ?

tick_sched_do_timer() and can_stop_idle_tick() claim that
TICK_DO_TIMER_NONE is not possible in this case...

So, once again, could you explain why the patch below is wrong?

Oleg.
---

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..907b44d8cf1f 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}

-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
- int cpu = *(unsigned int *)info;
-
- WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
- tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
- int cpu = smp_processor_id();
- int from = tick_do_timer_boot_cpu;
-
- if (from >= 0 && from != cpu)
- smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
/*
* Setup the tick device
*/
@@ -231,9 +211,8 @@ static void tick_setup_device(struct tick_device *td,

} else if (tick_do_timer_boot_cpu != -1 &&
!tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
#endif
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71a792cd8936..3b1d011d45e1 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1014,6 +1014,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
*/
tick_cpu = READ_ONCE(tick_do_timer_cpu);
if (tick_cpu == cpu) {
+#ifdef CONFIG_NO_HZ_FULL
+ WARN_ON_ONCE(tick_nohz_full_running);
+#endif
WRITE_ONCE(tick_do_timer_cpu, TICK_DO_TIMER_NONE);
tick_sched_flag_set(ts, TS_FLAG_DO_TIMER_LAST);
} else if (tick_cpu != TICK_DO_TIMER_NONE) {


2024-05-24 09:33:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Oleg!

On Thu, May 23 2024 at 15:23, Oleg Nesterov wrote:
> On 05/22, Oleg Nesterov wrote:
>>
>> After the recent comment 5097cbcb38e6 ("sched/isolation: Prevent boot crash
>> when the boot CPU is nohz_full") the kernel no longer crashes, but there is
>> another problem.
>>
>> In this case tick_setup_device() does tick_take_do_timer_from_boot() to
>> update tick_do_timer_cpu and this triggers WARN_ON_ONCE(irqs_disabled())
>> in smp_call_function_single().
>>
>> I don't understand this code even remotely, I failed to find the fix.
>>
>> Perhaps we can use smp_call_function_single_async() as a workaround ?
>>
>> But I don't even understand why exactly we need smp_call_function()...

It's not required at all.

>> Race with tick_nohz_stop_tick() on boot CPU which can set
>> tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?

This can't happen.

> And is it supposed to happen if tick_nohz_full_running ?
>
> tick_sched_do_timer() and can_stop_idle_tick() claim that
> TICK_DO_TIMER_NONE is not possible in this case...

What happens during boot is:

1) The boot CPU takes the do_timer duty when it installs its
clockevent device

2) The boot CPU does not give up the duty because of this
condition in can_stop_idle_tick():

if (tick_nohz_full_enabled()) {
if (tick_cpu == cpu)
return false;
...

So there is no race because the boot CPU _cannot_ reach
tick_nohz_stop_tick() as long as no secondary has taken over.

It's far from obvious. What a horrible maze...

> So, once again, could you explain why the patch below is wrong?

> - tick_take_do_timer_from_boot();
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> + WRITE_ONCE(tick_do_timer_cpu, cpu);

This part is perfectly fine.

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 71a792cd8936..3b1d011d45e1 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1014,6 +1014,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> */
> tick_cpu = READ_ONCE(tick_do_timer_cpu);
> if (tick_cpu == cpu) {
> +#ifdef CONFIG_NO_HZ_FULL
> + WARN_ON_ONCE(tick_nohz_full_running);
> +#endif

WARN_ON_ONCE(tick_nohz_full_enabled());

which spares the ugly #ifdef?

> WRITE_ONCE(tick_do_timer_cpu, TICK_DO_TIMER_NONE);
> tick_sched_flag_set(ts, TS_FLAG_DO_TIMER_LAST);
> } else if (tick_cpu != TICK_DO_TIMER_NONE) {

Thanks,

tglx

2024-05-24 14:12:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Hi Thomas,

On 05/24, Thomas Gleixner wrote:
>
> >> But I don't even understand why exactly we need smp_call_function()...
>
> It's not required at all.
>
> >> Race with tick_nohz_stop_tick() on boot CPU which can set
> >> tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?
>
> This can't happen.
>
> > And is it supposed to happen if tick_nohz_full_running ?
> >
> > tick_sched_do_timer() and can_stop_idle_tick() claim that
> > TICK_DO_TIMER_NONE is not possible in this case...
>
> What happens during boot is:
>
> 1) The boot CPU takes the do_timer duty when it installs its
> clockevent device
>
> 2) The boot CPU does not give up the duty because of this
> condition in can_stop_idle_tick():
>
> if (tick_nohz_full_enabled()) {
> if (tick_cpu == cpu)
> return false;

Yes, I have looked at this code too. But I failed to understand its
callers, even tick_nohz_idle_stop_tick() which doesn't even call this
function when ts->timer_expires != 0.

This code is too tricky for me, I still don't follow the logic.
Damn, I can't even remember the names of all these functions ;)

> > So, once again, could you explain why the patch below is wrong?
>
> > - tick_take_do_timer_from_boot();
> > tick_do_timer_boot_cpu = -1;
> > - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> > + WRITE_ONCE(tick_do_timer_cpu, cpu);
>
> This part is perfectly fine.

Great, thanks! I'll write the changelog and send the patch tomorrow.

> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1014,6 +1014,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> > */
> > tick_cpu = READ_ONCE(tick_do_timer_cpu);
> > if (tick_cpu == cpu) {
> > +#ifdef CONFIG_NO_HZ_FULL
> > + WARN_ON_ONCE(tick_nohz_full_running);
> > +#endif
>
> WARN_ON_ONCE(tick_nohz_full_enabled());
>
> which spares the ugly #ifdef?

Yes but tick_nohz_full_enabled() depends on context_tracking_key, and
context_tracking_enabled() is false without CONFIG_CONTEXT_TRACKING_USER.
I didn't even try to check if it is selected by NO_HZ_FULL and how do
they play together.

But you know, I won't include this WARN_ON_ONCE(), I have added it for
(very basic) testing. We have another WARN_ON(tick_nohz_full_running)
in tick_sched_do_timer(), I guess it should be enough.

Thanks!

Oleg.


2024-05-24 15:21:06

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Le Fri, May 24, 2024 at 11:31:12AM +0200, Thomas Gleixner a ?crit :
> Oleg!
>
> On Thu, May 23 2024 at 15:23, Oleg Nesterov wrote:
> > On 05/22, Oleg Nesterov wrote:
> >>
> >> After the recent comment 5097cbcb38e6 ("sched/isolation: Prevent boot crash
> >> when the boot CPU is nohz_full") the kernel no longer crashes, but there is
> >> another problem.
> >>
> >> In this case tick_setup_device() does tick_take_do_timer_from_boot() to
> >> update tick_do_timer_cpu and this triggers WARN_ON_ONCE(irqs_disabled())
> >> in smp_call_function_single().
> >>
> >> I don't understand this code even remotely, I failed to find the fix.
> >>
> >> Perhaps we can use smp_call_function_single_async() as a workaround ?
> >>
> >> But I don't even understand why exactly we need smp_call_function()...
>
> It's not required at all.
>
> >> Race with tick_nohz_stop_tick() on boot CPU which can set
> >> tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?
>
> This can't happen.

Actually... The boot CPU is nohz_full and nothing prevents it
from stopping its tick once IRQs are enabled and before calling
tick_nohz_idle_enter(). When that happens, tick_nohz_full_update_tick()
doesn't go through can_stop_idle_tick() and therefore doesn't check if it
is the timekeeper. And then it goes through tick_nohz_stop_tick() which
can set tick_do_timer_cpu = TICK_DO_TIMER_NONE.

>
> > And is it supposed to happen if tick_nohz_full_running ?
> >
> > tick_sched_do_timer() and can_stop_idle_tick() claim that
> > TICK_DO_TIMER_NONE is not possible in this case...
>
> What happens during boot is:
>
> 1) The boot CPU takes the do_timer duty when it installs its
> clockevent device
>
> 2) The boot CPU does not give up the duty because of this
> condition in can_stop_idle_tick():
>
> if (tick_nohz_full_enabled()) {
> if (tick_cpu == cpu)
> return false;
> ...
>
> So there is no race because the boot CPU _cannot_ reach
> tick_nohz_stop_tick() as long as no secondary has taken over.
>
> It's far from obvious. What a horrible maze..

I know, I wish I had the time to Nack that nohz_full boot CPU
patch back then. But now we have to maintain it, even though it's
broken and uglifies the situation.

Anyway, we probably need to prevent from stopping the tick
as long as a CPU is the timekeeper and some CPU (could be the same)
is nohz_full somewhere.

That needs to be a seperate change (I'll try to fix that after
the week-end with a new brain) and then Oleg's patch can go on
top of it.

Thanks.

>
> > So, once again, could you explain why the patch below is wrong?
>
> > - tick_take_do_timer_from_boot();
> > tick_do_timer_boot_cpu = -1;
> > - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> > + WRITE_ONCE(tick_do_timer_cpu, cpu);
>
> This part is perfectly fine.
>
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 71a792cd8936..3b1d011d45e1 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1014,6 +1014,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> > */
> > tick_cpu = READ_ONCE(tick_do_timer_cpu);
> > if (tick_cpu == cpu) {
> > +#ifdef CONFIG_NO_HZ_FULL
> > + WARN_ON_ONCE(tick_nohz_full_running);
> > +#endif
>
> WARN_ON_ONCE(tick_nohz_full_enabled());
>
> which spares the ugly #ifdef?
>
> > WRITE_ONCE(tick_do_timer_cpu, TICK_DO_TIMER_NONE);
> > tick_sched_flag_set(ts, TS_FLAG_DO_TIMER_LAST);
> > } else if (tick_cpu != TICK_DO_TIMER_NONE) {
>
> Thanks,
>
> tglx

2024-05-24 15:23:04

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Le Fri, May 24, 2024 at 04:10:52PM +0200, Oleg Nesterov a ?crit :
> Hi Thomas,
>
> On 05/24, Thomas Gleixner wrote:
> >
> > >> But I don't even understand why exactly we need smp_call_function()...
> >
> > It's not required at all.
> >
> > >> Race with tick_nohz_stop_tick() on boot CPU which can set
> > >> tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?
> >
> > This can't happen.
> >
> > > And is it supposed to happen if tick_nohz_full_running ?
> > >
> > > tick_sched_do_timer() and can_stop_idle_tick() claim that
> > > TICK_DO_TIMER_NONE is not possible in this case...
> >
> > What happens during boot is:
> >
> > 1) The boot CPU takes the do_timer duty when it installs its
> > clockevent device
> >
> > 2) The boot CPU does not give up the duty because of this
> > condition in can_stop_idle_tick():
> >
> > if (tick_nohz_full_enabled()) {
> > if (tick_cpu == cpu)
> > return false;
>
> Yes, I have looked at this code too. But I failed to understand its
> callers, even tick_nohz_idle_stop_tick() which doesn't even call this
> function when ts->timer_expires != 0.
>
> This code is too tricky for me, I still don't follow the logic.
> Damn, I can't even remember the names of all these functions ;)
>
> > > So, once again, could you explain why the patch below is wrong?
> >
> > > - tick_take_do_timer_from_boot();
> > > tick_do_timer_boot_cpu = -1;
> > > - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> > > + WRITE_ONCE(tick_do_timer_cpu, cpu);
> >
> > This part is perfectly fine.
>
> Great, thanks! I'll write the changelog and send the patch tomorrow.
>
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -1014,6 +1014,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> > > */
> > > tick_cpu = READ_ONCE(tick_do_timer_cpu);
> > > if (tick_cpu == cpu) {
> > > +#ifdef CONFIG_NO_HZ_FULL
> > > + WARN_ON_ONCE(tick_nohz_full_running);
> > > +#endif
> >
> > WARN_ON_ONCE(tick_nohz_full_enabled());
> >
> > which spares the ugly #ifdef?
>
> Yes but tick_nohz_full_enabled() depends on context_tracking_key, and
> context_tracking_enabled() is false without CONFIG_CONTEXT_TRACKING_USER.
> I didn't even try to check if it is selected by NO_HZ_FULL and how do
> they play together.

Yes it is! You can rely on it.

Thanks.

2024-05-24 17:16:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Fri, May 24 2024 at 17:20, Frederic Weisbecker wrote:
> Le Fri, May 24, 2024 at 11:31:12AM +0200, Thomas Gleixner a écrit :
>> >> Race with tick_nohz_stop_tick() on boot CPU which can set
>> >> tick_do_timer_cpu = TICK_DO_TIMER_NONE? Is it really bad?
>>
>> This can't happen.
>
> Actually... The boot CPU is nohz_full and nothing prevents it
> from stopping its tick once IRQs are enabled and before calling
> tick_nohz_idle_enter(). When that happens, tick_nohz_full_update_tick()
> doesn't go through can_stop_idle_tick() and therefore doesn't check if it
> is the timekeeper. And then it goes through tick_nohz_stop_tick() which
> can set tick_do_timer_cpu = TICK_DO_TIMER_NONE.

OMG...

2024-05-24 18:38:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Frederic. Thanks for looking at this!

I've already had a few beers today, I know I'll regret about this email
tomorrow, but I can't resist ;)

On 05/24, Frederic Weisbecker wrote:
>
> Actually... The boot CPU is nohz_full and nothing prevents it
> from stopping its tick once IRQs are enabled and before calling
> tick_nohz_idle_enter(). When that happens, tick_nohz_full_update_tick()
> doesn't go through can_stop_idle_tick() and therefore doesn't check if it
> is the timekeeper. And then it goes through tick_nohz_stop_tick() which
> can set tick_do_timer_cpu = TICK_DO_TIMER_NONE.

Heh. May be you mean, say, tick_nohz_full_update_tick() or
tick_nohz_idle_update_tick() which check tick_nohz_full_cpu(cpu).
Yes this doesn't help during the boot, and this was another source for
confusion to me.

But again, again. tick_sched_do_timer() says

* If nohz_full is enabled, this should not happen because the
* 'tick_do_timer_cpu' CPU never relinquishes.

so I guess it is not supposed to happen?

And. My main question was: how can smp_call_function_single() help???
Why do we actually need it?

Thanks ;)

Oleg.


2024-05-24 22:06:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Fri, May 24 2024 at 20:37, Oleg Nesterov wrote:

> I've already had a few beers today, I know I'll regret about this
> email tomorrow, but I can't resist ;)

You won't regret it. :)

> On 05/24, Frederic Weisbecker wrote:
> But again, again. tick_sched_do_timer() says
>
> * If nohz_full is enabled, this should not happen because the
> * 'tick_do_timer_cpu' CPU never relinquishes.
>
> so I guess it is not supposed to happen?

Right. It does not happen because the kernel starts with jiffies as
clocksource except on S390. The jiffies clocksource is not qualified to
switch over to NOHZ mode for obvious reasons. But even on S390 which has
a truly usable and useful clocksource the tick stays periodic to begin
with. Why?

The NOHZ ready notification happens late in the boot process via:
fs_initcall(clocksource_done_booting)

So by the time that happens, the secondary CPUs are up and have taken
over the do timer duty.

[ 0.600381] smp: Bringing up secondary CPUs ...

...

[ 1.917842] clocksource: Switched to clocksource kvm-clock
[ 1.918548] clocksource_done_booting: Switched to NOHZ // debug printk

This is the point where tick_nohz_activate() is called first time and
that does:

tick_sched_flag_set(ts, TS_FLAG_NOHZ);

So up to this point the tick is never stopped neither on housekeeping
nor on NOHZ FULL CPUs:

tick_nohz_full_update_tick()
if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
return;

> And. My main question was: how can smp_call_function_single() help???

It's useless.

> Why do we actually need it?

We do not.

As explained above there is also nothing extra to fix contrary to
Frederics fears.

Even in the case that a command line limitation restricts the number of
CPUs such that there is no housekeeping CPU onlined during
smp_init(). That is checked in the isolation init code which clears
nohz_full_running in that case. Nothing to see there either.

So all this needs is the simple:

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..dab17d756fd8 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -229,11 +209,9 @@ static void tick_setup_device(struct tick_device *td,
if (tick_nohz_full_cpu(cpu))
tick_do_timer_boot_cpu = cpu;

- } else if (tick_do_timer_boot_cpu != -1 &&
- !tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
+ } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
#endif
}

along with the removal of the SMP function call voodoo programming gunk,
a lengthy changelog and a bunch of useful comments.

Changing the horribly lazy and incomprehensible '-1' to an actual
meaningful define, e.g. TICK_DO_TIMER_NONE, would definitely help along
with renaming the variable to tick_do_timer_nohz_full_boot_cpu.

There is no race other than the boot CPU reading tick_do_timer_cpu
concurrently to the update, but that's completely harmless whatever it
sees there. If it's the boot CPU, i.e. 0, or the secondary does not
matter. The secondary immediately schedules the tick unconditionally so
timekeeping and jiffies will just work.

If the secondary CPU fails to come up after it installed the clock event
device then the missing tick is the least of the problems.

That has absolutely nothing to do with the issue at hand. If the CPU
which owns tick_do_timer_cpu dies or gets stuck then all bets are off
independent of NOHZ FULL. See the changes which went in during the merge
window to handle the case where the hypervisor fails to inject the timer
interrupts or keeps the time keeper duty CPU scheduled out for a long
period of time....

Thanks,

tglx

2024-05-25 13:53:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Thomas, thanks a lot!

Let me grep a bit more to better understand your explanations.

Just one note for now.

On 05/25, Thomas Gleixner wrote:
>
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -229,11 +209,9 @@ static void tick_setup_device(struct tick_device *td,
> if (tick_nohz_full_cpu(cpu))
> tick_do_timer_boot_cpu = cpu;
>
> - } else if (tick_do_timer_boot_cpu != -1 &&
> - !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> + } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> #endif
> }
>
> along with the removal of the SMP function call voodoo programming gunk,

Yes,

> Changing the horribly lazy and incomprehensible '-1' to an actual
> meaningful define, e.g. TICK_DO_TIMER_NONE, would definitely help along
> with renaming the variable to tick_do_timer_nohz_full_boot_cpu.

Better yet, we can make it a boolean, we do not need cpu number. And
perhaps we can simply kill it along with #ifdef CONFIG_NO_HZ_FULL ?

if (!td->evtdev) {
tick_cpu = READ_ONCE(tick_do_timer_cpu);
/*
* If no cpu took the do_timer update, assign it to
* this cpu:
*/
if (tick_cpu == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
/*
* The boot CPU may be nohz_full, in which case the
* first housekeeping secondary will take do_timer
* from us.
*/
} else if (tick_nohz_full_cpu(tick_cpu) &&
!tick_nohz_full_cpu(cpu)) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
}

Oleg.


2024-05-25 14:14:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On 05/25, Oleg Nesterov wrote:
>
> perhaps we can simply kill it along with #ifdef CONFIG_NO_HZ_FULL ?
>
> if (!td->evtdev) {
> tick_cpu = READ_ONCE(tick_do_timer_cpu);
> /*
> * If no cpu took the do_timer update, assign it to
> * this cpu:
> */
> if (tick_cpu == TICK_DO_TIMER_BOOT) {
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> tick_next_period = ktime_get();
> /*
> * The boot CPU may be nohz_full, in which case the
> * first housekeeping secondary will take do_timer
> * from us.
> */
> } else if (tick_nohz_full_cpu(tick_cpu) &&
> !tick_nohz_full_cpu(cpu)) {
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> }

although tick_nohz_full_cpu(tick_cpu) above depends on the fact that
tick_cpu = TICK_DO_TIMER_NONE should not be possible if
tick_nohz_full_enabled(), not good.

Oleg.


2024-05-26 19:29:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

The more I grep the more I confused.

On 05/25, Thomas Gleixner wrote:
>
> Right. It does not happen because the kernel starts with jiffies as
> clocksource except on S390. The jiffies clocksource is not qualified to
> switch over to NOHZ mode for obvious reasons.

Not obvious for those who never looked at this code ;)

OK, clocksource_jiffies doesn't have CLOCK_SOURCE_VALID_FOR_HRES,

> fs_initcall(clocksource_done_booting)

So clocksource_done_booting() -> clocksource_select() should find another
CLOCK_SOURCE_VALID_FOR_HRES cs, then timekeeping_notify(best)

- sets tk_core.timekeeper.tkr_mono.clock = best

- calls tick_clock_notify() which sets .check_clocks on every cpu.

This makes tick_check_oneshot_change(false) return true. (I am ignoring the
highres=n case, tick_nohz_switch_to_nohz() doesn't really differ).

Now, on every CPU the next ->event_handler == tick_handle_periodic path
call hrtimer_switch_to_hres() which

- sets ->event_handler == hrtimer_interrupt (tick_init_highres)
so tick_periodic/do_timer will be never called again

- calls tick_setup_sched_timer() -> tick_nohz_activate() which
sets TS_FLAG_NOHZ

> [ 1.918548] clocksource_done_booting: Switched to NOHZ // debug printk
>
> This is the point where tick_nohz_activate() is called first time and
> that does:
>
> tick_sched_flag_set(ts, TS_FLAG_NOHZ);

See above, but I got lost, most probably I misunderstood these nontrivial
code paths.

> So up to this point the tick is never stopped neither on housekeeping
> nor on NOHZ FULL CPUs:
>
> tick_nohz_full_update_tick()
> if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
> return;

OK... But tick_nohz_idle_update_tick() doesn't check TS_FLAG_NOHZ and
the tick_nohz_full_cpu() check can't help at boot time.

And I still don't understand why we can rely on can_stop_idle_tick() even
in tick_nohz_idle_stop_tick().

I'll try to read this code again tomorrow, but it will never fit my poor
little brain ;)

Oleg.


2024-05-26 20:52:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Le Sun, May 26, 2024 at 09:27:58PM +0200, Oleg Nesterov a ?crit :
> > So up to this point the tick is never stopped neither on housekeeping
> > nor on NOHZ FULL CPUs:
> >
> > tick_nohz_full_update_tick()
> > if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
> > return;
>
> OK... But tick_nohz_idle_update_tick() doesn't check TS_FLAG_NOHZ and
> the tick_nohz_full_cpu() check can't help at boot time.

Yes but tick_nohz_idle_update_tick() is only called when the tick is already
stopped. And for the tick to be already stopped, TS_FLAG_NOHZ must have been
set.

> And I still don't understand why we can rely on can_stop_idle_tick() even
> in tick_nohz_idle_stop_tick().

Not sure I follow you on this one...

>
> I'll try to read this code again tomorrow, but it will never fit my poor
> little brain ;)

You understood more than I ever did in just a few hours :-)

But yes the tick code was much more simple before I put my own hands
inside. nohz_full and cpuidle did not arrange it.

Thanks.

2024-05-26 20:57:12

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

Le Sat, May 25, 2024 at 12:06:06AM +0200, Thomas Gleixner a ?crit :
> On Fri, May 24 2024 at 20:37, Oleg Nesterov wrote:
>
> > I've already had a few beers today, I know I'll regret about this
> > email tomorrow, but I can't resist ;)
>
> You won't regret it. :)
>
> > On 05/24, Frederic Weisbecker wrote:
> > But again, again. tick_sched_do_timer() says
> >
> > * If nohz_full is enabled, this should not happen because the
> > * 'tick_do_timer_cpu' CPU never relinquishes.
> >
> > so I guess it is not supposed to happen?
>
> Right. It does not happen because the kernel starts with jiffies as
> clocksource except on S390. The jiffies clocksource is not qualified to
> switch over to NOHZ mode for obvious reasons. But even on S390 which has
> a truly usable and useful clocksource the tick stays periodic to begin
> with. Why?
>
> The NOHZ ready notification happens late in the boot process via:
> fs_initcall(clocksource_done_booting)
>
> So by the time that happens, the secondary CPUs are up and have taken
> over the do timer duty.
>
> [ 0.600381] smp: Bringing up secondary CPUs ...
>
> ....
>
> [ 1.917842] clocksource: Switched to clocksource kvm-clock
> [ 1.918548] clocksource_done_booting: Switched to NOHZ // debug printk
>
> This is the point where tick_nohz_activate() is called first time and
> that does:
>
> tick_sched_flag_set(ts, TS_FLAG_NOHZ);
>
> So up to this point the tick is never stopped neither on housekeeping
> nor on NOHZ FULL CPUs:
>
> tick_nohz_full_update_tick()
> if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
> return;

Oh nice! I indeed completely skipped the clocksource part of the picture!

Thanks.

2024-05-27 09:10:35

by Nicholas Piggin

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Sat May 25, 2024 at 8:06 AM AEST, Thomas Gleixner wrote:
> On Fri, May 24 2024 at 20:37, Oleg Nesterov wrote:
>
> > I've already had a few beers today, I know I'll regret about this
> > email tomorrow, but I can't resist ;)
>
> You won't regret it. :)
>
> > On 05/24, Frederic Weisbecker wrote:
> > But again, again. tick_sched_do_timer() says
> >
> > * If nohz_full is enabled, this should not happen because the
> > * 'tick_do_timer_cpu' CPU never relinquishes.
> >
> > so I guess it is not supposed to happen?
>
> Right. It does not happen because the kernel starts with jiffies as
> clocksource except on S390. The jiffies clocksource is not qualified to
> switch over to NOHZ mode for obvious reasons. But even on S390 which has
> a truly usable and useful clocksource the tick stays periodic to begin
> with. Why?
>
> The NOHZ ready notification happens late in the boot process via:
> fs_initcall(clocksource_done_booting)
>
> So by the time that happens, the secondary CPUs are up and have taken
> over the do timer duty.
>
> [ 0.600381] smp: Bringing up secondary CPUs ...
>
> ....
>
> [ 1.917842] clocksource: Switched to clocksource kvm-clock
> [ 1.918548] clocksource_done_booting: Switched to NOHZ // debug printk
>
> This is the point where tick_nohz_activate() is called first time and
> that does:
>
> tick_sched_flag_set(ts, TS_FLAG_NOHZ);
>
> So up to this point the tick is never stopped neither on housekeeping
> nor on NOHZ FULL CPUs:
>
> tick_nohz_full_update_tick()
> if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
> return;
>
> > And. My main question was: how can smp_call_function_single() help???
>
> It's useless.
>
> > Why do we actually need it?
>
> We do not.
>
> As explained above there is also nothing extra to fix contrary to
> Frederics fears.
>
> Even in the case that a command line limitation restricts the number of
> CPUs such that there is no housekeeping CPU onlined during
> smp_init(). That is checked in the isolation init code which clears
> nohz_full_running in that case. Nothing to see there either.
>
> So all this needs is the simple:
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index d88b13076b79..dab17d756fd8 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -229,11 +209,9 @@ static void tick_setup_device(struct tick_device *td,
> if (tick_nohz_full_cpu(cpu))
> tick_do_timer_boot_cpu = cpu;
>
> - } else if (tick_do_timer_boot_cpu != -1 &&
> - !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> + } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> #endif
> }
>
> along with the removal of the SMP function call voodoo programming gunk,
> a lengthy changelog and a bunch of useful comments.

I might not have tested that path on powerpc since it should not
switch clockevent driver (or clocksource either I think) at least
on 64-bit. Explains the smp_call_function warning if you are
testing on x86 :/

Thanks,
Nick

2024-05-27 10:23:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Mon, May 27 2024 at 19:10, Nicholas Piggin wrote:
> On Sat May 25, 2024 at 8:06 AM AEST, Thomas Gleixner wrote:
>> along with the removal of the SMP function call voodoo programming gunk,
>> a lengthy changelog and a bunch of useful comments.
>
> I might not have tested that path on powerpc since it should not
> switch clockevent driver (or clocksource either I think) at least
> on 64-bit. Explains the smp_call_function warning if you are
> testing on x86 :/

Even on PowerPC the per CPU clockevent driver is registered during early
boot for the boot CPU and for the APs when they are onlined. Before your
change the boot CPU was unconditionally taking over the do_timer duty
and never gave up on it in the NOHZ full case.

The logic you added allows that the duty is taken by the first
housekeeping CPU in the case that the boot CPU is marked NOHZ full.

So yes, that function call _is_ invoked on PowerPC too if the boot CPU
is NOHZ full. There is absolutely nothing x86 specific.

The difference is that PowerPC registers the per CPU clockevent _before_
setting the CPU online and x86 does it afterwards.

So the warning does not trigger on PowerPC because:

WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled() && ...);

Thanks,

tglx


2024-05-27 11:02:06

by Nicholas Piggin

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Mon May 27, 2024 at 5:27 AM AEST, Oleg Nesterov wrote:
> The more I grep the more I confused.
>
> On 05/25, Thomas Gleixner wrote:
> >
> > Right. It does not happen because the kernel starts with jiffies as
> > clocksource except on S390. The jiffies clocksource is not qualified to
> > switch over to NOHZ mode for obvious reasons.
>
> Not obvious for those who never looked at this code ;)
>
> OK, clocksource_jiffies doesn't have CLOCK_SOURCE_VALID_FOR_HRES,

jiffies clocksource requires a timer to run it I suppose.

[snip]

> > So up to this point the tick is never stopped neither on housekeeping
> > nor on NOHZ FULL CPUs:
> >
> > tick_nohz_full_update_tick()
> > if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
> > return;
>
> OK... But tick_nohz_idle_update_tick() doesn't check TS_FLAG_NOHZ and
> the tick_nohz_full_cpu() check can't help at boot time.
>
> And I still don't understand why we can rely on can_stop_idle_tick() even
> in tick_nohz_idle_stop_tick().

AFAIKS timer_expires_base would be 0 unless tick_nohz_next_event()
were called, but that is only called from places that checked
can_stop_idle_tick() or is already a tick_nohz_full() CPU.

And timer tick CPU can't be unplugged with nohz full.

Thanks,
Nick

2024-05-27 11:17:33

by Nicholas Piggin

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Mon May 27, 2024 at 8:23 PM AEST, Thomas Gleixner wrote:
> On Mon, May 27 2024 at 19:10, Nicholas Piggin wrote:
> > On Sat May 25, 2024 at 8:06 AM AEST, Thomas Gleixner wrote:
> >> along with the removal of the SMP function call voodoo programming gunk,
> >> a lengthy changelog and a bunch of useful comments.
> >
> > I might not have tested that path on powerpc since it should not
> > switch clockevent driver (or clocksource either I think) at least
> > on 64-bit. Explains the smp_call_function warning if you are
> > testing on x86 :/
>
> Even on PowerPC the per CPU clockevent driver is registered during early
> boot for the boot CPU and for the APs when they are onlined. Before your
> change the boot CPU was unconditionally taking over the do_timer duty
> and never gave up on it in the NOHZ full case.
>
> The logic you added allows that the duty is taken by the first
> housekeeping CPU in the case that the boot CPU is marked NOHZ full.
>
> So yes, that function call _is_ invoked on PowerPC too if the boot CPU
> is NOHZ full. There is absolutely nothing x86 specific.

Okay, right you are. Indeed it does trigger.

> The difference is that PowerPC registers the per CPU clockevent _before_
> setting the CPU online and x86 does it afterwards.
>
> So the warning does not trigger on PowerPC because:
>
> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled() && ...);

That explains it.

Thanks,
Nick

2024-05-27 16:09:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On 05/26, Frederic Weisbecker wrote:
>
> Le Sun, May 26, 2024 at 09:27:58PM +0200, Oleg Nesterov a ?crit :
> > > So up to this point the tick is never stopped neither on housekeeping
> > > nor on NOHZ FULL CPUs:
> > >
> > > tick_nohz_full_update_tick()
> > > if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
> > > return;
> >
> > OK... But tick_nohz_idle_update_tick() doesn't check TS_FLAG_NOHZ and
> > the tick_nohz_full_cpu() check can't help at boot time.
>
> Yes but tick_nohz_idle_update_tick() is only called when the tick is already
> stopped. And for the tick to be already stopped, TS_FLAG_NOHZ must have been
> set.

Ah, TS_FLAG_NOHZ! Indeed, thanks.

> > And I still don't understand why we can rely on can_stop_idle_tick() even
> > in tick_nohz_idle_stop_tick().
>
> Not sure I follow you on this one...

can_stop_idle_tick() has no effect if ->timer_expires_base || tick_nohz_next_event()
but I see another email from Nicholas.

Thanks!

Oleg.


2024-05-27 16:23:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On 05/27, Nicholas Piggin wrote:
>
> On Mon May 27, 2024 at 5:27 AM AEST, Oleg Nesterov wrote:
> > The more I grep the more I confused.
> >
> > On 05/25, Thomas Gleixner wrote:
> > >
> > > Right. It does not happen because the kernel starts with jiffies as
> > > clocksource except on S390. The jiffies clocksource is not qualified to
> > > switch over to NOHZ mode for obvious reasons.
> >
> > Not obvious for those who never looked at this code ;)
> >
> > OK, clocksource_jiffies doesn't have CLOCK_SOURCE_VALID_FOR_HRES,
>
> jiffies clocksource requires a timer to run it I suppose.

I meant, this is enough to ensure that clocksource_done_booting() paths should
find another cs, best != curr_clocksource, and call timekeeping_notify().

Nevermind, quite possibly I misread this code.

> > And I still don't understand why we can rely on can_stop_idle_tick() even
> > in tick_nohz_idle_stop_tick().
>
> AFAIKS timer_expires_base would be 0 unless tick_nohz_next_event()
> were called, but that is only called from places that checked
> can_stop_idle_tick() or is already a tick_nohz_full() CPU.

OK, thanks.

So, Frederic, Nicholas, any objections to the trivial change below?

We can cleanup the tick_do_timer_boot_cpu logic on top of it.

Oleg.

--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}

-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
- int cpu = *(unsigned int *)info;
-
- WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
- tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
- int cpu = smp_processor_id();
- int from = tick_do_timer_boot_cpu;
-
- if (from >= 0 && from != cpu)
- smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
/*
* Setup the tick device
*/
@@ -231,9 +211,8 @@ static void tick_setup_device(struct tick_device *td,

} else if (tick_do_timer_boot_cpu != -1 &&
!tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
#endif
}



2024-05-27 16:51:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Mon, May 27 2024 at 21:01, Nicholas Piggin wrote:
> On Mon May 27, 2024 at 5:27 AM AEST, Oleg Nesterov wrote:
>> The more I grep the more I confused.
>>
>> On 05/25, Thomas Gleixner wrote:
>> >
>> > Right. It does not happen because the kernel starts with jiffies as
>> > clocksource except on S390. The jiffies clocksource is not qualified to
>> > switch over to NOHZ mode for obvious reasons.
>>
>> Not obvious for those who never looked at this code ;)
>>
>> OK, clocksource_jiffies doesn't have CLOCK_SOURCE_VALID_FOR_HRES,
>
> jiffies clocksource requires a timer to run it I suppose.

Yes, but it's all periodic (nohz disabled) mode until
clocksource_done_booting() selects a high resolution clocksource and
kicks the NOHZ stuff alive. That's a fs_initcall() and happens late in
the boot process.

Thanks,

tglx

2024-05-28 01:02:25

by Nicholas Piggin

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On Tue May 28, 2024 at 1:57 AM AEST, Oleg Nesterov wrote:
> On 05/27, Nicholas Piggin wrote:
> >
> > On Mon May 27, 2024 at 5:27 AM AEST, Oleg Nesterov wrote:
> > > The more I grep the more I confused.
> > >
> > > On 05/25, Thomas Gleixner wrote:
> > > >
> > > > Right. It does not happen because the kernel starts with jiffies as
> > > > clocksource except on S390. The jiffies clocksource is not qualified to
> > > > switch over to NOHZ mode for obvious reasons.
> > >
> > > Not obvious for those who never looked at this code ;)
> > >
> > > OK, clocksource_jiffies doesn't have CLOCK_SOURCE_VALID_FOR_HRES,
> >
> > jiffies clocksource requires a timer to run it I suppose.
>
> I meant, this is enough to ensure that clocksource_done_booting() paths should
> find another cs, best != curr_clocksource, and call timekeeping_notify().
>
> Nevermind, quite possibly I misread this code.
>
> > > And I still don't understand why we can rely on can_stop_idle_tick() even
> > > in tick_nohz_idle_stop_tick().
> >
> > AFAIKS timer_expires_base would be 0 unless tick_nohz_next_event()
> > were called, but that is only called from places that checked
> > can_stop_idle_tick() or is already a tick_nohz_full() CPU.
>
> OK, thanks.
>
> So, Frederic, Nicholas, any objections to the trivial change below?

Since Thomas says it's alright, then no. I guess I added it because I
was not certain about taking the tick_do_timer_cpu while the boot CPU
could be running a timer interrupt.

I would take some of his comment to explain the race is harmless and
put it in that if block.

Out of curiosity, you are getting this going on x86? Or another arch?
Any particular use-case?

Thanks,
Nick

>
> We can cleanup the tick_do_timer_boot_cpu logic on top of it.
>
> Oleg.
>
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
> }
> }
>
> -#ifdef CONFIG_NO_HZ_FULL
> -static void giveup_do_timer(void *info)
> -{
> - int cpu = *(unsigned int *)info;
> -
> - WARN_ON(tick_do_timer_cpu != smp_processor_id());
> -
> - tick_do_timer_cpu = cpu;
> -}
> -
> -static void tick_take_do_timer_from_boot(void)
> -{
> - int cpu = smp_processor_id();
> - int from = tick_do_timer_boot_cpu;
> -
> - if (from >= 0 && from != cpu)
> - smp_call_function_single(from, giveup_do_timer, &cpu, 1);
> -}
> -#endif
> -
> /*
> * Setup the tick device
> */
> @@ -231,9 +211,8 @@ static void tick_setup_device(struct tick_device *td,
>
> } else if (tick_do_timer_boot_cpu != -1 &&
> !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> #endif
> }
>


2024-05-28 12:20:57

by Oleg Nesterov

[permalink] [raw]
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled

On 05/28, Nicholas Piggin wrote:
>
> >
> > So, Frederic, Nicholas, any objections to the trivial change below?
>
> Since Thomas says it's alright, then no. I guess I added it because I
> was not certain about taking the tick_do_timer_cpu while the boot CPU
> could be running a timer interrupt.

I thought about it too, but didn't see anything wrong...

Suppose that WRITE_ONCE(tick_do_timer_cpu, cpu) happens right after
tick_periodic() on the boot CPU sees READ_ONCE(tick_do_timer_cpu) == cpu.
Does this really differ from the case when tick_take_do_timer_from_boot()
waits for the boot CPU to return from timer_interrupt() ?

> I would take some of his comment to explain the race is harmless and
> put it in that if block.

Yes, yes, sure. See the patch I'll send in a minute.

> Out of curiosity, you are getting this going on x86?

Yes, and I didn't check other arch'es.

> Any particular use-case?

I have no idea. I noticed this problem when I was working on 5097cbcb38e6
("sched/isolation: Prevent boot crash when the boot CPU is nohz_full"), see
https://lore.kernel.org/all/[email protected]/

Perhaps Chris who reported that problem can add more details.

Oleg.


2024-05-28 12:22:07

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot
crash when the boot CPU is nohz_full") the kernel no longer crashes, but
there is another problem.

In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().

Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
comment tries to explain why this is safe (thanks Thomas!).

Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/time/tick-common.c | 39 +++++++++++++--------------------------
1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..27d0018c8b05 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}

-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
- int cpu = *(unsigned int *)info;
-
- WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
- tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
- int cpu = smp_processor_id();
- int from = tick_do_timer_boot_cpu;
-
- if (from >= 0 && from != cpu)
- smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
/*
* Setup the tick device
*/
@@ -221,19 +201,26 @@ static void tick_setup_device(struct tick_device *td,
tick_next_period = ktime_get();
#ifdef CONFIG_NO_HZ_FULL
/*
- * The boot CPU may be nohz_full, in which case set
- * tick_do_timer_boot_cpu so the first housekeeping
- * secondary that comes up will take do_timer from
- * us.
+ * The boot CPU may be nohz_full, in which case the
+ * first housekeeping secondary will take do_timer()
+ * from us.
*/
if (tick_nohz_full_cpu(cpu))
tick_do_timer_boot_cpu = cpu;

} else if (tick_do_timer_boot_cpu != -1 &&
!tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+ /*
+ * The boot CPU will stay in periodic (NOHZ disabled)
+ * mode until clocksource_done_booting() called after
+ * smp_init() selects a high resolution clocksource and
+ * timekeeping_notify() kicks the NOHZ stuff alive.
+ *
+ * So this WRITE_ONCE can only race with the READ_ONCE
+ * check in tick_periodic() but this race is harmless.
+ */
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
#endif
}

--
2.25.1.362.g51ebf55



2024-05-28 12:47:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

If this patch/changelog/comment is fine, I'll send another trivial
one which turns tick_do_timer_boot_cpu into "bool boot_cpu_is_nohz_full".

On 05/28, Oleg Nesterov wrote:
>
> After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot
> crash when the boot CPU is nohz_full") the kernel no longer crashes, but
> there is another problem.
>
> In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
> update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
> in smp_call_function_single().
>
> Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
> comment tries to explain why this is safe (thanks Thomas!).
>
> Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
> Link: https://lore.kernel.org/all/[email protected]
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/time/tick-common.c | 39 +++++++++++++--------------------------
> 1 file changed, 13 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index d88b13076b79..27d0018c8b05 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
> }
> }
>
> -#ifdef CONFIG_NO_HZ_FULL
> -static void giveup_do_timer(void *info)
> -{
> - int cpu = *(unsigned int *)info;
> -
> - WARN_ON(tick_do_timer_cpu != smp_processor_id());
> -
> - tick_do_timer_cpu = cpu;
> -}
> -
> -static void tick_take_do_timer_from_boot(void)
> -{
> - int cpu = smp_processor_id();
> - int from = tick_do_timer_boot_cpu;
> -
> - if (from >= 0 && from != cpu)
> - smp_call_function_single(from, giveup_do_timer, &cpu, 1);
> -}
> -#endif
> -
> /*
> * Setup the tick device
> */
> @@ -221,19 +201,26 @@ static void tick_setup_device(struct tick_device *td,
> tick_next_period = ktime_get();
> #ifdef CONFIG_NO_HZ_FULL
> /*
> - * The boot CPU may be nohz_full, in which case set
> - * tick_do_timer_boot_cpu so the first housekeeping
> - * secondary that comes up will take do_timer from
> - * us.
> + * The boot CPU may be nohz_full, in which case the
> + * first housekeeping secondary will take do_timer()
> + * from us.
> */
> if (tick_nohz_full_cpu(cpu))
> tick_do_timer_boot_cpu = cpu;
>
> } else if (tick_do_timer_boot_cpu != -1 &&
> !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> + /*
> + * The boot CPU will stay in periodic (NOHZ disabled)
> + * mode until clocksource_done_booting() called after
> + * smp_init() selects a high resolution clocksource and
> + * timekeeping_notify() kicks the NOHZ stuff alive.
> + *
> + * So this WRITE_ONCE can only race with the READ_ONCE
> + * check in tick_periodic() but this race is harmless.
> + */
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> #endif
> }
>
> --
> 2.25.1.362.g51ebf55
>


2024-05-30 12:42:33

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full

We don't need to record the cpu number of the nohz_full boot CPU, a simple
boolean is enough.

We could even kill it along with #ifdef, the "else if" branch could check
tick_nohz_full_cpu(tick_do_timer_cpu) && !tick_nohz_full_cpu(cpu).

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/time/tick-common.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 27d0018c8b05..9829a2cda072 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -49,15 +49,6 @@ ktime_t tick_next_period;
* procedure also covers cpu hotplug.
*/
int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
-#ifdef CONFIG_NO_HZ_FULL
-/*
- * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
- * tick_do_timer_cpu and it should be taken over by an eligible secondary
- * when one comes online.
- */
-static int tick_do_timer_boot_cpu __read_mostly = -1;
-#endif
-
/*
* Debugging: see timer_list.c
*/
@@ -178,9 +169,13 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}

+#ifdef CONFIG_NO_HZ_FULL
/*
- * Setup the tick device
+ * Iindicates that the boot CPU temporarily owns tick_do_timer_cpu.
*/
+static bool boot_cpu_is_nohz_full __read_mostly;
+#endif
+
static void tick_setup_device(struct tick_device *td,
struct clock_event_device *newdev, int cpu,
const struct cpumask *cpumask)
@@ -205,12 +200,9 @@ static void tick_setup_device(struct tick_device *td,
* first housekeeping secondary will take do_timer()
* from us.
*/
- if (tick_nohz_full_cpu(cpu))
- tick_do_timer_boot_cpu = cpu;
-
- } else if (tick_do_timer_boot_cpu != -1 &&
- !tick_nohz_full_cpu(cpu)) {
- tick_do_timer_boot_cpu = -1;
+ boot_cpu_is_nohz_full = tick_nohz_full_cpu(cpu);
+ } else if (boot_cpu_is_nohz_full && !tick_nohz_full_cpu(cpu)) {
+ boot_cpu_is_nohz_full = false;
/*
* The boot CPU will stay in periodic (NOHZ disabled)
* mode until clocksource_done_booting() called after
--
2.25.1.362.g51ebf55



2024-05-30 14:53:00

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

Le Tue, May 28, 2024 at 02:20:19PM +0200, Oleg Nesterov a ?crit :
> After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot
> crash when the boot CPU is nohz_full") the kernel no longer crashes, but
> there is another problem.
>
> In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
> update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
> in smp_call_function_single().
>
> Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
> comment tries to explain why this is safe (thanks Thomas!).
>
> Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
> Link: https://lore.kernel.org/all/[email protected]
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/time/tick-common.c | 39 +++++++++++++--------------------------
> 1 file changed, 13 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index d88b13076b79..27d0018c8b05 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
> }
> }
>
> -#ifdef CONFIG_NO_HZ_FULL
> -static void giveup_do_timer(void *info)
> -{
> - int cpu = *(unsigned int *)info;
> -
> - WARN_ON(tick_do_timer_cpu != smp_processor_id());
> -
> - tick_do_timer_cpu = cpu;
> -}
> -
> -static void tick_take_do_timer_from_boot(void)
> -{
> - int cpu = smp_processor_id();
> - int from = tick_do_timer_boot_cpu;
> -
> - if (from >= 0 && from != cpu)
> - smp_call_function_single(from, giveup_do_timer, &cpu, 1);
> -}
> -#endif
> -
> /*
> * Setup the tick device
> */
> @@ -221,19 +201,26 @@ static void tick_setup_device(struct tick_device *td,
> tick_next_period = ktime_get();
> #ifdef CONFIG_NO_HZ_FULL
> /*
> - * The boot CPU may be nohz_full, in which case set
> - * tick_do_timer_boot_cpu so the first housekeeping
> - * secondary that comes up will take do_timer from
> - * us.
> + * The boot CPU may be nohz_full, in which case the
> + * first housekeeping secondary will take do_timer()
> + * from us.
> */
> if (tick_nohz_full_cpu(cpu))
> tick_do_timer_boot_cpu = cpu;
>
> } else if (tick_do_timer_boot_cpu != -1 &&
> !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> + /*
> + * The boot CPU will stay in periodic (NOHZ disabled)
> + * mode until clocksource_done_booting() called after
> + * smp_init() selects a high resolution clocksource and
> + * timekeeping_notify() kicks the NOHZ stuff alive.
> + *
> + * So this WRITE_ONCE can only race with the READ_ONCE
> + * check in tick_periodic() but this race is harmless.
> + */
> + WRITE_ONCE(tick_do_timer_cpu, cpu);

Looks good, but can we have a WARN_ONCE(tick_do_timer_cpu != tick_do_timer_boot_cpu)
right before that, just to make sure our assumptions above are right forever and
the boot CPU hasn't stopped the tick up to that point?

And after all, pushing a bit further your subsequent patch, can we get rid of
tick_do_timer_boot_cpu and ifdefery altogether? Such as:

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index fb0fdec8719a..63a7bd405de7 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -48,14 +48,6 @@ ktime_t tick_next_period;
* procedure also covers cpu hotplug.
*/
int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
-#ifdef CONFIG_NO_HZ_FULL
-/*
- * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
- * tick_do_timer_cpu and it should be taken over by an eligible secondary
- * when one comes online.
- */
-static int tick_do_timer_boot_cpu __read_mostly = -1;
-#endif

/*
* Debugging: see timer_list.c
@@ -177,26 +169,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}

-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
- int cpu = *(unsigned int *)info;
-
- WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
- tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
- int cpu = smp_processor_id();
- int from = tick_do_timer_boot_cpu;
-
- if (from >= 0 && from != cpu)
- smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
/*
* Setup the tick device
*/
@@ -211,29 +183,28 @@ static void tick_setup_device(struct tick_device *td,
* First device setup ?
*/
if (!td->evtdev) {
+ int timekeeper = READ_ONCE(tick_do_timer_cpu);
/*
* If no cpu took the do_timer update, assign it to
* this cpu:
*/
- if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
+ if (timekeeper == TICK_DO_TIMER_BOOT) {
tick_do_timer_cpu = cpu;
tick_next_period = ktime_get();
-#ifdef CONFIG_NO_HZ_FULL
+ } else if (timekeeper == TICK_DO_TIMER_NONE) {
+ if (WARN_ON_ONCE(tick_nohz_full_enabled()))
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
+ } else if (tick_nohz_full_cpu(timekeeper) && !tick_nohz_full_cpu(cpu)) {
/*
- * The boot CPU may be nohz_full, in which case set
- * tick_do_timer_boot_cpu so the first housekeeping
- * secondary that comes up will take do_timer from
- * us.
+ * The boot CPU will stay in periodic (NOHZ disabled)
+ * mode until clocksource_done_booting() called after
+ * smp_init() selects a high resolution clocksource and
+ * timekeeping_notify() kicks the NOHZ stuff alive.
+ *
+ * So this WRITE_ONCE can only race with the READ_ONCE
+ * check in tick_periodic() but this race is harmless.
*/
- if (tick_nohz_full_cpu(cpu))
- tick_do_timer_boot_cpu = cpu;
-
- } else if (tick_do_timer_boot_cpu != -1 &&
- !tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
- tick_do_timer_boot_cpu = -1;
- WARN_ON(tick_do_timer_cpu != cpu);
-#endif
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
}

/*

2024-05-30 16:58:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

Frederic,

Thanks for review.

On 05/30, Frederic Weisbecker wrote:
>
> Looks good, but can we have a WARN_ONCE(tick_do_timer_cpu != tick_do_timer_boot_cpu)
> right before that, just to make sure our assumptions above are right forever and
> the boot CPU hasn't stopped the tick up to that point?

Sure, I thought about the additional sanity checks too. Although I had something
different in mind.

Frederic, et al, I am on private trip again without my working laptop, can't read
the code. I'll reply on Saturday, OK?

Oleg.

>
> And after all, pushing a bit further your subsequent patch, can we get rid of
> tick_do_timer_boot_cpu and ifdefery altogether? Such as:
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index fb0fdec8719a..63a7bd405de7 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -48,14 +48,6 @@ ktime_t tick_next_period;
> * procedure also covers cpu hotplug.
> */
> int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
> -#ifdef CONFIG_NO_HZ_FULL
> -/*
> - * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
> - * tick_do_timer_cpu and it should be taken over by an eligible secondary
> - * when one comes online.
> - */
> -static int tick_do_timer_boot_cpu __read_mostly = -1;
> -#endif
>
> /*
> * Debugging: see timer_list.c
> @@ -177,26 +169,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
> }
> }
>
> -#ifdef CONFIG_NO_HZ_FULL
> -static void giveup_do_timer(void *info)
> -{
> - int cpu = *(unsigned int *)info;
> -
> - WARN_ON(tick_do_timer_cpu != smp_processor_id());
> -
> - tick_do_timer_cpu = cpu;
> -}
> -
> -static void tick_take_do_timer_from_boot(void)
> -{
> - int cpu = smp_processor_id();
> - int from = tick_do_timer_boot_cpu;
> -
> - if (from >= 0 && from != cpu)
> - smp_call_function_single(from, giveup_do_timer, &cpu, 1);
> -}
> -#endif
> -
> /*
> * Setup the tick device
> */
> @@ -211,29 +183,28 @@ static void tick_setup_device(struct tick_device *td,
> * First device setup ?
> */
> if (!td->evtdev) {
> + int timekeeper = READ_ONCE(tick_do_timer_cpu);
> /*
> * If no cpu took the do_timer update, assign it to
> * this cpu:
> */
> - if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> + if (timekeeper == TICK_DO_TIMER_BOOT) {
> tick_do_timer_cpu = cpu;
> tick_next_period = ktime_get();
> -#ifdef CONFIG_NO_HZ_FULL
> + } else if (timekeeper == TICK_DO_TIMER_NONE) {
> + if (WARN_ON_ONCE(tick_nohz_full_enabled()))
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> + } else if (tick_nohz_full_cpu(timekeeper) && !tick_nohz_full_cpu(cpu)) {
> /*
> - * The boot CPU may be nohz_full, in which case set
> - * tick_do_timer_boot_cpu so the first housekeeping
> - * secondary that comes up will take do_timer from
> - * us.
> + * The boot CPU will stay in periodic (NOHZ disabled)
> + * mode until clocksource_done_booting() called after
> + * smp_init() selects a high resolution clocksource and
> + * timekeeping_notify() kicks the NOHZ stuff alive.
> + *
> + * So this WRITE_ONCE can only race with the READ_ONCE
> + * check in tick_periodic() but this race is harmless.
> */
> - if (tick_nohz_full_cpu(cpu))
> - tick_do_timer_boot_cpu = cpu;
> -
> - } else if (tick_do_timer_boot_cpu != -1 &&
> - !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> - tick_do_timer_boot_cpu = -1;
> - WARN_ON(tick_do_timer_cpu != cpu);
> -#endif
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> }
>
> /*
>


2024-05-30 17:01:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

On 05/30, Frederic Weisbecker wrote:
>
> Looks good, but can we have a WARN_ONCE(tick_do_timer_cpu != tick_do_timer_boot_cpu)
> right before that, just to make sure our assumptions above are right forever and
> the boot CPU hasn't stopped the tick up to that point?

Sure, I thought about the additional sanity checks too. Although I had something
different in mind.

Frederic, et al, I am on private trip again without my working laptop, can't read
the code. I'll reply on Saturday, OK?

Thanks for review!

Oleg.


2024-06-01 14:05:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

Hi Frederic,

First of all, can we please make the additional changes you suggest on top of
this patch? I'd prefer to keep it as simple as possible, I will need to backport
it and I'd like to simplify the internal review.

On 05/30, Frederic Weisbecker wrote:
>
> And after all, pushing a bit further your subsequent patch, can we get rid of
> tick_do_timer_boot_cpu and ifdefery altogether? Such as:

Sure, I thought about this from the very beginning, see
https://lore.kernel.org/all/[email protected]/
and the changelog in
[PATCH] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full
https://lore.kernel.org/all/[email protected]/
on top of this patch.

And yes, in this case it is better to check that tick_do_timer_cpu != _NONE to
ensure that tick_nohz_full_cpu(tick_cpu) can't crash.

So I considered the change which is very close to yours, except

> + } else if (timekeeper == TICK_DO_TIMER_NONE) {
> + if (WARN_ON_ONCE(tick_nohz_full_enabled()))
> + WRITE_ONCE(tick_do_timer_cpu, cpu);

I don't think we need to change tick_do_timer_cpu in this case.
And I am not sure we need to check tick_nohz_full_enabled() here.
IOW, I was thinking about

if (!td->evtdev) {
int tick_cpu = READ_ONCE(tick_do_timer_cpu);
/*
* If no cpu took the do_timer update, assign it to
* this cpu:
*/
if (tick_cpu == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
/*
* The boot CPU may be nohz_full, in which case the
* first housekeeping secondary will take do_timer()
* from us.
*/
} else if (!WARN_ON_ONCE(tick_cpu == TICK_DO_TIMER_NONE)) &&
tick_nohz_full_cpu(tick_cpu) &&
!tick_nohz_full_cpu(cpu)) {
/*
* The boot CPU will stay in periodic (NOHZ disabled)
* mode until clocksource_done_booting() called after
* smp_init() selects a high resolution clocksource and
* timekeeping_notify() kicks the NOHZ stuff alive.
*
* So this WRITE_ONCE can only race with the READ_ONCE
* check in tick_periodic() but this race is harmless.
*/
WRITE_ONCE(tick_do_timer_cpu, cpu);
}

But you know, somehow I like
[PATCH] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full
https://lore.kernel.org/all/[email protected]/
a bit more, to me the code looks more understandable this way.

Note that this patch doesn't really need to keep #ifdef CONFIG_NO_HZ_FULL,

if (!td->evtdev) {
static bool boot_cpu_is_nohz_full;
/*
* If no cpu took the do_timer update, assign it to
* this cpu:
*/
if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
/*
* The boot CPU may be nohz_full, in which case the
* first housekeeping secondary will take do_timer()
* from us.
*/
boot_cpu_is_nohz_full = tick_nohz_full_cpu(cpu);
} else if (boot_cpu_is_nohz_full && !tick_nohz_full_cpu(cpu)) {
boot_cpu_is_nohz_full = false;
/*
* The boot CPU will stay in periodic (NOHZ disabled)
* mode until clocksource_done_booting() called after
* smp_init() selects a high resolution clocksource and
* timekeeping_notify() kicks the NOHZ stuff alive.
*
* So this WRITE_ONCE can only race with the READ_ONCE
* check in tick_periodic() but this race is harmless.
*/
WRITE_ONCE(tick_do_timer_cpu, cpu);
}

should work without #ifdef.

In this case I don't think we need the _NONE check, tick_sched_do_timer() will
complain.

But I won't argue. I will be happy to make V2 which follows your recommendations
but again, can I do this on top of this patch?

Oleg.


2024-06-02 21:29:25

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

Le Sat, Jun 01, 2024 at 04:03:22PM +0200, Oleg Nesterov a ?crit :
> Hi Frederic,
>
> First of all, can we please make the additional changes you suggest on top of
> this patch? I'd prefer to keep it as simple as possible, I will need to backport
> it and I'd like to simplify the internal review.

Sure!

>
> On 05/30, Frederic Weisbecker wrote:
> >
> > And after all, pushing a bit further your subsequent patch, can we get rid of
> > tick_do_timer_boot_cpu and ifdefery altogether? Such as:
>
> Sure, I thought about this from the very beginning, see
> https://lore.kernel.org/all/[email protected]/
> and the changelog in
> [PATCH] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full
> https://lore.kernel.org/all/[email protected]/
> on top of this patch.
>
> And yes, in this case it is better to check that tick_do_timer_cpu != _NONE to
> ensure that tick_nohz_full_cpu(tick_cpu) can't crash.
>
> So I considered the change which is very close to yours, except
>
> > + } else if (timekeeper == TICK_DO_TIMER_NONE) {
> > + if (WARN_ON_ONCE(tick_nohz_full_enabled()))
> > + WRITE_ONCE(tick_do_timer_cpu, cpu);
>
> I don't think we need to change tick_do_timer_cpu in this case.
> And I am not sure we need to check tick_nohz_full_enabled() here.
> IOW, I was thinking about

Hmm, in case of cpu-hotplug operations (that is after boot), we may be
past nohz enablement and therefore it might be TICK_DO_TIMER_NONE.

>
> if (!td->evtdev) {
> int tick_cpu = READ_ONCE(tick_do_timer_cpu);
> /*
> * If no cpu took the do_timer update, assign it to
> * this cpu:
> */
> if (tick_cpu == TICK_DO_TIMER_BOOT) {
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> tick_next_period = ktime_get();
> /*
> * The boot CPU may be nohz_full, in which case the
> * first housekeeping secondary will take do_timer()
> * from us.
> */
> } else if (!WARN_ON_ONCE(tick_cpu == TICK_DO_TIMER_NONE)) &&
> tick_nohz_full_cpu(tick_cpu) &&
> !tick_nohz_full_cpu(cpu)) {
> /*
> * The boot CPU will stay in periodic (NOHZ disabled)
> * mode until clocksource_done_booting() called after
> * smp_init() selects a high resolution clocksource and
> * timekeeping_notify() kicks the NOHZ stuff alive.
> *
> * So this WRITE_ONCE can only race with the READ_ONCE
> * check in tick_periodic() but this race is harmless.
> */
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> }
>
> But you know, somehow I like
> [PATCH] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full
> https://lore.kernel.org/all/[email protected]/
> a bit more, to me the code looks more understandable this way.
>
> Note that this patch doesn't really need to keep #ifdef CONFIG_NO_HZ_FULL,
>
> if (!td->evtdev) {
> static bool boot_cpu_is_nohz_full;
> /*
> * If no cpu took the do_timer update, assign it to
> * this cpu:
> */
> if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> tick_next_period = ktime_get();
> /*
> * The boot CPU may be nohz_full, in which case the
> * first housekeeping secondary will take do_timer()
> * from us.
> */
> boot_cpu_is_nohz_full = tick_nohz_full_cpu(cpu);
> } else if (boot_cpu_is_nohz_full && !tick_nohz_full_cpu(cpu)) {
> boot_cpu_is_nohz_full = false;
> /*
> * The boot CPU will stay in periodic (NOHZ disabled)
> * mode until clocksource_done_booting() called after
> * smp_init() selects a high resolution clocksource and
> * timekeeping_notify() kicks the NOHZ stuff alive.
> *
> * So this WRITE_ONCE can only race with the READ_ONCE
> * check in tick_periodic() but this race is harmless.
> */
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> }
>
> should work without #ifdef.
>
> In this case I don't think we need the _NONE check, tick_sched_do_timer() will
> complain.

Right...

>
> But I won't argue. I will be happy to make V2 which follows your recommendations
> but again, can I do this on top of this patch?

I guess the static version above should work to remove the ifdef. And yes on top is fine.

Thanks!

2024-06-03 15:43:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

On 06/02, Frederic Weisbecker wrote:
>
> I guess the static version above should work to remove the ifdef. And yes on top is fine.

OK, I've sent v2.

But again, I won't argue if you prefer to keep tick_do_timer_boot_cpu and add
WARN_ON_ONCE(tick_cpu != tick_do_timer_boot_cpu) before WRITE_ONCE(tick_do_timer_cpu).
In this case another patch makes no sense, I'll update this one.

Just tell me what you like more. Sorry for the chaotic emails.

Oleg.


2024-06-03 15:46:46

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full

We don't need to record the cpu number of the nohz_full boot CPU, a simple
boolean is enough. We could even kill it, the "else if" branch could check
tick_nohz_full_cpu(tick_do_timer_cpu) && !tick_nohz_full_cpu(cpu).

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/time/tick-common.c | 21 ++++-----------------
1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 27d0018c8b05..82c1b2206631 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -49,15 +49,6 @@ ktime_t tick_next_period;
* procedure also covers cpu hotplug.
*/
int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
-#ifdef CONFIG_NO_HZ_FULL
-/*
- * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
- * tick_do_timer_cpu and it should be taken over by an eligible secondary
- * when one comes online.
- */
-static int tick_do_timer_boot_cpu __read_mostly = -1;
-#endif
-
/*
* Debugging: see timer_list.c
*/
@@ -192,6 +183,7 @@ static void tick_setup_device(struct tick_device *td,
* First device setup ?
*/
if (!td->evtdev) {
+ static bool boot_cpu_is_nohz_full __read_mostly;
/*
* If no cpu took the do_timer update, assign it to
* this cpu:
@@ -199,18 +191,14 @@ static void tick_setup_device(struct tick_device *td,
if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
-#ifdef CONFIG_NO_HZ_FULL
/*
* The boot CPU may be nohz_full, in which case the
* first housekeeping secondary will take do_timer()
* from us.
*/
- if (tick_nohz_full_cpu(cpu))
- tick_do_timer_boot_cpu = cpu;
-
- } else if (tick_do_timer_boot_cpu != -1 &&
- !tick_nohz_full_cpu(cpu)) {
- tick_do_timer_boot_cpu = -1;
+ boot_cpu_is_nohz_full = tick_nohz_full_cpu(cpu);
+ } else if (boot_cpu_is_nohz_full && !tick_nohz_full_cpu(cpu)) {
+ boot_cpu_is_nohz_full = false;
/*
* The boot CPU will stay in periodic (NOHZ disabled)
* mode until clocksource_done_booting() called after
@@ -221,7 +209,6 @@ static void tick_setup_device(struct tick_device *td,
* check in tick_periodic() but this race is harmless.
*/
WRITE_ONCE(tick_do_timer_cpu, cpu);
-#endif
}

/*
--
2.25.1.362.g51ebf55



2024-06-03 21:50:33

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full

Le Mon, Jun 03, 2024 at 05:35:57PM +0200, Oleg Nesterov a ?crit :
> We don't need to record the cpu number of the nohz_full boot CPU, a simple
> boolean is enough. We could even kill it, the "else if" branch could check
> tick_nohz_full_cpu(tick_do_timer_cpu) && !tick_nohz_full_cpu(cpu).
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Frederic Weisbecker <[email protected]>

2024-06-03 21:51:01

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

Le Mon, Jun 03, 2024 at 05:41:33PM +0200, Oleg Nesterov a ?crit :
> On 06/02, Frederic Weisbecker wrote:
> >
> > I guess the static version above should work to remove the ifdef. And yes on top is fine.
>
> OK, I've sent v2.
>
> But again, I won't argue if you prefer to keep tick_do_timer_boot_cpu and add
> WARN_ON_ONCE(tick_cpu != tick_do_timer_boot_cpu) before WRITE_ONCE(tick_do_timer_cpu).
> In this case another patch makes no sense, I'll update this one.
>
> Just tell me what you like more. Sorry for the chaotic emails.

I'm fine with the last one posted :-)

Thanks.

2024-06-04 05:16:21

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2] tick/nohz_full: turn tick_do_timer_boot_cpu into boot_cpu_is_nohz_full

On Tue Jun 4, 2024 at 1:35 AM AEST, Oleg Nesterov wrote:
> We don't need to record the cpu number of the nohz_full boot CPU, a simple
> boolean is enough. We could even kill it, the "else if" branch could check
> tick_nohz_full_cpu(tick_do_timer_cpu) && !tick_nohz_full_cpu(cpu).
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Reviewed-by: Nicholas Piggin <[email protected]>

> ---
> kernel/time/tick-common.c | 21 ++++-----------------
> 1 file changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 27d0018c8b05..82c1b2206631 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -49,15 +49,6 @@ ktime_t tick_next_period;
> * procedure also covers cpu hotplug.
> */
> int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
> -#ifdef CONFIG_NO_HZ_FULL
> -/*
> - * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
> - * tick_do_timer_cpu and it should be taken over by an eligible secondary
> - * when one comes online.
> - */
> -static int tick_do_timer_boot_cpu __read_mostly = -1;
> -#endif
> -
> /*
> * Debugging: see timer_list.c
> */
> @@ -192,6 +183,7 @@ static void tick_setup_device(struct tick_device *td,
> * First device setup ?
> */
> if (!td->evtdev) {
> + static bool boot_cpu_is_nohz_full __read_mostly;
> /*
> * If no cpu took the do_timer update, assign it to
> * this cpu:
> @@ -199,18 +191,14 @@ static void tick_setup_device(struct tick_device *td,
> if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> tick_next_period = ktime_get();
> -#ifdef CONFIG_NO_HZ_FULL
> /*
> * The boot CPU may be nohz_full, in which case the
> * first housekeeping secondary will take do_timer()
> * from us.
> */
> - if (tick_nohz_full_cpu(cpu))
> - tick_do_timer_boot_cpu = cpu;
> -
> - } else if (tick_do_timer_boot_cpu != -1 &&
> - !tick_nohz_full_cpu(cpu)) {
> - tick_do_timer_boot_cpu = -1;
> + boot_cpu_is_nohz_full = tick_nohz_full_cpu(cpu);
> + } else if (boot_cpu_is_nohz_full && !tick_nohz_full_cpu(cpu)) {
> + boot_cpu_is_nohz_full = false;
> /*
> * The boot CPU will stay in periodic (NOHZ disabled)
> * mode until clocksource_done_booting() called after
> @@ -221,7 +209,6 @@ static void tick_setup_device(struct tick_device *td,
> * check in tick_periodic() but this race is harmless.
> */
> WRITE_ONCE(tick_do_timer_cpu, cpu);
> -#endif
> }
>
> /*


2024-06-10 16:32:13

by Oleg Nesterov

[permalink] [raw]
Subject: [PING ;)] Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

Thomas, et al,

should I redo and/or resend this patch (and the next one on top of it) ?

Or what else can I do to fix this problem?

Thanks,

Oleg.

On 05/28, Oleg Nesterov wrote:
>
> After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot
> crash when the boot CPU is nohz_full") the kernel no longer crashes, but
> there is another problem.
>
> In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
> update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
> in smp_call_function_single().
>
> Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
> comment tries to explain why this is safe (thanks Thomas!).
>
> Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
> Link: https://lore.kernel.org/all/[email protected]
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/time/tick-common.c | 39 +++++++++++++--------------------------
> 1 file changed, 13 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index d88b13076b79..27d0018c8b05 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
> }
> }
>
> -#ifdef CONFIG_NO_HZ_FULL
> -static void giveup_do_timer(void *info)
> -{
> - int cpu = *(unsigned int *)info;
> -
> - WARN_ON(tick_do_timer_cpu != smp_processor_id());
> -
> - tick_do_timer_cpu = cpu;
> -}
> -
> -static void tick_take_do_timer_from_boot(void)
> -{
> - int cpu = smp_processor_id();
> - int from = tick_do_timer_boot_cpu;
> -
> - if (from >= 0 && from != cpu)
> - smp_call_function_single(from, giveup_do_timer, &cpu, 1);
> -}
> -#endif
> -
> /*
> * Setup the tick device
> */
> @@ -221,19 +201,26 @@ static void tick_setup_device(struct tick_device *td,
> tick_next_period = ktime_get();
> #ifdef CONFIG_NO_HZ_FULL
> /*
> - * The boot CPU may be nohz_full, in which case set
> - * tick_do_timer_boot_cpu so the first housekeeping
> - * secondary that comes up will take do_timer from
> - * us.
> + * The boot CPU may be nohz_full, in which case the
> + * first housekeeping secondary will take do_timer()
> + * from us.
> */
> if (tick_nohz_full_cpu(cpu))
> tick_do_timer_boot_cpu = cpu;
>
> } else if (tick_do_timer_boot_cpu != -1 &&
> !tick_nohz_full_cpu(cpu)) {
> - tick_take_do_timer_from_boot();
> tick_do_timer_boot_cpu = -1;
> - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
> + /*
> + * The boot CPU will stay in periodic (NOHZ disabled)
> + * mode until clocksource_done_booting() called after
> + * smp_init() selects a high resolution clocksource and
> + * timekeeping_notify() kicks the NOHZ stuff alive.
> + *
> + * So this WRITE_ONCE can only race with the READ_ONCE
> + * check in tick_periodic() but this race is harmless.
> + */
> + WRITE_ONCE(tick_do_timer_cpu, cpu);
> #endif
> }
>
> --
> 2.25.1.362.g51ebf55
>


2024-06-10 18:15:22

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PING ;)] Re: [PATCH] tick/nohz_full: don't abuse smp_call_function_single() in tick_setup_device()

On Mon, Jun 10 2024 at 17:55, Oleg Nesterov wrote:
> should I redo and/or resend this patch (and the next one on top of it) ?

I lost track of the maze in this discussion. As the patch is obviously
correct, I just apply it as is and any cosmetic changes can go on top.

Thanks,

tglx

Subject: [tip: timers/urgent] tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID: 07c54cc5988f19c9642fd463c2dbdac7fc52f777
Gitweb: https://git.kernel.org/tip/07c54cc5988f19c9642fd463c2dbdac7fc52f777
Author: Oleg Nesterov <[email protected]>
AuthorDate: Tue, 28 May 2024 14:20:19 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Mon, 10 Jun 2024 20:18:13 +02:00

tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()

After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.

In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().

Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
comment explains why this is safe (thanks Thomas!).

Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]
---
kernel/time/tick-common.c | 42 ++++++++++++--------------------------
1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b130..a47bcf7 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}

-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
- int cpu = *(unsigned int *)info;
-
- WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
- tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
- int cpu = smp_processor_id();
- int from = tick_do_timer_boot_cpu;
-
- if (from >= 0 && from != cpu)
- smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
/*
* Setup the tick device
*/
@@ -221,19 +201,25 @@ static void tick_setup_device(struct tick_device *td,
tick_next_period = ktime_get();
#ifdef CONFIG_NO_HZ_FULL
/*
- * The boot CPU may be nohz_full, in which case set
- * tick_do_timer_boot_cpu so the first housekeeping
- * secondary that comes up will take do_timer from
- * us.
+ * The boot CPU may be nohz_full, in which case the
+ * first housekeeping secondary will take do_timer()
+ * from it.
*/
if (tick_nohz_full_cpu(cpu))
tick_do_timer_boot_cpu = cpu;

- } else if (tick_do_timer_boot_cpu != -1 &&
- !tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
+ } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+ /*
+ * The boot CPU will stay in periodic (NOHZ disabled)
+ * mode until clocksource_done_booting() called after
+ * smp_init() selects a high resolution clocksource and
+ * timekeeping_notify() kicks the NOHZ stuff alive.
+ *
+ * So this WRITE_ONCE can only race with the READ_ONCE
+ * check in tick_periodic() but this race is harmless.
+ */
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
#endif
}


2024-06-10 19:42:20

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [tip: timers/urgent] tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()

Le Mon, Jun 10, 2024 at 06:26:21PM -0000, tip-bot2 for Oleg Nesterov a ?crit :
> The following commit has been merged into the timers/urgent branch of tip:
>
> Commit-ID: 07c54cc5988f19c9642fd463c2dbdac7fc52f777
> Gitweb: https://git.kernel.org/tip/07c54cc5988f19c9642fd463c2dbdac7fc52f777
> Author: Oleg Nesterov <[email protected]>
> AuthorDate: Tue, 28 May 2024 14:20:19 +02:00
> Committer: Thomas Gleixner <[email protected]>
> CommitterDate: Mon, 10 Jun 2024 20:18:13 +02:00
>
> tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
>
> After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
> when the boot CPU is nohz_full") the kernel no longer crashes, but there is
> another problem.
>
> In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
> update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
> in smp_call_function_single().
>
> Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
> comment explains why this is safe (thanks Thomas!).
>
> Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
> Signed-off-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Link: https://lore.kernel.org/r/[email protected]
> Link: https://lore.kernel.org/all/[email protected]

I think we agreed on that version actually:

https://lore.kernel.org/all/[email protected]/

Thanks.