When nohz_full CPU stops tick in tick_nohz_irq_exit(),
It wouldn't be chosen to perform idle load balancing bacause it doesn't
call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
becomes idle.
tick_nohz_idle_stop_tick() is only called in idle state and
nohz_balance_enter_idle() tracks the CPU is part of nohz.idle_cpus_mask
with rq->nohz_tick_stopped.
So, nohz_balance_enter_idle() could be called safely without !was_stooped
check.
Signed-off-by: Levi Yun <[email protected]>
---
kernel/time/tick-sched.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71a792cd8936..31a4cd89782f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
ts->idle_sleeps++;
ts->idle_expires = expires;
- if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
- ts->idle_jiffies = ts->last_jiffies;
+ if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+ if (!was_stopped)
+ ts->idle_jiffies = ts->last_jiffies;
+
nohz_balance_enter_idle(cpu);
}
} else {
--
2.41.0
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing bacause it doesn't
…
> So, nohz_balance_enter_idle() could be called safely without !was_stooped
> check.
* Please avoid typos in this change description.
* How do you think about to add the tag “Fixes”?
* Would you like to use imperative wordings for an improved changelog?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9-rc7#n94
Regards,
Markus
When nohz_full CPU stops tick in tick_nohz_irq_exit(),
It wouldn't be chosen to perform idle load balancing bacause it doesn't
call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
becomes idle.
tick_nohz_idle_stop_tick() is only called in idle state and
nohz_balance_enter_idle() tracks the CPU is part of nohz.idle_cpus_mask
with rq->nohz_tick_stopped.
So, nohz_balance_enter_idle() could be called safely without !was_stooped
check.
Fixes: 0e7767687fda ("time: tick-sched: Reorganize idle tick management code")
Signed-off-by: Levi Yun <[email protected]>
---
kernel/time/tick-sched.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71a792cd8936..31a4cd89782f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
ts->idle_sleeps++;
ts->idle_expires = expires;
- if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
- ts->idle_jiffies = ts->last_jiffies;
+ if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+ if (!was_stopped)
+ ts->idle_jiffies = ts->last_jiffies;
+
nohz_balance_enter_idle(cpu);
}
} else {
--
2.41.0
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing bacause it doesn't
…
> So, nohz_balance_enter_idle() could be called safely without !was_stooped
> check.
* Why did you repeat typos in this patch iteration?
* Would you like to take the relevance of imperative wordings for changelogs
better into account?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9-rc7#n94
> ---
> kernel/time/tick-sched.c | 6 ++++--
Will further patch version descriptions become helpful behind the marker line?
Regards,
Markus
Hi Markus.
> * Why did you repeat typos in this patch iteration?
> * Would you like to take the relevance of imperative wordings for changelogs
better into account?
Sorry for my mistake, I've fixed it but, I sent an old one :(
Thanks
On Wed, May 8, 2024 at 7:38 PM Markus Elfring <[email protected]> wrote:
>
> > When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> > It wouldn't be chosen to perform idle load balancing bacause it doesn't
> …
> > So, nohz_balance_enter_idle() could be called safely without !was_stooped
> > check.
>
> * Why did you repeat typos in this patch iteration?
>
> * Would you like to take the relevance of imperative wordings for changelogs
> better into account?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9-rc7#n94
>
>
> > ---
> > kernel/time/tick-sched.c | 6 ++++--
>
> Will further patch version descriptions become helpful behind the marker line?
>
> Regards,
> Markus
When nohz_full CPU stops tick in tick_nohz_irq_exit(),
It wouldn't be chosen to perform idle load balancing because it doesn't
call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
becomes idle.
tick_nohz_idle_stop_tick() is only called in idle state and
nohz_balance_enter_idle() tracks the CPU which is part of nohz.idle_cpus_mask
with rq->nohz_tick_stopped.
Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
without checking !was_stopped so that nohz_full cpu can be chosen to
perform idle load balancing when it enters idle state.
Signed-off-by: Levi Yun <[email protected]>
---
v3:
- Rewording commit message.
v2:
- Fix typos in commit message.
kernel/time/tick-sched.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71a792cd8936..31a4cd89782f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
ts->idle_sleeps++;
ts->idle_expires = expires;
- if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
- ts->idle_jiffies = ts->last_jiffies;
+ if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+ if (!was_stopped)
+ ts->idle_jiffies = ts->last_jiffies;
+
nohz_balance_enter_idle(cpu);
}
} else {
--
2.41.0
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing because it doesn't
> call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> becomes idle.
>
> tick_nohz_idle_stop_tick() is only called in idle state and
> nohz_balance_enter_idle() tracks the CPU which is part of nohz.idle_cpus_mask
> with rq->nohz_tick_stopped.
>
> Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> without checking !was_stopped so that nohz_full cpu can be chosen to
> perform idle load balancing when it enters idle state.
Would you eventually like to add the tag “Fixes” once more?
…
> +++ b/kernel/time/tick-sched.c
> @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> ts->idle_sleeps++;
> ts->idle_expires = expires;
>
> - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> - ts->idle_jiffies = ts->last_jiffies;
> + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> + if (!was_stopped)
> + ts->idle_jiffies = ts->last_jiffies;
> +
> nohz_balance_enter_idle(cpu);
> }
…
I interpret these diff data in the way that you propose to reorder
two condition checks.
But I wonder still how “good” the presented change description fits to
the suggested source code adjustment.
Regards,
Markus
> Would you eventually like to add the tag “Fixes” once more?
Sorry. I forgot :(
> > +++ b/kernel/time/tick-sched.c
> > @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> > ts->idle_sleeps++;
> > ts->idle_expires = expires;
> >
> > - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > - ts->idle_jiffies = ts->last_jiffies;
> > + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > + if (!was_stopped)
> > + ts->idle_jiffies = ts->last_jiffies;
> > +
> > nohz_balance_enter_idle(cpu);
> > }
> …
>
> I interpret these diff data in the way that you propose to reorder
> two condition checks.
>
> But I wonder still how “good” the presented change description fits to
> the suggested source code adjustment.
FWIW it doesn't need to check !was_stopped to
call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick().
Formerly, __tick_nohz_idle_enter() is called in both
tick_nohz_irq_exit() and in do_idle().
That's why it's required for nohz full cpu which already stop the
tick, but not idle
to prevent enter idle balance.
(but it makes nohz full cpu enter nohz idle balance as side effect i think?)
but after some reorganizing code tick_nohz_idle_stop_tick() becomes
the code called in
only when enter idle.
What I point is that it doesn't need to check !was_stopped to call
nohz_balance_enter_idle()
in tick_nohz_idle_stop_tick().
So, I think it's enough in commit message?
Am I wrong?
Thanks.
>>> +++ b/kernel/time/tick-sched.c
>>> @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
>>> ts->idle_sleeps++;
>>> ts->idle_expires = expires;
>>>
>>> - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
>>> - ts->idle_jiffies = ts->last_jiffies;
>>> + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
>>> + if (!was_stopped)
>>> + ts->idle_jiffies = ts->last_jiffies;
>>> +
>>> nohz_balance_enter_idle(cpu);
>>> }
…
> So, I think it's enough in commit message?
…
We are trying to clarify special implementation details here.
Our corresponding wording preferences are probably different.
I hope that a better common understanding can be achieved also for
another transformation.
* Thus I became curious how you got interested to adjust this software
component further.
* Will any other data representation become more helpful for the circumstances
according to calls of a function like “tick_nohz_idle_stop_tick”?
* How do you think about to stress condition ordering concerns around
the system configuration “nohz_full”?
* How will related changelogs evolve further?
Regards,
Markus
Hi Markus.
.
> * Will any other data representation become more helpful for the circumstances
> according to calls of a function like “tick_nohz_idle_stop_tick”?
Maybe not in this commit..?
> * How do you think about stress condition ordering concerns around
> the system configuration “nohz_full”?
Well.. regardless of the stress condition, it wants to fix the
inconsistent behavior
happening when enter "idle state"
Let's think about two cases:
1. nohz_full cpu stop tick in tick_nohz_irq_exiit() while it runs
only 1 cfs task.
2. nohz_full cpu which doesn't stop the tick and switches to idle task.
Without this commit, case (1) wouldn't participate in idle balance
when it switches to idle task while its tick is already stopped.
case (2) although nohz_full cpu participates in idle balcancing
because former clock isn't stopped yet.
> * How will related changelogs evolve further?
>
Thanks for the suggestion. But I'll add some more background commit
message then.
Thanks again :)
When nohz_full CPU stops tick in tick_nohz_irq_exit(),
It wouldn't be chosen to perform idle load balancing because it doesn't
call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
becomes idle.
Formerly, __tick_nohz_idle_enter() is called in both
tick_nohz_irq_exit() and in do_idle().
That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
to the idle path") prevents nohz_full cpu which isn't yet
idle state but tick is stopped from entering idle balance.
However, this prevents nohz_full cpu which already stops tick from
entering idle balacne when this cpu really becomes idle state.
Currently, tick_nohz_idle_stop_tick() is only called in idle state and
it calls nohz_balance_enter_idle(). this function tracks the CPU
which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
without checking !was_stopped so that nohz_full cpu can be chosen to
perform idle load balancing when it enters idle state.
Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
Signed-off-by: Levi Yun <[email protected]>
---
v4:
- Add fixes tags.
v3:
- Rewording commit message.
v2:
- Fix typos in commit message.
kernel/time/tick-sched.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71a792cd8936..31a4cd89782f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
ts->idle_sleeps++;
ts->idle_expires = expires;
- if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
- ts->idle_jiffies = ts->last_jiffies;
+ if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+ if (!was_stopped)
+ ts->idle_jiffies = ts->last_jiffies;
+
nohz_balance_enter_idle(cpu);
}
} else {
--
2.41.0
>> * How do you think about stress condition ordering concerns around
>> the system configuration “nohz_full”?
>
> Well.. regardless of the stress condition, it wants to fix the
> inconsistent behavior
> happening when enter "idle state"
Were any special test cases or related analysis tools involved
in the discovery of improvable software behaviour?
Regards,
Markus
…
> Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> it calls nohz_balance_enter_idle(). this function tracks the CPU
> which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
…
* How did you notice improvement possibilities for such data processing?
* I hope that a few remaining wording “weaknesses” can be adjusted accordingly.
Regards,
Markus
Markus, kernel-janitors doesn't need to be a part of these discussions.
Could you please stop adding us to the CC list?
regards,
dan carpenter
Gentle ping..?
Am I missing something?
Thanks.
On Thu, May 9, 2024 at 10:29 AM Levi Yun <[email protected]> wrote:
>
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing because it doesn't
> call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> becomes idle.
>
> Formerly, __tick_nohz_idle_enter() is called in both
> tick_nohz_irq_exit() and in do_idle().
> That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> to the idle path") prevents nohz_full cpu which isn't yet
> idle state but tick is stopped from entering idle balance.
>
> However, this prevents nohz_full cpu which already stops tick from
> entering idle balacne when this cpu really becomes idle state.
>
> Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> it calls nohz_balance_enter_idle(). this function tracks the CPU
> which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
>
> Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> without checking !was_stopped so that nohz_full cpu can be chosen to
> perform idle load balancing when it enters idle state.
>
> Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> Signed-off-by: Levi Yun <[email protected]>
> ---
> v4:
> - Add fixes tags.
>
> v3:
> - Rewording commit message.
>
> v2:
> - Fix typos in commit message.
>
> kernel/time/tick-sched.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 71a792cd8936..31a4cd89782f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> ts->idle_sleeps++;
> ts->idle_expires = expires;
>
> - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> - ts->idle_jiffies = ts->last_jiffies;
> + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> + if (!was_stopped)
> + ts->idle_jiffies = ts->last_jiffies;
> +
> nohz_balance_enter_idle(cpu);
> }
> } else {
> --
> 2.41.0
Le Thu, May 09, 2024 at 10:29:32AM +0100, Levi Yun a ?crit :
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing because it doesn't
> call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> becomes idle.
>
> Formerly, __tick_nohz_idle_enter() is called in both
> tick_nohz_irq_exit() and in do_idle().
> That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> to the idle path") prevents nohz_full cpu which isn't yet
> idle state but tick is stopped from entering idle balance.
>
> However, this prevents nohz_full cpu which already stops tick from
> entering idle balacne when this cpu really becomes idle state.
>
> Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> it calls nohz_balance_enter_idle(). this function tracks the CPU
> which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
>
> Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> without checking !was_stopped so that nohz_full cpu can be chosen to
> perform idle load balancing when it enters idle state.
>
> Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> Signed-off-by: Levi Yun <[email protected]>
> ---
> v4:
> - Add fixes tags.
>
> v3:
> - Rewording commit message.
>
> v2:
> - Fix typos in commit message.
>
> kernel/time/tick-sched.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 71a792cd8936..31a4cd89782f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> ts->idle_sleeps++;
> ts->idle_expires = expires;
>
> - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> - ts->idle_jiffies = ts->last_jiffies;
> + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> + if (!was_stopped)
> + ts->idle_jiffies = ts->last_jiffies;
> +
I've taken some time to respond because your patch has raised more questions
while discussing this with Anna-Maria:
1) Is Idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already
prevent those CPUs from becoming idle load balancer. They can still be
targets for load balancing but nohz_full CPUs are supposed to run only one
task.
2) This is related to previous point: HK_TYPE_SCHED is never activated. It would
prevent the CPU from even beeing part of idle load balancing. Should we
remove it or plug it?
3) nohz_balance_enter_idle() is called when the tick is stopped for the first
time and nohz_balance_exit_idle() is called from the tick. But that also
applies to idle ticks. So if the load balancing triggers while the tick is
stopped, nohz_balance_enter_idle() won't be re-called in the idle loop even
though the tick is stopped (that would be fixed with your patch).
4) Why is nohz_balance_exit_idle() called from the tick and not from the idle
exit path? Is it to avoid overhead?
I'm adding some scheduler people in Cc who might help answer some of those
questions.
Thanks.
> nohz_balance_enter_idle(cpu);
> }
> } else {
> --
> 2.41.0
HI. Frederic!
Thanks for your reply.
> 1) Is Idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already
> prevent those CPUs from becoming idle load balancer. They can still be
> targets for load balancing but nohz_full CPUs are supposed to run only one
> task.
I'm not sure nohz_full cpu is really relevant with Idle load balancing.
However, I couldn't find any reason it shouldn't participate Idle load
balancing.
And about HK_TYPE_MISC, this is half true when it sets with
"nohz_full=" options,
if nohz_full cpu is set via "isolcpus=nohz," it's still in HK_TYPE_MISC mask
so find_ilb() doesn't prevent entering idle load balance.
> 2) This is related to previous point: HK_TYPE_SCHED is never activated. It would
> prevent the CPU from even beeing part of idle load balancing. Should we
> remove it or plug it?
IMHO, If we want to prevent nohz_full cpu entering, I should set with
HK_TYPE_TICK | HK_TYPE_SCHED together.
Because, find_new_ilb()'s comment say
" HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED is
not set anywhere yet."
or integrate with HK_TYPE_TICK....? removing HK_TYPE_SCHED in the hk list...
It depends on ilb relevant for nohz_full...
If it is, Idle load balance could be controlled with HK_TYPE_SCHED.
If it doesn't ,IMHO, It seems better to remove HK_TYPE_SCHED and check
with HK_TYPE_SCHED.
> 4) Why is nohz_balance_exit_idle() called from the tick and not from the idle
> exit path? Is it to avoid overhead?
> I'm adding some scheduler people in Cc who might help answer some of those
> questions.
..
Thanks :)
On Thu, May 16, 2024 at 12:52:06AM +0200, Frederic Weisbecker wrote:
> Le Thu, May 09, 2024 at 10:29:32AM +0100, Levi Yun a ?crit :
> > When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> > It wouldn't be chosen to perform idle load balancing because it doesn't
> > call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> > becomes idle.
> >
> > Formerly, __tick_nohz_idle_enter() is called in both
> > tick_nohz_irq_exit() and in do_idle().
> > That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> > to the idle path") prevents nohz_full cpu which isn't yet
> > idle state but tick is stopped from entering idle balance.
> >
> > However, this prevents nohz_full cpu which already stops tick from
> > entering idle balacne when this cpu really becomes idle state.
> >
> > Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> > it calls nohz_balance_enter_idle(). this function tracks the CPU
> > which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
> >
> > Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> > without checking !was_stopped so that nohz_full cpu can be chosen to
> > perform idle load balancing when it enters idle state.
> >
> > Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> > Signed-off-by: Levi Yun <[email protected]>
> > ---
> > v4:
> > - Add fixes tags.
> >
> > v3:
> > - Rewording commit message.
> >
> > v2:
> > - Fix typos in commit message.
> >
> > kernel/time/tick-sched.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 71a792cd8936..31a4cd89782f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> > ts->idle_sleeps++;
> > ts->idle_expires = expires;
> >
> > - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > - ts->idle_jiffies = ts->last_jiffies;
> > + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > + if (!was_stopped)
> > + ts->idle_jiffies = ts->last_jiffies;
> > +
>
> I've taken some time to respond because your patch has raised more questions
> while discussing this with Anna-Maria:
>
> 1) Is Idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already
> prevent those CPUs from becoming idle load balancer. They can still be
> targets for load balancing but nohz_full CPUs are supposed to run only one
> task.
>
> 2) This is related to previous point: HK_TYPE_SCHED is never activated. It would
> prevent the CPU from even beeing part of idle load balancing. Should we
> remove it or plug it?
>
>
> 3) nohz_balance_enter_idle() is called when the tick is stopped for the first
> time and nohz_balance_exit_idle() is called from the tick. But that also
> applies to idle ticks. So if the load balancing triggers while the tick is
> stopped, nohz_balance_enter_idle() won't be re-called in the idle loop even
> though the tick is stopped (that would be fixed with your patch).
>
> 4) Why is nohz_balance_exit_idle() called from the tick and not from the idle
> exit path? Is it to avoid overhead?
>
> I'm adding some scheduler people in Cc who might help answer some of those
> questions.
None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
single CPU partitions, and *that* should be avoiding any and all
load-balancing.
If there still is, that's a bug, but that's not related to HK goo.
As such, I don't think the HK_TYPE_SCHED check in
nohz_balance_enter_idle() actually makes sense, the on_null_omain()
check a little below that should already take care of things, no?
> None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
> single CPU partitions, and *that* should be avoiding any and all
> load-balancing.
Do you mean.. tick_nohz_full cpu (non-HK-ticked cpu) shouldn't belong
to any sched_domain?
>
> If there still is, that's a bug, but that's not related to HK goo.
>
> As such, I don't think the HK_TYPE_SCHED check in
> nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> check a little below that should already take care of things, no?
IIUC,
currently, whether cpu belongs on domain or null is determined by
HK_DOMAIN_FLAGS
However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
belongs to sched_domain
so, it couldn't be filtered out by on_null_domain().
unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
with "isolcpus=tick", it participates sched_domain.
On Thu, May 16, 2024 at 09:20:08AM +0100, Yun Levi wrote:
> > None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
> > single CPU partitions, and *that* should be avoiding any and all
> > load-balancing.
>
> Do you mean.. tick_nohz_full cpu (non-HK-ticked cpu) shouldn't belong
> to any sched_domain?
AFAIK NOHZ_FULL still hard relies on the isolcpus garbage, so yeah, it
should be all single cpu partitions, which don't have a domain.
(this really should migrate to use cpusets partitions)
> > If there still is, that's a bug, but that's not related to HK goo.
> >
> > As such, I don't think the HK_TYPE_SCHED check in
> > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > check a little below that should already take care of things, no?
>
> IIUC,
> currently, whether cpu belongs on domain or null is determined by
> HK_DOMAIN_FLAGS
No! you can create NULL domains without any of the HK nonsense. Both
isolcpus and cpusets can create single CPU partitions.
> However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> belongs to sched_domain
> so, it couldn't be filtered out by on_null_domain().
>
> unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> with "isolcpus=tick", it participates sched_domain.
Frederic ?!? You can use nohz_full without isolcpus? That makes no
sense. If you do that you get to keep the pieces.
On Thu, May 16, 2024 at 10:49:11AM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2024 at 09:20:08AM +0100, Yun Levi wrote:
> > > None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
> > > single CPU partitions, and *that* should be avoiding any and all
> > > load-balancing.
> >
> > Do you mean.. tick_nohz_full cpu (non-HK-ticked cpu) shouldn't belong
> > to any sched_domain?
>
> AFAIK NOHZ_FULL still hard relies on the isolcpus garbage, so yeah, it
> should be all single cpu partitions, which don't have a domain.
>
> (this really should migrate to use cpusets partitions)
>
> > > If there still is, that's a bug, but that's not related to HK goo.
> > >
> > > As such, I don't think the HK_TYPE_SCHED check in
> > > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > > check a little below that should already take care of things, no?
> >
> > IIUC,
> > currently, whether cpu belongs on domain or null is determined by
> > HK_DOMAIN_FLAGS
>
> No! you can create NULL domains without any of the HK nonsense. Both
> isolcpus and cpusets can create single CPU partitions.
>
> > However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> > belongs to sched_domain
> > so, it couldn't be filtered out by on_null_domain().
> >
> > unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> > with "isolcpus=tick", it participates sched_domain.
>
> Frederic ?!? You can use nohz_full without isolcpus? That makes no
> sense. If you do that you get to keep the pieces.
I fear you can yes, even though most users combine it with isolcpus. I
know, that interface is terrible but it dates from times when we weren't
sure about all the potential usecases of nohz_full. There was a possibility
that HPC could just want to reduce ticks without all the hard and costly
isolation around. But all the usecases I have witnessed so far in ten years
involved wanting 0 noise after all...
> > > > As such, I don't think the HK_TYPE_SCHED check in
> > > > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > > > check a little below that should already take care of things, no?
> > >
> > > IIUC,
> > > currently, whether cpu belongs on domain or null is determined by
> > > HK_DOMAIN_FLAGS
> >
> > No! you can create NULL domains without any of the HK nonsense. Both
> > isolcpus and cpusets can create single CPU partitions.
Yes. However what I said, nohz_full cpu isn't on null_domain
unless it was configured by cpusets.
even with option "nohz_full="
> > > However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> > > belongs to sched_domain
> > > so, it couldn't be filtered out by on_null_domain().
> > >
> > > unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> > > with "isolcpus=tick", it participates sched_domain.
> >
> > Frederic ?!? You can use nohz_full without isolcpus? That makes no
> > sense. If you do that you get to keep the pieces.
>
> I fear you can yes, even though most users combine it with isolcpus. I
> know, that interface is terrible but it dates from times when we weren't
> sure about all the potential usecases of nohz_full. There was a possibility
> that HPC could just want to reduce ticks without all the hard and costly
> isolation around. But all the usecases I have witnessed so far in ten years
> involved wanting 0 noise after all...
If I make you annoyed I'm sorry in advance but let me clarify please.
1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
It should be on the null_domain. right?
2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
and the check for this can be replaced by on_null_domain().
Many thanks!
On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote:
> > If I make you annoyed I'm sorry in advance but let me clarify please.
> >
> > 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
> > It should be on the null_domain. right?
> >
> > 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
> > to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
> >
> > 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
> > and the check for this can be replaced by on_null_domain().
>
> I've no idea about all those HK knobs, it's all insane if you ask me.
>
> Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total
> nonsense, can you please explain?
Yes. Lemme unearth this patch:
https://lore.kernel.org/all/[email protected]/
Because all we need now is:
_ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz
_ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone)
_ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq
And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then
lemme comment the HK_TYPE_* uses within sched/* within the same
patchset.
Just a question, correct me if I'm wrong, we don't want nohz_full= to ever
take the idle load balancer duty (this is what HK_TYPE_MISC prevents in
find_new_ilb) because the nohz_full CPU going back to userspace concurrently
doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But
we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the
idle balancing can be performed on them by a non isolated CPU. Right?
Thanks.
>
> If the CPU participates in load-balancing, it gets to fully participate.
> If you want to get out of load-balancing, you get single CPU partitions.
On Thu, May 16, 2024 at 04:23:31PM +0200, Frederic Weisbecker wrote:
> On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote:
> > > If I make you annoyed I'm sorry in advance but let me clarify please.
> > >
> > > 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
> > > It should be on the null_domain. right?
> > >
> > > 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
> > > to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
> > >
> > > 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
> > > and the check for this can be replaced by on_null_domain().
> >
> > I've no idea about all those HK knobs, it's all insane if you ask me.
> >
> > Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total
> > nonsense, can you please explain?
>
> Yes. Lemme unearth this patch:
> https://lore.kernel.org/all/[email protected]/
AFAICT we need more cleanups.
> Because all we need now is:
>
> _ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz
> _ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone)
What does this do?
> _ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq
>
> And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then
> lemme comment the HK_TYPE_* uses within sched/* within the same
> patchset.
Please, I find this MISC and DOMAIN stuff confusing, wth does it do? It
can't possibly be right.
> Just a question, correct me if I'm wrong, we don't want nohz_full= to ever
> take the idle load balancer duty (this is what HK_TYPE_MISC prevents in
> find_new_ilb) because the nohz_full CPU going back to userspace concurrently
> doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But
> we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the
> idle balancing can be performed on them by a non isolated CPU. Right?
I'm confused, none of that makes sense. If you're part of a
load-balancer, you're part of a load-balancer, no ifs buts or other
nonsense.
idle load balancer is no different from regular load balancing.
Fundamentally, you can't disable the tick if you're part of a
load-balance group, the load-balancer needs the tick.
The only possible way to use nohz_full is to not be part of a
load-balancer, and the only way that is so is by having (lots of) single
CPU partitions.
On Thu, May 16, 2024 at 04:45:04PM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2024 at 04:23:31PM +0200, Frederic Weisbecker wrote:
> > On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote:
> > > > If I make you annoyed I'm sorry in advance but let me clarify please.
> > > >
> > > > 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
> > > > It should be on the null_domain. right?
> > > >
> > > > 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
> > > > to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
> > > >
> > > > 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
> > > > and the check for this can be replaced by on_null_domain().
> > >
> > > I've no idea about all those HK knobs, it's all insane if you ask me.
> > >
> > > Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total
> > > nonsense, can you please explain?
> >
> > Yes. Lemme unearth this patch:
> > https://lore.kernel.org/all/[email protected]/
>
> AFAICT we need more cleanups.
Well, we need to start somewhere :-)
>
> > Because all we need now is:
> >
> > _ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz
> > _ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone)
>
> What does this do?
So housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) will return all
the CPUs not in nohz_full=
That is, all the CPUs that do all the housekeeping work on behalf
of nohz_full CPUs (unbound workqueues and timers, etc...).
Then in a similar way housekeeping_cpumask(HK_TYPE_DOMAIN) are all
the CPUs not in isolcpus= (on_null_domain()). Perhaps that one is
confusing and we should just have a simple isolcpus_cpumask instead?
>
> > _ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq
> >
> > And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then
> > lemme comment the HK_TYPE_* uses within sched/* within the same
> > patchset.
>
> Please, I find this MISC and DOMAIN stuff confusing, wth does it do? It
> can't possibly be right.
MISC is actually part of what is going to become HK_TYPE_KERNEL_NOISE. It's
an artificial microfeature but it's actually the same as _WORKQUEUE, _TICK, _RCU,
_TIMER, etc... All of which intended to be merged together.
>
> > Just a question, correct me if I'm wrong, we don't want nohz_full= to ever
> > take the idle load balancer duty (this is what HK_TYPE_MISC prevents in
> > find_new_ilb) because the nohz_full CPU going back to userspace concurrently
> > doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But
> > we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the
> > idle balancing can be performed on them by a non isolated CPU. Right?
>
> I'm confused, none of that makes sense. If you're part of a
> load-balancer, you're part of a load-balancer, no ifs buts or other
> nonsense.
>
> idle load balancer is no different from regular load balancing.
>
> Fundamentally, you can't disable the tick if you're part of a
> load-balance group, the load-balancer needs the tick.
>
> The only possible way to use nohz_full is to not be part of a
> load-balancer, and the only way that is so is by having (lots of) single
> CPU partitions.
So you're suggesting that nohz_full should just be part of the whole
ilb machinery by default (that is, not fiddle with ilb internals) and
then it's up to CPU partitioning (through cpuset or isolcpus) to disable
ilb naturally. Right?
Thanks.
On Thu, May 16, 2024 at 05:02:51PM +0200, Frederic Weisbecker wrote:
> > I'm confused, none of that makes sense. If you're part of a
> > load-balancer, you're part of a load-balancer, no ifs buts or other
> > nonsense.
> >
> > idle load balancer is no different from regular load balancing.
> >
> > Fundamentally, you can't disable the tick if you're part of a
> > load-balance group, the load-balancer needs the tick.
> >
> > The only possible way to use nohz_full is to not be part of a
> > load-balancer, and the only way that is so is by having (lots of) single
> > CPU partitions.
>
> So you're suggesting that nohz_full should just be part of the whole
> ilb machinery by default (that is, not fiddle with ilb internals) and
> then it's up to CPU partitioning (through cpuset or isolcpus) to disable
> ilb naturally. Right?
Yes, but stronger, as long as the CPU is part of a load-balance domain,
it must not disable the tick while running anything.
that is, NOHZ_FULL must not become active unless it's running on a
single CPU partition.
On Thu, May 16, 2024 at 05:19:53PM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2024 at 05:02:51PM +0200, Frederic Weisbecker wrote:
>
> > > I'm confused, none of that makes sense. If you're part of a
> > > load-balancer, you're part of a load-balancer, no ifs buts or other
> > > nonsense.
> > >
> > > idle load balancer is no different from regular load balancing.
> > >
> > > Fundamentally, you can't disable the tick if you're part of a
> > > load-balance group, the load-balancer needs the tick.
> > >
> > > The only possible way to use nohz_full is to not be part of a
> > > load-balancer, and the only way that is so is by having (lots of) single
> > > CPU partitions.
> >
> > So you're suggesting that nohz_full should just be part of the whole
> > ilb machinery by default (that is, not fiddle with ilb internals) and
> > then it's up to CPU partitioning (through cpuset or isolcpus) to disable
> > ilb naturally. Right?
>
> Yes, but stronger, as long as the CPU is part of a load-balance domain,
> it must not disable the tick while running anything.
>
> that is, NOHZ_FULL must not become active unless it's running on a
> single CPU partition.
I like the idea but I'm afraid to introduce regressions while doing so,
with people currently using nohz_full without proper partionning...
> > Yes, but stronger, as long as the CPU is part of a load-balance domain,
> > it must not disable the tick while running anything.
> >
> > that is, NOHZ_FULL must not become active unless it's running on a
> > single CPU partition.
>
> I like the idea but I'm afraid to introduce regressions while doing so,
> with people currently using nohz_full without proper partionning...
But I wonder when nohz_full cpu enters *idle".
If the cpuidle governor selects an idle state which wants "stop_tick",
If nohz_full cpu is still in sched domain, should it disable tick or not?
On Thu, May 16, 2024 at 01:43:49PM +0100, Yun Levi wrote:
> > > > > As such, I don't think the HK_TYPE_SCHED check in
> > > > > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > > > > check a little below that should already take care of things, no?
> > > >
> > > > IIUC,
> > > > currently, whether cpu belongs on domain or null is determined by
> > > > HK_DOMAIN_FLAGS
> > >
> > > No! you can create NULL domains without any of the HK nonsense. Both
> > > isolcpus and cpusets can create single CPU partitions.
>
> Yes. However what I said, nohz_full cpu isn't on null_domain
> unless it was configured by cpusets.
> even with option "nohz_full="
So if a CPU isn't isolated, either by isolcpus or cpusets, you get to
participate in load-balancing -- end of story.
> > > > However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> > > > belongs to sched_domain
> > > > so, it couldn't be filtered out by on_null_domain().
> > > >
> > > > unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> > > > with "isolcpus=tick", it participates sched_domain.
> > >
> > > Frederic ?!? You can use nohz_full without isolcpus? That makes no
> > > sense. If you do that you get to keep the pieces.
> >
> > I fear you can yes, even though most users combine it with isolcpus. I
> > know, that interface is terrible but it dates from times when we weren't
> > sure about all the potential usecases of nohz_full. There was a possibility
> > that HPC could just want to reduce ticks without all the hard and costly
> > isolation around. But all the usecases I have witnessed so far in ten years
> > involved wanting 0 noise after all...
>
>
> If I make you annoyed I'm sorry in advance but let me clarify please.
>
> 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
> It should be on the null_domain. right?
>
> 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
> to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
>
> 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
> and the check for this can be replaced by on_null_domain().
I've no idea about all those HK knobs, it's all insane if you ask me.
Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total
nonsense, can you please explain?
If the CPU participates in load-balancing, it gets to fully participate.
If you want to get out of load-balancing, you get single CPU partitions.
On Thu, May 16, 2024 at 05:32:56PM +0200, Frederic Weisbecker wrote:
> On Thu, May 16, 2024 at 05:19:53PM +0200, Peter Zijlstra wrote:
> > Yes, but stronger, as long as the CPU is part of a load-balance domain,
> > it must not disable the tick while running anything.
> >
> > that is, NOHZ_FULL must not become active unless it's running on a
> > single CPU partition.
>
> I like the idea but I'm afraid to introduce regressions while doing so,
> with people currently using nohz_full without proper partionning...
There is no regression, if this is possible today it is utterly broken.
This should never have been possible.
Le Thu, May 16, 2024 at 07:53:21PM +0200, Peter Zijlstra a ?crit :
> On Thu, May 16, 2024 at 05:32:56PM +0200, Frederic Weisbecker wrote:
> > On Thu, May 16, 2024 at 05:19:53PM +0200, Peter Zijlstra wrote:
>
> > > Yes, but stronger, as long as the CPU is part of a load-balance domain,
> > > it must not disable the tick while running anything.
> > >
> > > that is, NOHZ_FULL must not become active unless it's running on a
> > > single CPU partition.
> >
> > I like the idea but I'm afraid to introduce regressions while doing so,
> > with people currently using nohz_full without proper partionning...
>
> There is no regression, if this is possible today it is utterly broken.
>
> This should never have been possible.
Ok, I'll try something.
Thanks.