2021-06-20 10:21:25

by Ani Sinha

[permalink] [raw]
Subject: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible

Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
dependency mask model") had also removed the kernel warning
message informing the user that it was not possible to turn
on NO_HZ_FULL. Adding back that log message here. It is
unhelpful when the kernel turns off NO_HZ_FULL silently
without informing anyone.
Also added a kernel log when sched clock is marked as unstable.

Signed-off-by: Ani Sinha <[email protected]>
---
kernel/sched/clock.c | 5 ++++-
kernel/time/tick-sched.c | 14 +++++++++++---
2 files changed, 15 insertions(+), 4 deletions(-)

changelog:
v1: original patch
v2: updated log message to make it non specific to sched clock. Also
added another log when sched clock is marked as unstable.
v3: a bugfix to v2.

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index c2b2859ddd82..9f9fe658f8a5 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)

smp_mb(); /* matches sched_clock_init_late() */

- if (static_key_count(&sched_clock_running.key) == 2)
+ if (static_key_count(&sched_clock_running.key) == 2) {
+ WARN_ONCE(sched_clock_stable(),
+ "sched clock is now marked unstable.");
__clear_sched_clock_stable();
+ }
}

static void __sched_clock_gtod_offset(void)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 828b091501ca..6e583aa2d160 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -937,10 +937,18 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
return;

- if (can_stop_full_tick(cpu, ts))
+ if (can_stop_full_tick(cpu, ts)) {
tick_nohz_stop_sched_tick(ts, cpu);
- else if (ts->tick_stopped)
- tick_nohz_restart_sched_tick(ts, ktime_get());
+ } else {
+ /*
+ * Don't allow the user to think they can get
+ * full NO_HZ with this machine.
+ */
+ WARN_ONCE(tick_nohz_full_running,
+ "NO_HZ_FULL will not work for the current system.");
+ if (ts->tick_stopped)
+ tick_nohz_restart_sched_tick(ts, ktime_get());
+ }
#endif
}

--
2.25.1


2021-06-22 15:18:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible

On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:

> Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> dependency mask model") had also removed the kernel warning
> message informing the user that it was not possible to turn
> on NO_HZ_FULL. Adding back that log message here. It is
> unhelpful when the kernel turns off NO_HZ_FULL silently
> without informing anyone.
> Also added a kernel log when sched clock is marked as unstable.

Don't do two things at once. See Documentation/process/....

Also your subject line want's a proper prefix.

> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index c2b2859ddd82..9f9fe658f8a5 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
>
> smp_mb(); /* matches sched_clock_init_late() */
>
> - if (static_key_count(&sched_clock_running.key) == 2)
> + if (static_key_count(&sched_clock_running.key) == 2) {
> + WARN_ONCE(sched_clock_stable(),
> + "sched clock is now marked unstable.");

What's the WARN for here? That backtrace is largely uninteresting.

> - if (can_stop_full_tick(cpu, ts))
> + if (can_stop_full_tick(cpu, ts)) {
> tick_nohz_stop_sched_tick(ts, cpu);
> - else if (ts->tick_stopped)
> - tick_nohz_restart_sched_tick(ts, ktime_get());
> + } else {
> + /*
> + * Don't allow the user to think they can get
> + * full NO_HZ with this machine.
> + */
> + WARN_ONCE(tick_nohz_full_running,
> + "NO_HZ_FULL will not work for the current system.");

can_stop_full_tick() returning false can be transient and then the user
still has no idea _why_ this is printed.

Also assume the user/admin starts perf and knows he's going to disturb
NOHZ full, then _why_ would he be interested in that warning.

And again the backtrace is useless. The call path is known.

Thanks,

tglx

2021-06-26 16:20:08

by Ani Sinha

[permalink] [raw]
Subject: Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible



On Tue, 22 Jun 2021, Thomas Gleixner wrote:

> On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:
>
> > Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> > dependency mask model") had also removed the kernel warning
> > message informing the user that it was not possible to turn
> > on NO_HZ_FULL. Adding back that log message here. It is
> > unhelpful when the kernel turns off NO_HZ_FULL silently
> > without informing anyone.
> > Also added a kernel log when sched clock is marked as unstable.
>
> Don't do two things at once. See Documentation/process/....

Ok I will split up this patch into two.

>
> Also your subject line want's a proper prefix.

OK will fix.

>
> > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > index c2b2859ddd82..9f9fe658f8a5 100644
> > --- a/kernel/sched/clock.c
> > +++ b/kernel/sched/clock.c
> > @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
> >
> > smp_mb(); /* matches sched_clock_init_late() */
> >
> > - if (static_key_count(&sched_clock_running.key) == 2)
> > + if (static_key_count(&sched_clock_running.key) == 2) {
> > + WARN_ONCE(sched_clock_stable(),
> > + "sched clock is now marked unstable.");
>
> What's the WARN for here? That backtrace is largely uninteresting.

OK to be consistent with other parts of the code, I will replace this with
pr_warn()

>
> > - if (can_stop_full_tick(cpu, ts))
> > + if (can_stop_full_tick(cpu, ts)) {
> > tick_nohz_stop_sched_tick(ts, cpu);
> > - else if (ts->tick_stopped)
> > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > + } else {
> > + /*
> > + * Don't allow the user to think they can get
> > + * full NO_HZ with this machine.
> > + */
> > + WARN_ONCE(tick_nohz_full_running,
> > + "NO_HZ_FULL will not work for the current system.");
>
> can_stop_full_tick() returning false can be transient and then the user
> still has no idea _why_ this is printed.
>
> Also assume the user/admin starts perf and knows he's going to disturb
> NOHZ full, then _why_ would he be interested in that warning.
>

If NOHZ is disabled intentionally, that is not an interesting case. I am
worried about the situation where the user specifies NOHZ option in the
kernel commandline and the kernel silently disabled this because of one or
more limitations in the system. I want to address this. All the reasons
specified in the following commit is still true:

commit e12d0271774fea9fddf1e2a7952a0bffb2ee8e8b
Author: Steven Rostedt <[email protected]>
Date: Fri May 10 17:12:28 2013 -0400

nohz: Warn if the machine can not perform nohz_full

If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
due to the machine having a unstable clock, warn about it.

We do not want users thinking that they are getting the benefit
of nohz when their machine can not support it.

Signed-off-by: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>

This log was removed from the kernel in commit

commit 4f49b90abb4aca6fe677c95fc352fd0674d489bd
Author: Frederic Weisbecker <[email protected]>
Date: Wed Jul 22 17:03:52 2015 +0200

sched-clock: Migrate to use new tick dependency mask model

Instead of checking sched_clock_stable from the nohz subsystem to
verify
its tick dependency, migrate it to the new mask in order to include it
to the all-in-one check.


and it seems to me that the removal of the log defeats the purpose that
commit e12d0271774fea9fddf tried to serve.

Please enlighten me as to how to achieve this and I will cook up
something.

Ani

2021-07-03 17:34:08

by Ani Sinha

[permalink] [raw]
Subject: Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible

ping ...

On Sat, Jun 26, 2021 at 9:49 PM Ani Sinha <[email protected]> wrote:
>
>
>
> On Tue, 22 Jun 2021, Thomas Gleixner wrote:
>
> > On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:
> >
> > > Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> > > dependency mask model") had also removed the kernel warning
> > > message informing the user that it was not possible to turn
> > > on NO_HZ_FULL. Adding back that log message here. It is
> > > unhelpful when the kernel turns off NO_HZ_FULL silently
> > > without informing anyone.
> > > Also added a kernel log when sched clock is marked as unstable.
> >
> > Don't do two things at once. See Documentation/process/....
>
> Ok I will split up this patch into two.
>
> >
> > Also your subject line want's a proper prefix.
>
> OK will fix.
>
> >
> > > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > > index c2b2859ddd82..9f9fe658f8a5 100644
> > > --- a/kernel/sched/clock.c
> > > +++ b/kernel/sched/clock.c
> > > @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
> > >
> > > smp_mb(); /* matches sched_clock_init_late() */
> > >
> > > - if (static_key_count(&sched_clock_running.key) == 2)
> > > + if (static_key_count(&sched_clock_running.key) == 2) {
> > > + WARN_ONCE(sched_clock_stable(),
> > > + "sched clock is now marked unstable.");
> >
> > What's the WARN for here? That backtrace is largely uninteresting.
>
> OK to be consistent with other parts of the code, I will replace this with
> pr_warn()
>
> >
> > > - if (can_stop_full_tick(cpu, ts))
> > > + if (can_stop_full_tick(cpu, ts)) {
> > > tick_nohz_stop_sched_tick(ts, cpu);
> > > - else if (ts->tick_stopped)
> > > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > > + } else {
> > > + /*
> > > + * Don't allow the user to think they can get
> > > + * full NO_HZ with this machine.
> > > + */
> > > + WARN_ONCE(tick_nohz_full_running,
> > > + "NO_HZ_FULL will not work for the current system.");
> >
> > can_stop_full_tick() returning false can be transient and then the user
> > still has no idea _why_ this is printed.
> >
> > Also assume the user/admin starts perf and knows he's going to disturb
> > NOHZ full, then _why_ would he be interested in that warning.
> >
>
> If NOHZ is disabled intentionally, that is not an interesting case. I am
> worried about the situation where the user specifies NOHZ option in the
> kernel commandline and the kernel silently disabled this because of one or
> more limitations in the system. I want to address this. All the reasons
> specified in the following commit is still true:
>
> commit e12d0271774fea9fddf1e2a7952a0bffb2ee8e8b
> Author: Steven Rostedt <[email protected]>
> Date: Fri May 10 17:12:28 2013 -0400
>
> nohz: Warn if the machine can not perform nohz_full
>
> If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
> kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
> due to the machine having a unstable clock, warn about it.
>
> We do not want users thinking that they are getting the benefit
> of nohz when their machine can not support it.
>
> Signed-off-by: Steven Rostedt <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Li Zhong <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
>
> This log was removed from the kernel in commit
>
> commit 4f49b90abb4aca6fe677c95fc352fd0674d489bd
> Author: Frederic Weisbecker <[email protected]>
> Date: Wed Jul 22 17:03:52 2015 +0200
>
> sched-clock: Migrate to use new tick dependency mask model
>
> Instead of checking sched_clock_stable from the nohz subsystem to
> verify
> its tick dependency, migrate it to the new mask in order to include it
> to the all-in-one check.
>
>
> and it seems to me that the removal of the log defeats the purpose that
> commit e12d0271774fea9fddf tried to serve.
>
> Please enlighten me as to how to achieve this and I will cook up
> something.
>
> Ani