Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
dependency mask model") had also removed the kernel warning
message informing the user that it was not possible to turn
on NO_HZ_FULL. This warning was added in commit e12d0271774fe
("nohz: Warn if the machine can not perform nohz_full") in order
to warn users when they enabled NOHZ using the kernel commandline
but kernel could not enable NOHZ. Removal of this log defeats this
purpose because we still do not want users thinking that they are
getting the benefit of nohz when their machine can not support it.
Hence this patch brings back the information log.
If at some point the conditions that prevented NOHZ from being
enabled goes away and the previous info log was printed, we print
another log informing the user that NOHZ was again possible.
Signed-off-by: Ani Sinha <[email protected]>
---
kernel/time/tick-sched.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
See also thread: https://lkml.org/lkml/2021/6/20/32
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 828b091501ca..fbef95b57b1d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -930,6 +930,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
{
#ifdef CONFIG_NO_HZ_FULL
int cpu = smp_processor_id();
+ static bool no_tick_warned;
if (!tick_nohz_full_cpu(cpu))
return;
@@ -937,10 +938,24 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
return;
- if (can_stop_full_tick(cpu, ts))
+ if (can_stop_full_tick(cpu, ts)) {
tick_nohz_stop_sched_tick(ts, cpu);
- else if (ts->tick_stopped)
- tick_nohz_restart_sched_tick(ts, ktime_get());
+ if (no_tick_warned) {
+ pr_info("NO_HZ_FULL is now enabled in the system.\n");
+ no_tick_warned = false;
+ }
+ } else {
+ /*
+ * Don't allow the user to think they can get
+ * full NO_HZ with this machine.
+ */
+ if (!no_tick_warned && tick_nohz_full_running) {
+ pr_info("NO_HZ_FULL has been disabled in the system.");
+ no_tick_warned = true;
+ }
+ if (ts->tick_stopped)
+ tick_nohz_restart_sched_tick(ts, ktime_get());
+ }
#endif
}
--
2.25.1
ping ...
On Sun, Jun 27, 2021 at 12:58 PM Ani Sinha <[email protected]> wrote:
>
> Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> dependency mask model") had also removed the kernel warning
> message informing the user that it was not possible to turn
> on NO_HZ_FULL. This warning was added in commit e12d0271774fe
> ("nohz: Warn if the machine can not perform nohz_full") in order
> to warn users when they enabled NOHZ using the kernel commandline
> but kernel could not enable NOHZ. Removal of this log defeats this
> purpose because we still do not want users thinking that they are
> getting the benefit of nohz when their machine can not support it.
> Hence this patch brings back the information log.
>
> If at some point the conditions that prevented NOHZ from being
> enabled goes away and the previous info log was printed, we print
> another log informing the user that NOHZ was again possible.
>
> Signed-off-by: Ani Sinha <[email protected]>
> ---
> kernel/time/tick-sched.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> See also thread: https://lkml.org/lkml/2021/6/20/32
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 828b091501ca..fbef95b57b1d 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -930,6 +930,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> {
> #ifdef CONFIG_NO_HZ_FULL
> int cpu = smp_processor_id();
> + static bool no_tick_warned;
>
> if (!tick_nohz_full_cpu(cpu))
> return;
> @@ -937,10 +938,24 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> return;
>
> - if (can_stop_full_tick(cpu, ts))
> + if (can_stop_full_tick(cpu, ts)) {
> tick_nohz_stop_sched_tick(ts, cpu);
> - else if (ts->tick_stopped)
> - tick_nohz_restart_sched_tick(ts, ktime_get());
> + if (no_tick_warned) {
> + pr_info("NO_HZ_FULL is now enabled in the system.\n");
> + no_tick_warned = false;
> + }
> + } else {
> + /*
> + * Don't allow the user to think they can get
> + * full NO_HZ with this machine.
> + */
> + if (!no_tick_warned && tick_nohz_full_running) {
> + pr_info("NO_HZ_FULL has been disabled in the system.");
> + no_tick_warned = true;
> + }
> + if (ts->tick_stopped)
> + tick_nohz_restart_sched_tick(ts, ktime_get());
> + }
> #endif
> }
>
> --
> 2.25.1
>
Ani,
On Sun, Jun 27 2021 at 12:58, Ani Sinha wrote:
same comment vs. Subject, 'this patch' and 'we'
> @@ -930,6 +930,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> {
> #ifdef CONFIG_NO_HZ_FULL
> int cpu = smp_processor_id();
> + static bool no_tick_warned;
>
> if (!tick_nohz_full_cpu(cpu))
> return;
> @@ -937,10 +938,24 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> return;
>
> - if (can_stop_full_tick(cpu, ts))
> + if (can_stop_full_tick(cpu, ts)) {
> tick_nohz_stop_sched_tick(ts, cpu);
> - else if (ts->tick_stopped)
> - tick_nohz_restart_sched_tick(ts, ktime_get());
> + if (no_tick_warned) {
> + pr_info("NO_HZ_FULL is now enabled in the system.\n");
> + no_tick_warned = false;
> + }
> + } else {
> + /*
> + * Don't allow the user to think they can get
> + * full NO_HZ with this machine.
> + */
> + if (!no_tick_warned && tick_nohz_full_running) {
> + pr_info("NO_HZ_FULL has been disabled in the system.");
> + no_tick_warned = true;
> + }
So this is going to emit this message everytime the NOHZ state of a CPU
changes, which is an easy to trigger dmesg flooding from unpriviledged
user space.
Also the message is misleading because this is a per CPU condition and
not a system wide condition.
Thanks,
tglx
On Wed, Jul 07, 2021 at 12:16:15PM +0200, Thomas Gleixner wrote:
> Ani,
>
> On Sun, Jun 27 2021 at 12:58, Ani Sinha wrote:
>
> same comment vs. Subject, 'this patch' and 'we'
>
> > @@ -930,6 +930,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> > {
> > #ifdef CONFIG_NO_HZ_FULL
> > int cpu = smp_processor_id();
> > + static bool no_tick_warned;
> >
> > if (!tick_nohz_full_cpu(cpu))
> > return;
> > @@ -937,10 +938,24 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> > if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> > return;
> >
> > - if (can_stop_full_tick(cpu, ts))
> > + if (can_stop_full_tick(cpu, ts)) {
> > tick_nohz_stop_sched_tick(ts, cpu);
> > - else if (ts->tick_stopped)
> > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > + if (no_tick_warned) {
> > + pr_info("NO_HZ_FULL is now enabled in the system.\n");
> > + no_tick_warned = false;
> > + }
> > + } else {
> > + /*
> > + * Don't allow the user to think they can get
> > + * full NO_HZ with this machine.
> > + */
> > + if (!no_tick_warned && tick_nohz_full_running) {
> > + pr_info("NO_HZ_FULL has been disabled in the system.");
> > + no_tick_warned = true;
> > + }
>
> So this is going to emit this message everytime the NOHZ state of a CPU
> changes, which is an easy to trigger dmesg flooding from unpriviledged
> user space.
>
> Also the message is misleading because this is a per CPU condition and
> not a system wide condition.
IIUC, the clock can be marked unstable anytime before or after nohz_full
is initialized.
So we can do two things:
* If tick_nohz_init() is called after the clock has been marked unstable,
just don't initialize nohz_full.
* If the clock is marked unstable after tick_nohz_init(), issue a pr_warn()
from __clear_sched_clock_stable().
Thanks.
On Wed, Jul 7, 2021 at 4:13 PM Frederic Weisbecker <[email protected]> wrote:
>
> On Wed, Jul 07, 2021 at 12:16:15PM +0200, Thomas Gleixner wrote:
> > Ani,
> >
> > On Sun, Jun 27 2021 at 12:58, Ani Sinha wrote:
> >
> > same comment vs. Subject, 'this patch' and 'we'
> >
> > > @@ -930,6 +930,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> > > {
> > > #ifdef CONFIG_NO_HZ_FULL
> > > int cpu = smp_processor_id();
> > > + static bool no_tick_warned;
> > >
> > > if (!tick_nohz_full_cpu(cpu))
> > > return;
> > > @@ -937,10 +938,24 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> > > if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> > > return;
> > >
> > > - if (can_stop_full_tick(cpu, ts))
> > > + if (can_stop_full_tick(cpu, ts)) {
> > > tick_nohz_stop_sched_tick(ts, cpu);
> > > - else if (ts->tick_stopped)
> > > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > > + if (no_tick_warned) {
> > > + pr_info("NO_HZ_FULL is now enabled in the system.\n");
> > > + no_tick_warned = false;
> > > + }
> > > + } else {
> > > + /*
> > > + * Don't allow the user to think they can get
> > > + * full NO_HZ with this machine.
> > > + */
> > > + if (!no_tick_warned && tick_nohz_full_running) {
> > > + pr_info("NO_HZ_FULL has been disabled in the system.");
> > > + no_tick_warned = true;
> > > + }
> >
> > So this is going to emit this message everytime the NOHZ state of a CPU
> > changes, which is an easy to trigger dmesg flooding from unpriviledged
> > user space.
> >
> > Also the message is misleading because this is a per CPU condition and
> > not a system wide condition.
>
> IIUC, the clock can be marked unstable anytime before or after nohz_full
> is initialized.
>
> So we can do two things:
>
> * If tick_nohz_init() is called after the clock has been marked unstable,
> just don't initialize nohz_full.
>
> * If the clock is marked unstable after tick_nohz_init(), issue a pr_warn()
> from __clear_sched_clock_stable().
As Peter mentions in the other thread, there is already logging for
this transition in the work function. The reason why I sent these
patches was because I was seeing some issues on the hyperV platform
where the sched clock was seemingly marked unstable but no logging was
present. I have bisected that issue and it seems we should fix this
elsewhere, in the hyperV platform init code and not here by adding
more logging, Tglx was right - this is per cpu state and we do see
some toggles between NOHZ enable and disable (some other condition
unrelated to sched clock stability). So adding logs here would add
more confusion and verbosity for reasons other than the one we wish to
alert users for. It would not add any value.
So this patch can be discarded.
Ani