2020-11-26 09:54:45

by Paul Gortmaker

Subject: [PATCH 2/3] clear_warn_once: bind a timer to written reset value

The existing documentation describes a write of "1" to reset all the
WARN_ONCE and similar one-shot warnings to the as-booted state, so they
can be triggered again during debugging/testing.

But having them auto-reset once a day, or once a week, might give a
sysadmin valuable information about what the system is doing.

Here we extend the existing debugfs variable to bind a timer to the
written value N, so that it will reset every N seconds, for N>1.
Writing a zero will clear any previously set timer value.

The pre-existing behaviour of writing N=1 will do a one-shot clear.

Cc: Andi Kleen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: John Ogness <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
.../admin-guide/clearing-warn-once.rst | 9 ++++++
kernel/panic.c | 32 +++++++++++++++++++
2 files changed, 41 insertions(+)

diff --git a/Documentation/admin-guide/clearing-warn-once.rst b/Documentation/admin-guide/clearing-warn-once.rst
index 211fd926cf00..93cf3ba0b57d 100644
--- a/Documentation/admin-guide/clearing-warn-once.rst
+++ b/Documentation/admin-guide/clearing-warn-once.rst
@@ -7,3 +7,12 @@ echo 1 > /sys/kernel/debug/clear_warn_once

clears the state and allows the warnings to print once again.
This can be useful after test suite runs to reproduce problems.
+
+Values greater than one set a timer for a periodic state reset; e.g.
+
+echo 3600 > /sys/kernel/debug/clear_warn_once
+
+will establish an hourly state reset, effectively turning WARN_ONCE
+into a long period rate-limited warning.
+
+Writing a value of zero (or one) will remove any previously set timer.
diff --git a/kernel/panic.c b/kernel/panic.c
index 1d425970a50c..a23eb239fb17 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -655,6 +655,7 @@ EXPORT_SYMBOL(__warn_printk);
/* Support resetting WARN*_ONCE state */

static u64 warn_once_reset;
+static bool warn_timer_active;

static void do_clear_warn_once(void)
{
@@ -662,6 +663,14 @@ static void do_clear_warn_once(void)
memset(__start_once, 0, __end_once - __start_once);
}

+static void timer_warn_once(struct timer_list *timer)
+{
+	do_clear_warn_once();
+	timer->expires = jiffies + warn_once_reset * HZ;
+	add_timer(timer);
+}
+static DEFINE_TIMER(warn_reset_timer, timer_warn_once);
+
static int warn_once_get(void *data, u64 *val)
{
*val = warn_once_reset;
@@ -672,6 +681,29 @@ static int warn_once_set(void *data, u64 val)
{
warn_once_reset = val;

+	if (val > 1) {		/* set/reset new timer */
+		unsigned long expires = jiffies + val * HZ;
+
+		if (warn_timer_active) {
+			mod_timer(&warn_reset_timer, expires);
+		} else {
+			warn_timer_active = 1;
+			warn_reset_timer.expires = expires;
+			add_timer(&warn_reset_timer);
+		}
+		return 0;
+	}
+
+	if (warn_timer_active) {
+		del_timer_sync(&warn_reset_timer);
+		warn_timer_active = 0;
+	}
+	warn_once_reset = 0;
+
+	if (val == 0)		/* cleared timer, we are done */
+		return 0;
+
+	/* Getting here means val == 1 ---> so clear existing data */
do_clear_warn_once();
return 0;
}
--
2.25.1
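
For readers following the thread, the write semantics the patch describes (0 cancels any timer, 1 does a one-shot clear and cancels any timer, N > 1 arms a periodic clear every N seconds) can be sketched as a small userspace model. This is only an illustration under stated assumptions: the names `warn_once_model`, `warn_once_write` and `warn_once_tick` are hypothetical and do not appear in the patch, and a counter stands in for the real memset of the once-state section.

```c
#include <stdint.h>

/* Hypothetical userspace model of the patch's write semantics. */
enum warn_once_action {
	WARN_ONCE_CANCEL,	/* val == 0: drop any timer, clear nothing */
	WARN_ONCE_CLEAR_NOW,	/* val == 1: drop any timer, clear once now */
	WARN_ONCE_PERIODIC	/* val > 1: clear every val seconds */
};

struct warn_once_model {
	uint64_t period_secs;	/* 0 when no timer is armed */
	unsigned int clears;	/* how many times the state was cleared */
};

/* Models a write of val to the debugfs file. */
static enum warn_once_action warn_once_write(struct warn_once_model *m,
					     uint64_t val)
{
	if (val > 1) {
		/* Arm (or re-arm) the timer; the first clear comes later. */
		m->period_secs = val;
		return WARN_ONCE_PERIODIC;
	}

	/* Both 0 and 1 remove a previously set timer. */
	m->period_secs = 0;
	if (val == 0)
		return WARN_ONCE_CANCEL;

	m->clears++;		/* val == 1: one-shot clear */
	return WARN_ONCE_CLEAR_NOW;
}

/* Models one timer expiry: clear the state and stay armed. */
static void warn_once_tick(struct warn_once_model *m)
{
	if (m->period_secs)
		m->clears++;
}
```

In this model, writing 3600 and then calling `warn_once_tick()` once yields a single clear, and a later write of 1 adds one more clear while leaving no timer armed, which mirrors the documented "zero (or one) will remove any previously set timer" behaviour.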


2020-11-30 16:26:39

by Steven Rostedt

Subject: Re: [PATCH 2/3] clear_warn_once: bind a timer to written reset value

On Thu, 26 Nov 2020 01:30:28 -0500
Paul Gortmaker <[email protected]> wrote:

> +++ b/Documentation/admin-guide/clearing-warn-once.rst
> @@ -7,3 +7,12 @@ echo 1 > /sys/kernel/debug/clear_warn_once
>
> clears the state and allows the warnings to print once again.
> This can be useful after test suite runs to reproduce problems.
> +
> +Values greater than one set a timer for a periodic state reset; e.g.
> +
> +echo 3600 > /sys/kernel/debug/clear_warn_once

I wonder if the value should be in minutes and not seconds; otherwise, a
wrong value could possibly DoS the machine if you were to write 2 into it
and there were a lot of warnings in high frequency events.

Or is dumping out a bunch of warnings every 2 seconds not a problem?

Anyway, would there ever be a need to have it cleared in less than 1 minute
intervals?

-- Steve


> +
> +will establish an hourly state reset, effectively turning WARN_ONCE
> +into a long period rate-limited warning.

2020-11-30 17:23:05

by Paul Gortmaker

Subject: Re: [PATCH 2/3] clear_warn_once: bind a timer to written reset value

[Re: [PATCH 2/3] clear_warn_once: bind a timer to written reset value] On 30/11/2020 (Mon 11:20) Steven Rostedt wrote:

> On Thu, 26 Nov 2020 01:30:28 -0500
> Paul Gortmaker <[email protected]> wrote:
>
> > +++ b/Documentation/admin-guide/clearing-warn-once.rst
> > @@ -7,3 +7,12 @@ echo 1 > /sys/kernel/debug/clear_warn_once
> >
> > clears the state and allows the warnings to print once again.
> > This can be useful after test suite runs to reproduce problems.
> > +
> > +Values greater than one set a timer for a periodic state reset; e.g.
> > +
> > +echo 3600 > /sys/kernel/debug/clear_warn_once
>
> I wonder if the value should be in minutes and not seconds; otherwise, a
> wrong value could possibly DoS the machine if you were to write 2 into it
> and there were a lot of warnings in high frequency events.
>
> Or is dumping out a bunch of warnings every 2 seconds not a problem?

It doesn't seem to be a problem - at least in that running a defconfig
build on an otherwise out of the box common distro doesn't seem to trip
any WARN or printk_once events in my testing. Of course there may be a
use case out there that is doing lots of them, however.

> Anyway, would there ever be a need to have it cleared in less than 1 minute
> intervals?

I don't think so - as I said in another follow up from last week:

https://lore.kernel.org/lkml/[email protected]/

I'd also indicated in the above that I'd be fine with adding a minimum
of 1m if people feel better about that. Also maybe moving the units to
minutes instead of seconds helps implicitly convey the intended use
better -- i.e. "don't be smashing on this every second" -- maybe that
was your point as well - and I'd agree with that.

Paul.
--

>
> -- Steve
>
>
> > +
> > +will establish an hourly state reset, effectively turning WARN_ONCE
> > +into a long period rate-limited warning.
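
The two options floated above (keep seconds but enforce a one-minute minimum, or switch the unit to minutes) can be sketched in userspace C as follows. The thread does not settle on either, so this is an illustration only: `MODEL_HZ`, `MIN_PERIOD_SECS`, `period_secs_valid` and `minutes_to_ticks` are hypothetical names, the tick rate of 250 is an assumed value, and the overflow check is an addition of this sketch rather than something raised in the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HZ	250ULL	/* assumed tick rate; the real HZ is a kernel config value */
#define MIN_PERIOD_SECS	60ULL	/* the one-minute floor suggested in the thread */

/*
 * Option 1: keep seconds. 0 and 1 retain their special meanings;
 * any periodic value below one minute is rejected.
 */
static bool period_secs_valid(uint64_t secs)
{
	return secs <= 1 || secs >= MIN_PERIOD_SECS;
}

/*
 * Option 2: take the period in minutes. Converts to timer ticks and
 * returns false instead of wrapping if the product would overflow.
 */
static bool minutes_to_ticks(uint64_t minutes, uint64_t *ticks)
{
	const uint64_t ticks_per_minute = 60ULL * MODEL_HZ;

	if (minutes > UINT64_MAX / ticks_per_minute)
		return false;

	*ticks = minutes * ticks_per_minute;
	return true;
}
```

With the first helper a write of 2 would be refused while 3600 is accepted; with the second, an hourly reset is expressed as 60 rather than 3600, which makes the intended coarse granularity visible in the interface itself.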

2020-12-01 03:41:01

by Steven Rostedt

Subject: Re: [PATCH 2/3] clear_warn_once: bind a timer to written reset value

On Mon, 30 Nov 2020 12:17:59 -0500
Paul Gortmaker <[email protected]> wrote:

> > Anyway, would there ever be a need to have it cleared in less than 1 minute
> > intervals?
>
> I don't think so - as I said in another follow up from last week:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> I'd also indicated in the above that I'd be fine with adding a minimum
> of 1m if people feel better about that. Also maybe moving the units to
> minutes instead of seconds helps implicitly convey the intended use
> better -- i.e. "don't be smashing on this every second" -- maybe that
> was your point as well - and I'd agree with that.

That was my second point. That is, why would anyone care about a
resolution in seconds for this?

-- Steve