2013-10-15 19:58:14

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

The WARN_ON_ONCE() code is to trigger a waring only once when some
condition happens. But due to the way it is written it is racy.

if (unlikely(condition)) {
if (WARN(!__warned))
__warned = true;
}

The problem is that multiple CPUs could hit the same warning and
produce multiple output dumps of the same warning, or an interrupt could
happen and hit the same warning and do the warning in the middle of a
previous one, especially since the WARN() does a dump of the current
stack.

Even more of a problem, a recent WARN_ON_ONCE() that was in the page
fault handler triggered and the stack dump of the WARN() caused the
same WARN_ON_ONCE() get hit again. Since the __warned = true is not
updated until after the WARN() is completed, each WARN() triggered
another page fault causing the stack to be filled and crashed the box.

The point of WARN_ON() is to warn the user and not to crash the box.

The easy fix is to update the __warned variable with a xchg(). This way
only one WARN_ON_ONCE() will actually happen, and prevents any issues
of the WARN() causing the same WARN() to be hit and crash the system.

[ Fixed the WARN() whitespace offset of a '\' while I was at it ]

Signed-off-by: Steven Rostedt <[email protected]>

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 7d10f96..ab7ebdb 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -11,6 +11,7 @@

#ifndef __ASSEMBLY__
#include <linux/kernel.h>
+#include <asm/cmpxchg.h>

#ifdef CONFIG_BUG

@@ -91,7 +92,7 @@ extern void warn_slowpath_null(const char *file, const int line);
#endif

#ifndef WARN
-#define WARN(condition, format...) ({ \
+#define WARN(condition, format...) ({ \
int __ret_warn_on = !!(condition); \
if (unlikely(__ret_warn_on)) \
__WARN_printf(format); \
@@ -134,32 +135,29 @@ extern void warn_slowpath_null(const char *file, const int line);
#endif

#define WARN_ON_ONCE(condition) ({ \
- static bool __section(.data.unlikely) __warned; \
+ static int __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition); \
\
if (unlikely(__ret_warn_once)) \
- if (WARN_ON(!__warned)) \
- __warned = true; \
+ WARN_ON(!xchg(&__warned, 1)); \
unlikely(__ret_warn_once); \
})

#define WARN_ONCE(condition, format...) ({ \
- static bool __section(.data.unlikely) __warned; \
+ static int __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition); \
\
if (unlikely(__ret_warn_once)) \
- if (WARN(!__warned, format)) \
- __warned = true; \
+ WARN(!xchg(&__warned, 1), format); \
unlikely(__ret_warn_once); \
})

#define WARN_TAINT_ONCE(condition, taint, format...) ({ \
- static bool __section(.data.unlikely) __warned; \
+ static int __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition); \
\
if (unlikely(__ret_warn_once)) \
- if (WARN_TAINT(!__warned, taint, format)) \
- __warned = true; \
+ WARN_TAINT(!xchg(&__warned, 1), taint, format); \
unlikely(__ret_warn_once); \
})


2013-10-15 20:13:03

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, 15 Oct 2013 15:58:06 -0400 Steven Rostedt <[email protected]> wrote:

> The WARN_ON_ONCE() code is to trigger a waring only once when some
> condition happens. But due to the way it is written it is racy.
>
> if (unlikely(condition)) {
> if (WARN(!__warned))
> __warned = true;
> }
>
> The problem is that multiple CPUs could hit the same warning and
> produce multiple output dumps of the same warning, or an interrupt could
> happen and hit the same warning and do the warning in the middle of a
> previous one, especially since the WARN() does a dump of the current
> stack.
>
> Even more of a problem, a recent WARN_ON_ONCE() that was in the page
> fault handler triggered and the stack dump of the WARN() caused the
> same WARN_ON_ONCE() get hit again. Since the __warned = true is not
> updated until after the WARN() is completed, each WARN() triggered
> another page fault causing the stack to be filled and crashed the box.
>
> The point of WARN_ON() is to warn the user and not to crash the box.
>
> The easy fix is to update the __warned variable with a xchg(). This way
> only one WARN_ON_ONCE() will actually happen, and prevents any issues
> of the WARN() causing the same WARN() to be hit and crash the system.

printk_once() has the same issue, and probably other places.

Is there some sneaky way of doing this operation as a common thing,
rather than open-coding it everywhere? Something like

#define ONCE() ({
static int state;
int ret;

ret = !xchg(&state, 1);
ret;
})

Also, is xchg() better than test_and_set_bit()? (test_and_set_bit()
requires a long, so more storage).

Also, we're now incurring an atomic op for every "call". Presumably
these calls are rare, but not necessarily - one can envisage uses of a
generic ONCE() which are called at high frequency. Should we avoid
that with

#define ONCE() ({
static int state;
int ret;

if (likely(state))
ret = 0;
else
ret = !xchg(&state, 1);
ret;
})

2013-10-15 20:18:56

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, Oct 15, 2013 at 03:58:06PM -0400, Steven Rostedt wrote:
> The WARN_ON_ONCE() code is to trigger a waring only once when some
> condition happens. But due to the way it is written it is racy.
>
> if (unlikely(condition)) {
> if (WARN(!__warned))
> __warned = true;
> }
>
> The problem is that multiple CPUs could hit the same warning and
> produce multiple output dumps of the same warning, or an interrupt could
> happen and hit the same warning and do the warning in the middle of a
> previous one, especially since the WARN() does a dump of the current
> stack.
>
> Even more of a problem, a recent WARN_ON_ONCE() that was in the page
> fault handler triggered and the stack dump of the WARN() caused the
> same WARN_ON_ONCE() get hit again. Since the __warned = true is not
> updated until after the WARN() is completed, each WARN() triggered
> another page fault causing the stack to be filled and crashed the box.
>
> The point of WARN_ON() is to warn the user and not to crash the box.
>
> The easy fix is to update the __warned variable with a xchg(). This way
> only one WARN_ON_ONCE() will actually happen, and prevents any issues
> of the WARN() causing the same WARN() to be hit and crash the system.

How about just updating __warned without a cmpxchg. It's not that critical
if the update is not seen immediately to other CPUs. OTOH it's critical
that's it is visible immediately to the current CPU

I mean some warrning can be hard to reproduce and happen to some users
while staying for several kernel releases. If it's repetitive, the xchg
might impact the performance.

I may be overly paranoid, but I think barrier() (so that at least
we don't recurse locally) alone would be better.

2013-10-15 20:21:53

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, Oct 15, 2013 at 01:12:59PM -0700, Andrew Morton wrote:
> On Tue, 15 Oct 2013 15:58:06 -0400 Steven Rostedt <[email protected]> wrote:
>
> > The WARN_ON_ONCE() code is to trigger a waring only once when some
> > condition happens. But due to the way it is written it is racy.
> >
> > if (unlikely(condition)) {
> > if (WARN(!__warned))
> > __warned = true;
> > }
> >
> > The problem is that multiple CPUs could hit the same warning and
> > produce multiple output dumps of the same warning, or an interrupt could
> > happen and hit the same warning and do the warning in the middle of a
> > previous one, especially since the WARN() does a dump of the current
> > stack.
> >
> > Even more of a problem, a recent WARN_ON_ONCE() that was in the page
> > fault handler triggered and the stack dump of the WARN() caused the
> > same WARN_ON_ONCE() get hit again. Since the __warned = true is not
> > updated until after the WARN() is completed, each WARN() triggered
> > another page fault causing the stack to be filled and crashed the box.
> >
> > The point of WARN_ON() is to warn the user and not to crash the box.
> >
> > The easy fix is to update the __warned variable with a xchg(). This way
> > only one WARN_ON_ONCE() will actually happen, and prevents any issues
> > of the WARN() causing the same WARN() to be hit and crash the system.
>
> printk_once() has the same issue, and probably other places.
>
> Is there some sneaky way of doing this operation as a common thing,
> rather than open-coding it everywhere? Something like
>
> #define ONCE() ({
> static int state;
> int ret;
>
> ret = !xchg(&state, 1);
> ret;
> })

Oh, I have a pending patchset that I worked on a few weeks ago which does that.
I did not post it because it made WARN_ONCE using the unlikely text section, but
the diffstat was nice.

I'm going to post that as RFC just in case.

2013-10-15 20:25:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, 15 Oct 2013 22:18:48 +0200
Frederic Weisbecker <[email protected]> wrote:


> How about just updating __warned without a cmpxchg. It's not that critical
> if the update is not seen immediately to other CPUs. OTOH it's critical
> that's it is visible immediately to the current CPU

Well, I didn't use cmpxchg() I used xchg() which is actually quite
faster.

>
> I mean some warrning can be hard to reproduce and happen to some users
> while staying for several kernel releases. If it's repetitive, the xchg
> might impact the performance.

But do we care about that? A WARN_ON() means the kernel (or hardware)
is buggy. It should be fixed.

But Andrew's ONCE() request is something we would want to avoid the
xchg() every time.

>
> I may be overly paranoid, but I think barrier() (so that at least
> we don't recurse locally) alone would be better.

Heh, Boris is giving me the same argument on IRC ;-)

-- Steve

2013-10-15 20:36:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, 15 Oct 2013 13:12:59 -0700
Andrew Morton <[email protected]> wrote:


> Also, we're now incurring an atomic op for every "call". Presumably
> these calls are rare, but not necessarily - one can envisage uses of a
> generic ONCE() which are called at high frequency. Should we avoid
> that with
>
> #define ONCE() ({
> static int state;
> int ret;
>
> if (likely(state))
> ret = 0;
> else
> ret = !xchg(&state, 1);
> ret;
> })

I was talking with Boris on IRC about having a shortcut if "state" is
already true. I argued against it, but that was just for the WARN()
operations because I rather add a LOCK xchg, then more branches to hot
paths.

But for a generic ONCE() function, I guess we would want the shortcut
as it may be used by something that gets hit all the time with a normal
kernel.

-- Steve

2013-10-15 20:36:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, 15 Oct 2013 22:21:36 +0200
Frederic Weisbecker <[email protected]> wrote:

> Oh, I have a pending patchset that I worked on a few weeks ago which does that.
> I did not post it because it made WARN_ONCE using the unlikely text section, but
> the diffstat was nice.
>
> I'm going to post that as RFC just in case.

Oh well, I guess I don't need to work on that anymore ;-)

-- Steve

2013-10-15 20:37:23

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, Oct 15, 2013 at 04:25:40PM -0400, Steven Rostedt wrote:
> On Tue, 15 Oct 2013 22:18:48 +0200
> Frederic Weisbecker <[email protected]> wrote:
>
>
> > How about just updating __warned without a cmpxchg. It's not that critical
> > if the update is not seen immediately to other CPUs. OTOH it's critical
> > that's it is visible immediately to the current CPU
>
> Well, I didn't use cmpxchg() I used xchg() which is actually quite
> faster.

Still, I doubt it's that cheap.

>
> >
> > I mean some warrning can be hard to reproduce and happen to some users
> > while staying for several kernel releases. If it's repetitive, the xchg
> > might impact the performance.
>
> But do we care about that? A WARN_ON() means the kernel (or hardware)
> is buggy. It should be fixed.

Sure it should, doesn't mean it is. I've seen warnings that stayed on my
boot for several releases with non responsive maintainers (fortunately now
it seems that is got fixed).

>
> But Andrew's ONCE() request is something we would want to avoid the
> xchg() every time.

We we use it yeah.

>
> >
> > I may be overly paranoid, but I think barrier() (so that at least
> > we don't recurse locally) alone would be better.
>
> Heh, Boris is giving me the same argument on IRC ;-)
>
> -- Steve

2013-10-15 20:41:37

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] bug: Use xchg() to update WARN_ON_ONCE() static variable

On Tue, Oct 15, 2013 at 04:36:22PM -0400, Steven Rostedt wrote:
> On Tue, 15 Oct 2013 22:21:36 +0200
> Frederic Weisbecker <[email protected]> wrote:
>
> > Oh, I have a pending patchset that I worked on a few weeks ago which does that.
> > I did not post it because it made WARN_ONCE using the unlikely text section, but
> > the diffstat was nice.
> >
> > I'm going to post that as RFC just in case.
>
> Oh well, I guess I don't need to work on that anymore ;-)

Well, the idea of setting __warned before the warning is still something we want
to fix seperately I think :)