Subject: [patch] change WARN_ON back to "BUG: at ..."
From: Ingo Molnar <[email protected]>
WARN_ON() ever triggering is a kernel bug. Do not try to paper over this
fact by suggesting to the user that this is 'only' a warning, as the
following recent commit does:
commit 30e25b71e725b150585e17888b130e3324f8cf7c
Author: Jeremy Fitzhardinge <[email protected]>
Date: Fri Dec 8 02:36:24 2006 -0800
[PATCH] Fix generic WARN_ON message
A warning is a warning, not a BUG.
( it might make sense to rename BUG() to CRASH() and BUG_ON() to
CRASH_ON(), but that does not change the fact that WARN_ON()
signals a kernel bug. )
i and others objected to this change during lkml review:
http://marc.theaimsgroup.com/?l=linux-kernel&m=116115160710533&w=2
still the change slipped upstream - grumble :)
Also, use the standard "BUG: " format to make it easier to grep logs and
to make it easier to google for kernel bugs.
Signed-off-by: Ingo Molnar <[email protected]>
---
include/asm-generic/bug.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/include/asm-generic/bug.h
===================================================================
--- linux.orig/include/asm-generic/bug.h
+++ linux/include/asm-generic/bug.h
@@ -35,7 +35,7 @@ struct bug_entry {
#define WARN_ON(condition) ({ \
typeof(condition) __ret_warn_on = (condition); \
if (unlikely(__ret_warn_on)) { \
- printk("WARNING at %s:%d %s()\n", __FILE__, \
+ printk("BUG: at %s:%d %s()\n", __FILE__, \
__LINE__, __FUNCTION__); \
dump_stack(); \
} \
> + printk("BUG: at %s:%d %s()\n", __FILE__, \
how about
BUG: Warning at ....
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
Ingo Molnar wrote:
> Subject: [patch] change WARN_ON back to "BUG: at ..."
> From: Ingo Molnar <[email protected]>
>
> WARN_ON() ever triggering is a kernel bug. Do not try to paper over this
> fact by suggesting to the user that this is 'only' a warning, as the
> following recent commit does:
>
I disagree.
I think there are two issues here: intent and effect.
What's the intent of WARN_ON? Presumably its different from BUG_ON,
otherwise you could just use BUG_ON. Or if not, why not just have
BUG_ON? I think in practice many WARN_ONs are clearly not intended to
be as serious as BUG_ON: they warn about unimplemented things, transient
hiccups, clarifications of errno returns, etc. (Whether WARN_ON is a
good mechanism for all these things is a separate issue.)
Their effects are very different too. The effect of WARN_ON is simply a
message; if I see it in a log, I know that something happened which
should be fixed, but the system is in a fairly sane state. If I see a
BUG_ON, then I know something was killed with extreme prejudice - at
best a process got killed, but there may be stray locks held or other
damage - and the system is basically teetering if it hasn't crashed
already. Because the effects of the two warning mechanisms are so
different, I think its important to make them clearly visually distinct
when scanning the kernel output. My eye is trained to see "BUG: " as
meaning "something destabilizing happened"; if warnings also appear that
way, then it is just needlessly confusing.
J
* Jeremy Fitzhardinge <[email protected]> wrote:
> What's the intent of WARN_ON? Presumably its different from BUG_ON,
> otherwise you could just use BUG_ON. Or if not, why not just have
> BUG_ON? I think in practice many WARN_ONs are clearly not intended to
> be as serious as BUG_ON: [...]
you are quite wrong. For example i cannot remember the last time i added
a BUG_ON() to the core kernel, because a BUG_ON() in core code only
makes it harder to get the log output back to the developer! Often the
system is still alive enough to get the information out but crashing it
via BUG_ON() hides the messages . So for example i exclusively use
WARN_ON() in core kernel code.
> [...] they warn about unimplemented things, transient hiccups,
> clarifications of errno returns, etc. [...]
Your claims defy reality: i just checked the 200+ WARN_ON()s that are in
linux/*/*.c, and /none/ is a 'transient' failure or hickup or
'clarification'. Each one i checked signals a real kernel bug that i'd
not want a production system to have. Non-fatal messages should and do
get a normal KERN_ printk.
an no, i dont want to do a large-scale rename in either direction. Let
it be up to the developer whether he wants to crash the kernel upon
seeing a bug or not. But one thing is sure: a WARN_ON() is a kernel bug
just as much as a BUG_ON(), in 99% of the cases.
here's the history of these primitives: BUG() used to be the only
primitive back in the days, then came BUG_ON() and iirc i was the one
who added WARN_ON() years ago, as a mechanism to signal kernel bugs in a
less destructive way. And that's how it's used in the kernel. If you
disagree and still understand it to be different, that's really your
problem i think ...
Ingo
I've always felt that it is wrong (or at least misleading) that WARN_ON
prints "BUG". It would have been better if it had said "WARNING", and only
BUG_ON says "BUG".
But lots of people have now written downstream log-parsing tools which
might break due to this change, so I'm inclined to go with Ingo's patch,
and restore the old (il)logic.
On Fri, 2006-12-22 at 12:04 -0800, Andrew Morton wrote:
> I've always felt that it is wrong (or at least misleading) that WARN_ON
> prints "BUG". It would have been better if it had said "WARNING", and only
> BUG_ON says "BUG".
>
> But lots of people have now written downstream log-parsing tools which
> might break due to this change, so I'm inclined to go with Ingo's patch,
> and restore the old (il)logic.
>From hearing what Ingo has to say about this, it seems that calling it
WARING was wrong in the first place.
BUG (and BUG_ON) are really fatal bugs. Meaning that if you continue,
you will corrupt the system.
WARN_ON is still a BUG, but we know enough about it that we can just
cripple the system so that it doesn't break anything. But keeps the
system running so that someone can either read the logs, and perhaps, if
it happens to a developer, be able to do some more diagnostics.
But really, the WARN_ON should only be used when the system did
something that is as bad as a BUG, but you don't need to crash the
system since you can prevent data from being corrupted. But you want to
flag this so that it can be fixed. Since it really is a BUG! Calling
BUG_ON while the user is in X is really hopeless, since they may never
know why their system just crashed.
The kernel already prints out "warning" for other things that are not
bugs. Usually a warning means that something might not be compatible.
Or something might degrade performance. Or you have buggy hardware.
But a "warning" doesn't usually mean a kernel BUG. WARN_ON on the other
hand should only be used where the cause of the WARN_ON was due to a bug
in the kernel, hence, the word "BUG" should be used.
At least with the -rt patch, we get reports of a BUG happening where it
was from a WARN_ON, and this is a Good Thing(TM). I can foresee users
ignoring a warning message if that's all they see, and they don't
realize that something has been crippled, and the system is unstable.
Sometimes a WARN_ON can be a cause of a later BUG. And since the BUG
output has a "cut here" we may not get the output of the true bug.
Users are much more apt to report messages of BUG than WARNING. So, I
totally agree with Ingo, we need people to see these as BUGs, and report
them whenever they happen. Because when it comes down to it, the warning
is about a bug.
-- Steve
* Steven Rostedt <[email protected]> wrote:
> WARN_ON is still a BUG, but we know enough about it that we can just
> cripple the system so that it doesn't break anything. [...]
well - a WARN_ON() can be /anything/. It is the same as BUG_ON(), but it
doesnt crash the box immediately and on purpose. In that sense all
existing BUG_ON()s should be converted to WARN_ON()s or to panic()s
(whichever fits best for that particular BUG_ON()).
[ or all WARN_ON()s should be converted to BUG_ON()s and the behavior of
BUG_ON() should be changed to /not/ crash the system on purpose - and
for those cases where we really do not want to continue, panic()
should be used. That way the user can set panic policy himself. ]
i can whip up a patch for any of these conversions, but i dont think we
need this flux right now.
Ingo
On Sat, 23 Dec 2006, Ingo Molnar wrote:
>
> i can whip up a patch for any of these conversions, but i dont think we
> need this flux right now.
>
I agree, it's not needed right now. But making BUG_ON panic seems to be a
good idea, but if you do make that change (and even if you don't), could
you please add the dump_stack into panic (like you have in -rt)? Thanks!
-- Steve
Hi!
> I've always felt that it is wrong (or at least misleading) that WARN_ON
> prints "BUG". It would have been better if it had said "WARNING", and only
> BUG_ON says "BUG".
>
> But lots of people have now written downstream log-parsing tools which
> might break due to this change, so I'm inclined to go with Ingo's patch,
> and restore the old (il)logic.
People should not be parsing syslog. If they do, they deserve
occassional breakage.
Pavel
--
Thanks for all the (sleeping) penguins.
* Pavel Machek <[email protected]> wrote:
> > But lots of people have now written downstream log-parsing tools
> > which might break due to this change, so I'm inclined to go with
> > Ingo's patch, and restore the old (il)logic.
>
> People should not be parsing syslog. If they do, they deserve
> occassional breakage.
so what other method do you propose to those who are working on
increasing the reliability of the kernel by running automatic regression
tests and boot allyesconfig kernels, to figure out whether out of the
tons of kernel logs there was any problem? Right now "^BUG: " is a
pretty universal filter, and i dont see a problem in preserving that.
(for lockdep there's a 'debug_locks' line in /proc/lockdep_stats that
tells us whether any lock related assert was printed by the kernel, and
my scripts monitor that. Along that line we could introduce a
/proc/sys/kernel/bugs counter for tools to monitor. But even after that,
you have to find the place in the syslog so having a text signature for
kernel bugs is still a good idea.)
Ingo