Linus Torvalds wrote:
> On Tue, 10 Sep 2002, David Brownell wrote:
>
>>>In short:
>>>
>>> Either you want debugging (in which case BUG() is the wrong thing to
>>> do), or you don't want debugging (in which case BUG() is the wrong thing
>>> to do). You can choose either, but in neither case is BUG() acceptable.
>>
>>Or in even shorter sound bite format: "Just say no to BUG()s."
>
>
> Well, the thing is, BUG() _is_ sometimes useful. It's a dense and very
> convenient way to say that something catastrophic happened.
>
> And actually, outside of drivers and filesystems you can often know (or
> control) the number of locks the surrounding code is holding, and then a
> BUG() may not be as lethal. At which point the normal "oops and kill the
> process" action is clearly fine - the machine is still perfectly usable.
I know you probably don't like the name, but all over the kernel people
are using BUG() as ASSERT()... so why not create what people want?
IMO we should have ASSERT() and OHSHIT(), the latter being the true
meaning and current implementation of BUG(), the former being used when
the machine is still usable.
Jeff
From: Jeff Garzik <[email protected]>
Date: Tue, 10 Sep 2002 13:16:03 -0400
IMO we should have ASSERT() and OHSHIT(),
I fully support the addition of an OHSHIT() macro.
:-)
On Tue, 10 Sep 2002, David S. Miller wrote:
>
> IMO we should have ASSERT() and OHSHIT(),
>
> I fully support the addition of an OHSHIT() macro.
Oh, please no. We'd end up with endless asserts in the networking layer,
just because David would find it amusing.
I can just see it now - code bloat hell.
And no, I still don't like ASSERT().
I think the approach should clearly spell what the trouble level is:
DEBUG(x != y, "x=%d, y=%d\n", x, y);
WARN(x != y, "crap happens: x=%d y=%d\n", x, y);
FATAL(x != y, "Aiee: x=%d y=%d\n", x, y);
where the DEBUG one gets compiled out normally (or has some nice per-file
way of being enabled/disabled - a perfect world would expose the on/off in
devicefs as a per-file entity when kernel debugging is on), WARN continues
but writes a message (and normally does _not_ get compiled out), and FATAL
is like our current BUG_ON().
All would print out the filename and line number, the message, and the
backtrace.
Linus
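
A minimal userspace sketch of the three levels described above may make the
difference concrete. It is only an illustration: every helper name
(trouble_report, trouble_count, trouble_debug_enabled) is hypothetical, a
runtime flag stands in for "compiled out or enabled per file", abort() stands
in for killing the process, and the backtrace is left as a comment.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not an existing kernel API. */
enum trouble_level { TROUBLE_DEBUG, TROUBLE_WARN, TROUBLE_FATAL };

static int trouble_count[3];       /* reports emitted per level */
static int trouble_debug_enabled;  /* stand-in for a per-file on/off switch */

/* Print "<LEVEL> at <file>:<line>: <message>" to stderr and count it. */
static void trouble_report(enum trouble_level lvl, const char *file, int line,
                           const char *fmt, ...)
{
    static const char *const names[] = { "DEBUG", "WARN", "FATAL" };
    va_list ap;

    assert(fmt != NULL);           /* userspace sanity check only */
    trouble_count[lvl]++;
    fprintf(stderr, "%s at %s:%d: ", names[lvl], file, line);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    /* A kernel version would also print a backtrace here. */
}

/* DEBUG: silent unless debugging is switched on. */
#define DEBUG(cond, ...) \
    do { \
        if (trouble_debug_enabled && (cond)) \
            trouble_report(TROUBLE_DEBUG, __FILE__, __LINE__, __VA_ARGS__); \
    } while (0)

/* WARN: always reports, then continues. */
#define WARN(cond, ...) \
    do { \
        if (cond) \
            trouble_report(TROUBLE_WARN, __FILE__, __LINE__, __VA_ARGS__); \
    } while (0)

/* FATAL: reports, then stops the current process (abort() in this sketch). */
#define FATAL(cond, ...) \
    do { \
        if (cond) { \
            trouble_report(TROUBLE_FATAL, __FILE__, __LINE__, __VA_ARGS__); \
            abort(); \
        } \
    } while (0)
```

With x = 1 and y = 2, WARN(x != y, "crap happens: x=%d y=%d\n", x, y) prints
one line and execution continues, DEBUG() with the same arguments prints
nothing until trouble_debug_enabled is set, and FATAL() would not return.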
On Tue, Sep 10, 2002 at 11:40:27AM -0700, Linus Torvalds wrote:
>
> On Tue, 10 Sep 2002, David S. Miller wrote:
> >
> > IMO we should have ASSERT() and OHSHIT(),
> >
> > I fully support the addition of an OHSHIT() macro.
>
> Oh, please no. We'd end up with endless asserts in the networking layer,
> just because David would find it amusing.
>
> I can just see it now - code bloat hell.
>
> And no, I still don't like ASSERT().
>
> I think the approach should clearly spell what the trouble level is:
>
> DEBUG(x != y, "x=%d, y=%d\n", x, y);
>
> WARN(x != y, "crap happens: x=%d y=%d\n", x, y);
>
> FATAL(x != y, "Aiee: x=%d y=%d\n", x, y);
>
> where the DEBUG one gets compiled out normally (or has some nice per-file
> way of being enabled/disabled - a perfect world would expose the on/off in
> devicefs as a per-file entity when kernel debugging is on), WARN continues
> but writes a message (and normally does _not_ get compiled out), and FATAL
> is like our current BUG_ON().
Which still leaves the question, does it really make sense for
FATAL/BUG to forcibly kill the machine? If the bug is truly fatal,
presumably the machine kills itself in short order anyway, otherwise
we might have a shot at recording the situation. A more useful
distinction might be in terms of risk of damaging filesystems (or perhaps
hardware) if we continue, something like BROKEN/DANGEROUSLY_BROKEN.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
From: Linus Torvalds <[email protected]>
Date: Tue, 10 Sep 2002 11:40:27 -0700 (PDT)
We'd end up with endless asserts in the networking layer,
just because David would find it amusing.
:-) You know some of us do indeed miss the days of occasionally
waking up to find things like "inode.c: the problem is here" on
our screens.
On Tue, 10 Sep 2002, Oliver Xymoron wrote:
>
> Which still leaves the question, does it really make sense for
> FATAL/BUG to forcibly kill the machine?
No. It should only be "locally fatal", and it should clearly just do what
BUG() does now - kill the process.
But that implies very much that you really cannot use FATAL() in general
at all, since it would be illegal to use whenever some caller holds some
non-local locks (which is almost always the case for most "peripheral
code").
Linus
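
The locking problem can be shown with a small userspace analogy. This is a
sketch under loud assumptions, not kernel code: a boolean stands in for a
non-local lock, longjmp() stands in for "oops and kill the process", and all
names (shared_lock_held, peripheral_code, run_process) are made up for the
example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static bool shared_lock_held;   /* stand-in for a non-local lock */
static jmp_buf process_killed;  /* where control lands when a "process" dies */

static bool try_lock(void)
{
    if (shared_lock_held)
        return false;           /* a real lock would block here forever */
    shared_lock_held = true;
    return true;
}

static void unlock(void)
{
    assert(shared_lock_held);   /* unlocking a free lock is a bug in this sketch */
    shared_lock_held = false;
}

/* Peripheral code, called with the caller's lock already held. */
static void peripheral_code(int x, int y)
{
    if (x != y)                      /* "locally fatal" condition */
        longjmp(process_killed, 1);  /* kill the process: never returns */
}

/* Returns true if the "process" ran to completion, false otherwise. */
static bool run_process(int x, int y)
{
    if (setjmp(process_killed))
        return false;           /* process was killed; nobody released the lock */
    if (!try_lock())
        return false;           /* lock was leaked by an earlier victim */
    peripheral_code(x, y);
    unlock();
    return true;
}
```

After run_process(1, 2) the caller's lock is still marked held, so every
later try_lock() fails. The process died, but the rest of the system is now
stuck behind a lock that nobody will ever release.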
On Tue, 10 Sep 2002, Oliver Xymoron wrote:
> Which still leaves the question, does it really make sense for
> FATAL/BUG to forcibly kill the machine? If the bug is truly fatal,
> presumably the machine kills itself in short order anyway, otherwise
> we might have a shot at recording the situation. A more useful
> distinction might be in terms of risk of damaging filesystems (or perhaps
> hardware) if we continue, something like BROKEN/DANGEROUSLY_BROKEN.
And that's the heart of the thing: if continuing is likely to trash a
filesystem or (unlikely) damage hardware, then the system should go down
RIGHT NOW.
I've often wondered if it wouldn't be better to allow the user to provide
a partition for oops use, where the kernel could write kmem and a few
other chosen bits of information. Get all the oops output formatting code
out of the kernel. Then the user could run tools like ksymoops against the
oops after reboot, and a small utility could wrap and compress the oops,
symbols table, config, etc, for future use by the user or developer.
I've sent a fair number of AIX crash dumps to IBM; it seems a good idea.
And developers could have personal tools and architecture-dependent tools,
and ksymoops could be enhanced after the fact.
If it would help with a nasty bug I'd put all that and the kernel source I
used, the module tree, etc, on a CD and send it. Whatever helps solve the
problem. Too often the output is lost on the console and not written
anywhere recoverable.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.