MIME-Version: 1.0
In-Reply-To: <CAP=VYLrEqciiV-DiqR35bV9bDE47v6Ww-N4JnohvYaLWXc40UA@mail.gmail.com>
References: <CA+55aFwyNTLuZgOWMTRuabWobF27ygskuxvFd-P0n-3UNT=0Og@mail.gmail.com>
 <CAP=VYLrEqciiV-DiqR35bV9bDE47v6Ww-N4JnohvYaLWXc40UA@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 4 Oct 2016 20:29:00 -0700
Message-ID: <CA+55aFycvN=3DvsnRNpZbQ8z3893EK-nJA+V=Fx8o8yaviW7VA@mail.gmail.com>
Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers
To: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Antonio SJ Musumeci <trapexit@spawn.link>,
        Miklos Szeredi <miklos@szeredi.hu>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        stable <stable@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2524
Lines: 62

On Tue, Oct 4, 2016 at 7:43 PM, Paul Gortmaker
<paul.gortmaker@windriver.com> wrote:
>
> A couple years ago Ingo had an idea to kill  BUG_ON abuse that I made
> a 1st pass at.  Back then it seemed nobody cared.  Maybe that has since
> changed?
>
> https://lkml.org/lkml/2014/4/30/359

So we actually have the checkpatch warning already:

      # avoid BUG() or BUG_ON()
                if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
                        my $msg_type = \&WARN;
                        $msg_type = \&CHK if ($file);
                        &{$msg_type}("AVOID_BUG",
                                     "Avoid crashing the kernel - try
using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" .
$herecurr);
                }

but it doesn't trigger on VM_BUG_ON().

And I'm not convinced about replacing things with BUG_ON_AND_HALT(),
it simply doesn't fix the existing issue we have: people use BUG_ON(),
and worse, _when_ they use BUG_ON(), they use it instead of error
handling, so the code _around_ the BUG_ON() tends to then very much
depend on what the BUG_ON() checks.

This is actually one way that VM_BUG_ON() is better: it's very much by
design something that can be compiled away, so at least hopefully
nobody thinks of it as a security measure. So we could just say that
we will treat VM_BUG_ON() as a WARN_ON_ONCE(), and just not kill the
machine.

Because I could easily imagine that somebody *does* treat BUG_ON()
that way, thinking "well, if that BUG_ON() triggers at least it won't
then go off the rails later".

In fact, right now we mark BUG() in such a way that gcc can even take
advantage of the "crash the machine" semantics, because we explicitly
mark the BUG() with "unreachable()".

So what I think we should think about is:

 - extending the checkpatch warning to VM_BUG_ON too, to discourage new users.

 - look at making BUG_ON() simply be less lethal. Remove the
unrechable(), reorganize how the string is stored, and make it act
more like WARN_ON_ONCE() instead (it's the "rewind_stack_do_exit()"
that ends up causing us to try to kill things, we *could* just try to
stop doing that).

 - Instead of adding a BUG_ON_AND_HALT(), we could perhaps add a new
FATAL_ERROR() thing that acts like the current BUG_ON, and *not* call
it something similar (we don't want people doing mindless
conversions!). And that's the one that would do the whole
rewind_stack_do_exit() to kill the process.

But apart from the checkpatch thing, it's actually a pretty big change.

               Linus