MIME-Version: 1.0
In-Reply-To: <CA+55aFxEbtVpT0rJyfpLmXOzCe4i4VO1s=M2Q9mq8XbUBNopCQ@mail.gmail.com>
References: <CA+55aFwyNTLuZgOWMTRuabWobF27ygskuxvFd-P0n-3UNT=0Og@mail.gmail.com>
 <CAP=VYLrEqciiV-DiqR35bV9bDE47v6Ww-N4JnohvYaLWXc40UA@mail.gmail.com>
 <CA+55aFycvN=3DvsnRNpZbQ8z3893EK-nJA+V=Fx8o8yaviW7VA@mail.gmail.com>
 <20161005054407.GC7297@1wt.eu> <CA+55aFxeamjJmckjA_n_BMwpGKkxd51Tint0z7E9CDHAnRZJAw@mail.gmail.com>
 <20161005190604.GA8116@1wt.eu> <CA+55aFxMf+-0B4oEqAiRcNm5A=S1eFu0ugRJUJX02K4yA_xNjg@mail.gmail.com>
 <CAGXu5j+zH2cr408tmoXcCE8NzZtxkHinThUqSd9pYgHuwyprBQ@mail.gmail.com> <CA+55aFxEbtVpT0rJyfpLmXOzCe4i4VO1s=M2Q9mq8XbUBNopCQ@mail.gmail.com>
From: Kees Cook <keescook@chromium.org>
Date: Wed, 5 Oct 2016 15:17:02 -0700
Message-ID: <CAGXu5jJ9sAXauDMeW262qX_42TS2gmJBsR1yq2XDeHzn+54PoA@mail.gmail.com>
Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>,
        Paul Gortmaker <paul.gortmaker@windriver.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Antonio SJ Musumeci <trapexit@spawn.link>,
        Miklos Szeredi <miklos@szeredi.hu>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        stable <stable@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org

On Wed, Oct 5, 2016 at 2:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Oct 5, 2016 at 2:14 PM, Kees Cook <keescook@chromium.org> wrote:
>> Now, it can be argued that killing the process part should be
>> configurable and that the code should be written to handle a WARN and
>> clean up and error out nicely. But I still want to retain the "kill
>> the process immediately" behavior in some capacity.
>
> If "some capacity" is "can't do user space accesses", we could easily
> force a SIGKILL of the current process. It won't die immediately in
> the kernel, but it won't be returning to user space either.

In my more paranoid moments, I would prefer to keep "stop kernel
execution with the state set up by this process", not just "make the
process never return to user-space". I would need to meditate on
whether what I really want is just "panic on Oops" or not, though.
Right now, for example, I don't use panic-on-oops when running lkdtm
tests since each test gets (correctly) killed and the Oops can be
examined for the expected failure mode, all without bringing down the
entire system.
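
To make the distinction concrete, here is a rough userspace sketch (a toy
model, not kernel code). The names fault_policy, sim_task and run_path are
made up for illustration. It counts how many further steps of a kernel path
still run on top of the bad state under each policy:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies; neither name exists in the kernel. */
enum fault_policy {
	STOP_IMMEDIATELY,	/* abandon the kernel path at the faulting step */
	DEFER_KILL		/* mark the task killed, keep running until exit */
};

struct sim_task {
	bool kill_pending;	/* stands in for a pending SIGKILL */
	bool returned_to_user;	/* did the task get back to user space? */
	int steps_after_fault;	/* steps executed on top of the bad state */
};

/*
 * Simulate a kernel path of nsteps steps in which step fault_at (0-based)
 * detects corrupted state. Pass fault_at = -1 for a clean run.
 */
static struct sim_task run_path(enum fault_policy policy, int nsteps,
				int fault_at)
{
	struct sim_task t = { false, false, 0 };

	assert(nsteps >= 0);

	for (int step = 0; step < nsteps; step++) {
		if (t.kill_pending)
			t.steps_after_fault++;	/* still running with bad state */

		if (step == fault_at) {
			t.kill_pending = true;
			if (policy == STOP_IMMEDIATELY)
				return t;	/* nothing else runs for this task */
		}
	}

	/* Return-to-user check: a task with a pending kill never gets there. */
	t.returned_to_user = !t.kill_pending;
	return t;
}
```

In this model neither policy lets the task back into user space, but only
DEFER_KILL keeps executing kernel steps after the fault was detected, which is
the part I would like to avoid.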

> The problem with the immediate kill is that it can be in interrupt
> context, or just holding arbitrary locks. And it's hard to even tell
> dynamically (sometimes you can see it: with preemption enabled you can
> tell "am I in a non-preempt area", for example, but it ends up
> depending on config options).

Yeah, I've seen some hilarious failure modes while building lkdtm
tests for various kernel self-protections.

> And *if* we make BUG() actually do something sane (non-trapping), we
> can easily make it be generic, not arch-specific. In fact, I'd
> implement it by just adding a "handle_bug()" in kernel/panic.c...

Yeah, I'm not sure what the right next step would be. Do we need a new
set of functions between WARN and BUG? Or maybe extract the
process-killing logic on a per-arch level and make it a specific API
so that it can be explicitly called as part of error-handling? Hmm.
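
As a strawman for the "between WARN and BUG" idea, here is a userspace sketch
of what such a helper might look like. CHECK_OR_KILL, check_or_kill, demo_task
and demo_shadows_dec are all hypothetical names, not existing kernel API. The
only borrowed convention is that, like WARN_ON(), the check evaluates to the
condition so the caller can unwind and return an error:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for the current task; only tracks a pending kill. */
struct demo_task {
	bool kill_pending;
};

/*
 * Hypothetical middle tier: report the failure, mark the task for death,
 * and hand the condition back so the caller can clean up and error out.
 */
static bool check_or_kill(struct demo_task *t, bool cond, const char *what)
{
	assert(t != NULL);

	if (cond) {
		fprintf(stderr, "CHECK failed: %s\n", what);
		t->kill_pending = true;
	}
	return cond;
}

#define CHECK_OR_KILL(task, cond) check_or_kill((task), (cond), #cond)

/* Example caller: decrement a counter that must never go below zero. */
static int demo_shadows_dec(struct demo_task *t, unsigned int *shadows)
{
	assert(shadows != NULL);

	if (CHECK_OR_KILL(t, *shadows == 0))
		return -EINVAL;	/* drop locks / undo work here, then error out */

	(*shadows)--;
	return 0;
}
```

The caller keeps control of the unwinding, and the decision to kill the
process is carried separately on the task. Whether that kill should be
immediate or deferred is exactly the open question above.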

-Kees

-- 
Kees Cook
Nexus Security