From: Kees Cook
Date: Wed, 5 Oct 2016 14:14:06 -0700
Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers
To: Linus Torvalds
Cc: Willy Tarreau, Paul Gortmaker, Johannes Weiner, Andrew Morton, Antonio SJ Musumeci, Miklos Szeredi, Linux Kernel Mailing List, stable
References: <20161005054407.GC7297@1wt.eu> <20161005190604.GA8116@1wt.eu>

On Wed, Oct 5, 2016 at 12:18 PM, Linus Torvalds wrote:
> On Wed, Oct 5, 2016 at 12:06 PM, Willy Tarreau wrote:
>>
>> I have the same doubts, so at least I would not want to run the "sed"
>> immediately, at least to keep the initial intent. But I think everyone
>> is right in his own yard when he puts a BUG_ON() when he doesn't know
>> how to handle an unsafe situation, he's wrong from a global perspective.
>
> Yes. And as you say, even when the developer might be right in some
> situations, you'd easily still be wrong for the same code in some
> other situation.

I just want to chime in and confirm that we really don't want to just
wholesale replace BUG with WARN. Most situations using BUG (whether or
not they should be) are totally unprepared to continue execution.
Which means we'd just get some memory trap or bizarre crash after the
WARN instead of the "clean" BUG behavior.

> Quite frankly, I wouldn't do a sed-script pass to actually change
> existing users.
> I'd just change how the BUG() implementation itself
> works. Not make it a direct WARN_ON(), but perhaps something like
>
>  - use WARN_ON() with a global rate limiter (we do *not* want BUG
>    cascades, but re-enable the warning after a few minutes)
>
>  - have some kernel command line option for the server people to allow
>    them to just force a reboot for it
>
> Hmm?
>
> Anybody want to play with it?

We absolutely have a granularity problem, but we have to retain the
no-continued-execution nature of BUG() users.

The problem with BUG() is that it is so context-sensitive. In the case
you hit, killing the process and continuing life fundamentally failed
and the entire system fell over. That wasn't the intent, obviously,
but that BUG() got effectively "promoted" to panic().

The cases where I've used BUG() are entirely about doing two things:
reporting the current state of the CPU and call stack, and killing the
process. (And I'd like to add a third: passing a meaningful string,
which right now has to happen with a separate pr_*() call that appears
outside the "cut here" line that x86 produces on a BUG.)

Now, it can be argued that the "kill the process" part should be
configurable, and that the code should be written to handle a WARN,
clean up, and error out nicely. But I still want to retain the "kill
the process immediately" behavior in some capacity.

The implementation of BUG is also arch-specific, which makes it
frustrating to change.

So, maybe another question is "when does BUG kill the system and not
just the process?" And can we detect these cases like we already
detect bad locking, interrupt contexts, etc? (Is this question going
to have an arch-specific answer?)

-Kees

-- 
Kees Cook
Nexus Security
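
For illustration only, here is a minimal userspace sketch of the policy
quoted above: warn at most once per quiet period, with a switch that
escalates every hit to a forced reboot. This is not kernel code and not
anyone's proposed patch. Every name in it (struct bug_policy,
bug_decide(), panic_on_bug, rearm_seconds) is hypothetical, and the
300-second window in the demo is an assumed stand-in for "a few
minutes". It also only decides what to report; it does nothing about the
concern raised above, that most BUG() callers cannot safely continue
executing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes for a single BUG() hit. */
enum bug_action {
    BUG_ACTION_WARN,       /* emit a full report */
    BUG_ACTION_SUPPRESSED, /* inside the quiet period: stay silent */
    BUG_ACTION_PANIC       /* operator asked for a forced reboot */
};

/* Hypothetical global policy state; one instance for the whole system. */
struct bug_policy {
    bool panic_on_bug;   /* stands in for an assumed boot-time option */
    long rearm_seconds;  /* quiet period before warnings are re-enabled */
    long last_report;    /* timestamp of the last emitted report */
    bool reported_once;  /* false until the first report is emitted */
};

/*
 * Decide what to do for a BUG() hit at time 'now' (in seconds).
 * A report is emitted on the first hit and again once at least
 * rearm_seconds have passed since the last emitted report.
 */
enum bug_action bug_decide(struct bug_policy *p, long now)
{
    if (p->panic_on_bug)
        return BUG_ACTION_PANIC;

    if (p->reported_once && now - p->last_report < p->rearm_seconds)
        return BUG_ACTION_SUPPRESSED;

    p->reported_once = true;
    p->last_report = now;
    return BUG_ACTION_WARN;
}

/* Human-readable name for an action, used only by the demo below. */
const char *bug_action_name(enum bug_action a)
{
    switch (a) {
    case BUG_ACTION_WARN:       return "warn";
    case BUG_ACTION_SUPPRESSED: return "suppressed";
    case BUG_ACTION_PANIC:      return "panic";
    }
    return "unknown";
}

#ifdef BUG_POLICY_DEMO
/* Optional demo: build with -DBUG_POLICY_DEMO to get a main(). */
int main(void)
{
    struct bug_policy p = { .panic_on_bug = false, .rearm_seconds = 300 };
    long times[] = { 0, 10, 400 };

    for (int i = 0; i < 3; i++)
        printf("t=%ld: %s\n", times[i],
               bug_action_name(bug_decide(&p, times[i])));
    return 0;
}
#endif
```

The rate limiter is deliberately global rather than per call site, since
the stated goal is to stop cascades where one failure trips many
different BUG() sites in quick succession.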