Date: Wed, 5 Oct 2016 07:44:07 +0200
From: Willy Tarreau
To: Linus Torvalds
Cc: Paul Gortmaker, Johannes Weiner, Andrew Morton, Antonio SJ Musumeci,
    Miklos Szeredi, Linux Kernel Mailing List, stable
Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers
Message-ID: <20161005054407.GC7297@1wt.eu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 04, 2016 at 08:29:00PM -0700, Linus Torvalds wrote:
> So what I think we should think about is:
>
>  - extending the checkpatch warning to VM_BUG_ON too, to discourage new users.
>
>  - look at making BUG_ON() simply be less lethal. Remove the
>    unrechable(), reorganize how the string is stored, and make it act
>    more like WARN_ON_ONCE() instead (it's the "rewind_stack_do_exit()"
>    that ends up causing us to try to kill things, we *could* just try to
>    stop doing that).
>
>  - Instead of adding a BUG_ON_AND_HALT(), we could perhaps add a new
>    FATAL_ERROR() thing that acts like the current BUG_ON, and *not* call
>    it something similar (we don't want people doing mindless
>    conversions!). And that's the one that would do the whole
>    rewind_stack_do_exit() to kill the process.

I think instead we should completely remove any simple way to halt the
system and document how to do it.

I've already seen some userland code stuffed with thousands of assert()
calls, and its developers are proud of this because their code looks
clean and shows that they care about all errors. But the cost of their
stupidity doesn't seem to affect them. Maybe they'll start to think about
it the day they ride in a self-driving car and realize that it had better
recover from a failing flasher and not just crash in the middle of the
highway.

Thus, since their motive is just to easily write nice-looking code, I'd
simply force them to explicitly write their condition and the associated
printk() and panic() calls. It will become much more of a hassle and will
make their code less elegant, so they should be much less tempted.

So I think that we'd rather run a huge sed all over the code to replace
BUG/BUG_ON with their WARN/WARN_ON equivalents. We'll very likely notice
a lot of new gcc warnings from code that was supposed not to ever be
reachable, which will tell us a lot about the limited error checking in
those parts of the code.

Willy
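
To make the contrast concrete, here is a rough userspace sketch of the two styles discussed above. It is not kernel code: WARN_ON_SKETCH(), struct node_sketch and both functions are hypothetical names made up for illustration, and fprintf()/abort() merely stand in for printk()/panic(). The only kernel behaviour it relies on is that WARN_ON() evaluates to its condition, so the caller can branch on it and recover.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for WARN_ON(): report the failed check
 * and evaluate to the condition so the caller can take a recovery path. */
static int warn_count;

static int warn_on_sketch(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define WARN_ON_SKETCH(cond) warn_on_sketch(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical structure; only the counter matters for the example. */
struct node_sketch {
	unsigned int shadows;
};

/* Recoverable style: warn, refuse the decrement, let the caller carry on. */
int shadows_dec_recover(struct node_sketch *n)
{
	if (WARN_ON_SKETCH(n->shadows == 0))
		return -1;
	n->shadows--;
	return 0;
}

/* Deliberately fatal style: the condition, the message and the halt are
 * all spelled out by hand (fprintf/abort stand in for printk/panic). */
void shadows_dec_fatal(struct node_sketch *n)
{
	if (n->shadows == 0) {
		fprintf(stderr, "fatal: shadow count underflow\n");
		abort();
	}
	n->shadows--;
}
```

The fatal variant costs three explicit lines at every call site, which is the point: halting stays possible but stops being the cheapest thing to type, while the recoverable variant stays a one-liner.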