Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754343AbcJED3D (ORCPT ); Tue, 4 Oct 2016 23:29:03 -0400 Received: from mail-oi0-f49.google.com ([209.85.218.49]:33609 "EHLO mail-oi0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752551AbcJED3B (ORCPT ); Tue, 4 Oct 2016 23:29:01 -0400 MIME-Version: 1.0 In-Reply-To: References: From: Linus Torvalds Date: Tue, 4 Oct 2016 20:29:00 -0700 X-Google-Sender-Auth: uNAtA8AmOIhE1NxUz2yUBktLf3Y Message-ID: Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers To: Paul Gortmaker Cc: Johannes Weiner , Andrew Morton , Antonio SJ Musumeci , Miklos Szeredi , Linux Kernel Mailing List , stable Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2524 Lines: 62 On Tue, Oct 4, 2016 at 7:43 PM, Paul Gortmaker wrote: > > A couple years ago Ingo had an idea to kill BUG_ON abuse that I made > a 1st pass at. Back then it seemed nobody cared. Maybe that has since > changed? > > https://lkml.org/lkml/2014/4/30/359 So we actually have the checkpatch warning already: # avoid BUG() or BUG_ON() if ($line =~ /\b(?:BUG|BUG_ON)\b/) { my $msg_type = \&WARN; $msg_type = \&CHK if ($file); &{$msg_type}("AVOID_BUG", "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr); } but it doesn't trigger on VM_BUG_ON(). And I'm not convinced about replacing things with BUG_ON_AND_HALT(), it simply doesn't fix the existing issue we have: people use BUG_ON(), and worse, _when_ they use BUG_ON(), they use it instead of error handling, so the code _around_ the BUG_ON() tends to then very much depend on what the BUG_ON() checks. This is actually one way that VM_BUG_ON() is better: it's very much by design something that can be compiled away, so at least hopefully nobody thinks of it as a security measure. So we could just say that we will treat VM_BUG_ON() as a WARN_ON_ONCE(), and just not kill the machine. Because I could easily imagine that somebody *does* treat BUG_ON() that way, thinking "well, if that BUG_ON() triggers at least it won't then go off the rails later". In fact, right now we mark BUG() in such a way that gcc can even take advantage of the "crash the machine" semantics, because we explicitly mark the BUG() with "unreachable()". So what I think we should think about is: - extending the checkpatch warning to VM_BUG_ON too, to discourage new users. - look at making BUG_ON() simply be less lethal. Remove the unrechable(), reorganize how the string is stored, and make it act more like WARN_ON_ONCE() instead (it's the "rewind_stack_do_exit()" that ends up causing us to try to kill things, we *could* just try to stop doing that). - Instead of adding a BUG_ON_AND_HALT(), we could perhaps add a new FATAL_ERROR() thing that acts like the current BUG_ON, and *not* call it something similar (we don't want people doing mindless conversions!). And that's the one that would do the whole rewind_stack_do_exit() to kill the process. But apart from the checkpatch thing, it's actually a pretty big change. Linus