Date: Mon, 8 Jun 2015 10:09:04 +0200
From: Ingo Molnar
To: Alexander Holler
Cc: Linus Torvalds, Tejun Heo, Louis Langholtz, Linux Kernel Mailing List,
    trivial@kernel.org, Rusty Russell, Andrew Morton, Peter Zijlstra,
    Thomas Gleixner
Subject: Re: [PATCH] debug: Deprecate BUG_ON() use in new code, introduce CRASH_ON()
Message-ID: <20150608080903.GA1236@gmail.com>

* Alexander Holler wrote:

> On 08.06.2015 at 09:12, Ingo Molnar wrote:
> >
> > * Linus Torvalds wrote:
> >
> > > Stop with the random BUG_ON() additions.
> >
> > Yeah, so I propose the attached patch which attempts to resist new BUG_ON()
> > additions.
>
> As this reminded me of a flame I received once from a maintainer because I
> wanted to avoid a disastrous memory corruption by using a BUG_ON(). Maybe
> someone should mention that a BUG_ON or now CRASH_ON should still be
> preferred instead of some random memory corruption which might lead to worse
> things. Or how is the viewpoint of the kernel masters in regard to memory
> corruptions and use of BUG_ON, WARN_ON or CRASH_ON?

So it depends on the actual change, but there are very few cases where a
BUG_ON() is justified, even if the code detects memory corruption.

Most instances of memory corruption either come from the hardware or come
from some other piece of code, so _your_ code crashing the system will be
unexpected, and in most cases unproductive for finding the cause of the
corruption.

The best action is to stop doing whatever your code was doing, and to try to
bail out with as few extra changes to the system as possible.

An example of that is lockdep's asserts. An actual lockdep warning in a
released, production kernel is frequently connected to a real risk of data
corruption - yet what we do is report the bug non-intrusively and turn off
lockdep completely, so that it does not make the situation worse and so that
there is a chance the messages can be saved and reported back to kernel
developers.

The origins of widespread BUG_ON() use are twofold:

 - 20 years ago we didn't have much of any locking in the kernel, so a
   BUG_ON() resulted in essence in a graceful segfault of the application
   that happened to trigger it, in most cases. Kernel logs were still
   possible to retrieve if the bug did not trigger too often - and if not
   (because for example the crash happened in the idle thread) then the
   backtrace was still visible on the VGA text console.

 - in the early days we didn't have WARN_ON(), we only had BUG_ON(), so
   people used that. BUG_ON() used to be the 'graceful' assert, panic() was
   the equivalent of CRASH_ON().

These days a BUG_ON() is almost always fatal due to unreleased locks, plus we
still don't print kernel crashes to the graphical console, so they are silent
hard lockups in 99% of the cases.

Thanks,

	Ingo
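
The "report it, stop, and bail out" approach described above, including the
lockdep-style step of reporting once and then switching the checker off, can
be sketched roughly as follows. This is a userspace illustration only, not
kernel code: check_failed(), checker_enabled, warnings_reported, struct
counter and counter_put() are all hypothetical names made up for this sketch,
and the kernel's real WARN_ON() family and lockdep work differently in
detail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in; NOT the kernel's WARN_ON_ONCE(). */

bool checker_enabled = true;	/* cleared after the first report */
int warnings_reported;		/* how many reports were printed */

/*
 * Report the first failed check and then disable further reporting.
 * Returns true whenever the condition was hit, so that the caller can
 * bail out even after reporting has been switched off.
 */
bool check_failed(bool cond, const char *what)
{
	if (!cond)
		return false;

	if (checker_enabled) {
		checker_enabled = false;
		warnings_reported++;
		fprintf(stderr, "WARNING: %s (checker disabled)\n", what);
	}
	return true;
}

struct counter {
	int refs;
};

/*
 * Drop a reference. On an inconsistent state, return an error and leave
 * the object untouched instead of crashing the whole program.
 */
int counter_put(struct counter *c)
{
	if (check_failed(c->refs <= 0, "refcount underflow"))
		return -1;	/* bail out, no further changes */

	c->refs--;
	return 0;
}
```

Calling counter_put() three times on a counter that starts with one
reference returns 0 once and then -1 twice, prints a single warning, and
leaves refs at 0; a BUG_ON()-style check at the same spot would instead have
taken the whole program down on the second call.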