Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934263AbcLMQXl (ORCPT ); Tue, 13 Dec 2016 11:23:41 -0500 Received: from mail-lf0-f52.google.com ([209.85.215.52]:34305 "EHLO mail-lf0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932594AbcLMQXh (ORCPT ); Tue, 13 Dec 2016 11:23:37 -0500 MIME-Version: 1.0 In-Reply-To: References: From: Dmitry Vyukov Date: Tue, 13 Dec 2016 17:23:14 +0100 Message-ID: Subject: Re: usb/core: warning in usb_create_ep_devs/sysfs_create_dir_ns To: syzkaller Cc: Andrey Konovalov , Greg Kroah-Hartman , USB list , LKML , Kostya Serebryany Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5734 Lines: 129 On Tue, Dec 13, 2016 at 4:52 PM, Alan Stern wrote: > On Tue, 13 Dec 2016, Dmitry Vyukov wrote: > >> >> > If it is >> >> > not a bug in kernel source code, then it must not produce a WARNING. >> > >> > What about a memory allocation failure? The memory management part of >> > the kernel produces a WARNING message if an allocation fails and the >> > caller did not specify __GFP_NOWARN. >> > >> > There is no way for a driver to guarantee that a memory allocation >> > request will succeed -- failure is always an option. But obviously >> > memory allocation failures are not bugs in the kernel. >> > >> > Are you saying that mm/page_alloc.c:warn_alloc() should produce >> > something other than a WARNING? >> >> >> The main thing I am saying is that we absolutely need a way for a >> human or a computer program to be able to determine if there is >> anything wrong with kernel or not. > > Okay, I agree with that. I'm not at all sure that this decision should > be based on whether a message is a WARNING. > > In my experience, there is no general consensus on what conditions > should qualify as a WARNING. There aren't any hard-and-fast rules, so > people are forced to rely on their own judgment. The general > interpretation seems to be that WARNING is appropriate for notifying > the user whenever something important has gone unexpectedly wrong. > That covers a lot of ground, including things (like hardware failures) > that are not kernel bugs. see below >> Page_alloc produces a WARNING iff you ask for an amount of memory that >> can't possibly be allocated under any circumstances (order >= >> MAX_ORDER). > > Doesn't it also produce a WARNING under other circumstances? No. OOM is not a WARNING and is easily distinguishable from BUG/WARNING. >> That's not just an allocation failure. Kernel itself >> should not generally ask for such large amounts of memory. If the >> allocation size is dictated by user, then kernel code should either >> use __GFP_NOWARN, or impose own stricter limit dictated by context >> (e.g. if it's a text command of known format, then limit can be as >> small as say 128 bytes). > > All right, yes, it makes sense to use __GFP_NOWARN when the allocation > size is dictated by the user. > > But still, even when a stricter limit has been imposed, a memory > allocation request may fail. In fact, as far as I know, the kernel has > _no_ guarantee at all that any memory allocation request will succeed! > The best it offers is that sufficiently small requests may wait > indefinitely for memory to become available, which isn't much better > than failing. Memory allocator does not print WARNINGs on allocation failures. >> Re fake hardware. panic_on_warn will definitely cause problems. I >> don't know if it used in any paranoid production systems or not, >> though. But more generally, I don't see how it is different from >> incorrect syscall arguments or nonsensical network packets received >> from free internet. In ideal productions environments none of these >> incorrect inputs to kernel should happen. I don't see any single >> reason to not protect kernel from incorrect input in this case as >> well, as we do in all other cases. > > I _do_ see a reason. Real computers don't live in ideal production > environments. Why go to extra trouble to add protection for some > particular incorrect input when the kernel is _already_ capable of > handling this input in a valid manner (even though this may include > logging a WARNING message)? One reason is to be able to distinguish kernel bugs from incorrect inputs to kernel. also see below >> In particular, it would resolve a >> very real and practical issue for us -- fuzzer will not reboot machine >> every minute, and we will not spend time looking at these WARNINGs, >> and we will not spend your time by reporting these WARNINGs. > > Maybe you should decide that ERROR messages indicate a kernel bug, > rather than WARNING messages. Even that is questionable, but you will > get far fewer false positives. > > Even better, you should publicize this decision (in the kernel > documentation, on various mailing lists, on LWN, and so forth), and > enforce it by reducing existing ERROR severity levels to WARNINGs in > places where they do not indicate a kernel bug. OK, I have some numbers on my hands. To date we've run about 1e12 random programs covering 1000+ syscalls and have found 300+ bugs. 50+ of these bugs are found on WARNINGs (search by "warning" here https://github.com/google/syzkaller/wiki/Found-Bugs). Some of the bugs found on WARNINGs are as severe as VMM exploitations. With that coverage the _only_ case I know of now where a WARNING does not mean bug in kernel source code is basically the one we are arguing about now. Provided that total number of WARN/WARN_ON in kernel code is ~10000. There is currently a very-very-very-very-very clear WARNING usage practice that suggests that WARNING is-a bug in kernel code. If we stop treating WARNINGs as bugs we will miss dozens of real bugs. This just doesn't make sense provided that we are talking about basically a single precedent. We actually even treat INFO as bugs, because there are: INFO: possible circular locking dependency detected INFO: rcu_preempt detected stalls INFO: rcu_sched detected stalls INFO: rcu_preempt self-detected stall on CPU INFO: rcu_sched self-detected stall on CPU INFO: suspicious RCU usage INFO: task .* blocked for more than [0-9]+ seconds with only a single exception of: INFO: lockdep is turned off For INFO the story is so clear. The name strongly suggests that it is not a bug. So probably we should promote harmful INFO to WARNINGs. And then stop treating INFO as bugs.