Date: Mon, 24 Dec 2018 12:58:18 +0100
From: Nicholas Mc Guire
To: Michal Hocko
Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
    Andrey Ryabinin, Arun KS, Joe Perches, "Luis R. Rodriguez",
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
Message-ID: <20181224115818.GA3063@osadl.at>
References: <1545337437-673-1-git-send-email-hofrat@osadl.org>
    <20181222080421.GB26155@osadl.at>
    <20181224081056.GD9063@dhcp22.suse.cz>
    <20181224093804.GA16933@osadl.at>
In-Reply-To: <20181224093804.GA16933@osadl.at>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 24, 2018 at 10:38:04AM +0100, Nicholas Mc Guire wrote:
> On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> > On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 871e41c..1c118d7 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > > >
> > > > >  	/* Import existing vmlist entries. */
> > > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > > >  		va->flags = VM_VM_AREA;
> > > > >  		va->va_start = (unsigned long)tmp->addr;
> > > > >  		va->va_end = va->va_start + tmp->size;
> > > >
> > > > Hi Nicholas,
> > > >
> > > > You're right that this looks wrong because there's no guarantee that va is
> > > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since
> > > > we're not giving the page allocator a chance to reclaim so this would
> > > > likely just end up looping forever instead of crashing with a NULL pointer
> > > > dereference, which would actually be the better result.
> > > >
> > > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > > which also should point cleanly to the cause - but I'm actually not
> > > that sure if that trace was correct in all cases.
> >
> > No, we do not trigger the memory reclaim path nor the oom killer when
> > using GFP_NOWAIT. In fact the current implementation even ignores
> > __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> > that we used to loop for __GFP_NOFAIL at some point in the past). The
> > patch simply doesn't have any effect. But the primary objection is that
> > the behavior might change in future and you certainly do not want to get
> > stuck in the boot process without knowing what is going on. Crashing
> > will tell you that quite obviously. Although I have a hard time imagining
> > how that could happen in a reasonably configured system.
>
> I think most of the defensive structures are covering rare to almost
> impossible cases - but those are precisely the hard ones to understand if
> they do happen.
>
> >
> > > > You could do
> > > >
> > > > 	BUG_ON(!va);
> > > >
> > > > to make it obvious why we crashed, however.  It makes it obvious that the
> > > > crash is intentional rather than some error in the kernel code.
> > >
> > > makes sense - that at least makes it immediately clear from the code
> > > that there is no way out from here.
> >
> > How does it differ from blowing up right there when dereferencing flags?
> > It would be clear from the oops.
>
> The question is how soon it blows up: if it were immediate then there is
> probably no real difference; if there is some delay, say due to the region
> affected by the NULL pointer not being immediately in use, it may be very
> hard to differentiate between an allocation failure and memory corruption,
> so having a directly associated trace should be significantly simpler to
> understand - and you might actually not want a system to try booting if there
> are problems at this level.
>

sorry - you are right - it would blow up immediately - so there is no way
this could be delayed in this case. So then it's just a matter of the code
making clear that the NULL case was considered - by a comment or by BUG_ON().

thx!
hofrat
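
For illustration only, the BUG_ON() variant suggested in the thread can be
sketched in userspace roughly as below. This is not the kernel code: calloc()
stands in for kzalloc(), abort() stands in for BUG(), struct vmap_area is
reduced to the three fields the patch touches, the VM_VM_AREA value is only
illustrative, and import_entry() is a hypothetical helper that mirrors one
iteration of the import loop in vmalloc_init().

```c
/* Userspace sketch only: calloc() replaces kzalloc(), abort() replaces BUG(). */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's BUG_ON(): report the condition, stop. */
#define BUG_ON(cond)                                              \
	do {                                                      \
		if (cond) {                                       \
			fprintf(stderr, "BUG at %s:%d: %s\n",     \
				__FILE__, __LINE__, #cond);       \
			abort();                                  \
		}                                                 \
	} while (0)

/* Reduced mock of struct vmap_area: only the fields used in the patch. */
struct vmap_area {
	unsigned long va_start;
	unsigned long va_end;
	unsigned long flags;
};

#define VM_VM_AREA 0x04UL	/* illustrative value for this sketch */

/* Hypothetical helper mirroring one iteration of the vmlist import loop. */
struct vmap_area *import_entry(void *addr, unsigned long size)
{
	struct vmap_area *va = calloc(1, sizeof(*va));

	BUG_ON(!va);	/* makes explicit that the NULL case was considered */
	va->flags = VM_VM_AREA;
	va->va_start = (unsigned long)addr;
	va->va_end = va->va_start + size;
	return va;
}
```

The check does not change what happens on failure (the dereference on the
next line would fault anyway); it only records in the code that stopping
here is intentional.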