Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp330124ybb; Wed, 1 Apr 2020 00:27:42 -0700 (PDT) X-Google-Smtp-Source: APiQypIjJVXUrDcrJryMt0ff2K7bJpOAvDh0iagikaum5OWW6QJyRoiMl9Aycg90Dp/fX/0ewXfY X-Received: by 2002:aca:b308:: with SMTP id c8mr1774392oif.106.1585726062701; Wed, 01 Apr 2020 00:27:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585726062; cv=none; d=google.com; s=arc-20160816; b=t93s8S5G36oOPWF2IR2NC1C3sSjNbZqqeCG2RKCOc2WoOWlM1V0qA2p3sLYpW/XL0N R6cY8NhkA3272KhwSH+Dqwlc5RztTOk/TMcwq9NYl+Fivw3aqYJ6dYrBioN8XfFs6M2a OmmmFt6Cpj46kroUqhYaTKAiDFJWptiNMcCJrTcbvo03J27fYZC03GcGAqO9VbZx3jTG 6Sk/KLoLcx1XcJ8F1E8F2JtMxIG/VtFUINjA42qehmHT/vGq5e54wZOvBKhCyIPsLiwm aw7t1+CzTpy2GjvMAX3ZAxhSRW8I6dwDoY1zTeyFfam1nPcncg1aV2yg2b/R7HkqhWxh THCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=bFk7ot1Ggv8JmeCshN6lhpNLDTZm2f9F+linXAAvHe0=; b=Nd7gLvDo4B3XDCLBOrIQjp88LlCcy0iZ0Zo1v2Vt6nfS72wRnHrOfXiswe63UUBtTP qwtoi638nC3gHA1sfWtuVcARdJVBHYVV5J25U2y7/dQurADvFQDgmPmbCkMTI/oU/xz5 55GkufrhIIFw+03NYMdo1IrXahwXH4C6dhsqPE8AN5IcN6Bv3zbabeH9OASkteWWseYL 2MWExpfq/7nwywaf+9s4KdtNwJcYSOnr+TDWISq8r/XddMh0+DTaR5LMzLrovc1MCB1j LvxhmoqEQRZwKd7byVH1GV+FkXx2PtydCAGS2HDp1/sLTuhK6DSnWpbXTP3g54aGK/2k jE1g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l20si471156otk.242.2020.04.01.00.27.25; Wed, 01 Apr 2020 00:27:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731912AbgDAHYE (ORCPT + 99 others); Wed, 1 Apr 2020 03:24:04 -0400 Received: from mail-wm1-f67.google.com ([209.85.128.67]:52179 "EHLO mail-wm1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731680AbgDAHYE (ORCPT ); Wed, 1 Apr 2020 03:24:04 -0400 Received: by mail-wm1-f67.google.com with SMTP id z7so2714045wmk.1; Wed, 01 Apr 2020 00:24:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=bFk7ot1Ggv8JmeCshN6lhpNLDTZm2f9F+linXAAvHe0=; b=DzMdeOWySEqZWGLrodYmq1OgQzEHI5Mmg7vNBBrch7wE2M3E1vwkfrinnznPwJt3Tp fMKjExsVeRZQYNjRBZyXNyWemJokk/dtwDJMzEA/0fnGr+UK/L+FiuNzEQi9TOInnRGm gMJ6mnSA6IBE934HxbRrdaEken5/SbbyFGhkj5gtmH5F5D2wuU1GxF5ipRwepWVQDXEB ccvRrdf88XEU2e4fgENUvUBA7odUai1you+jrzsATwbtjqBw3BtN+ixUZ9Cy/R5oitQQ r9ms3bVaatg0VOZ6aqwq9NgURZD6XL3NDi/aCOTxj//fVXjqqHE+pDGHZqnxkA7Pqxx/ 7ZNQ== X-Gm-Message-State: AGi0PubD6BlbMEv5kw2itMidCH3b8SMexddC8kQTKlTVnDGI8zxCg1aX /1D/1o8ZIjzZOUGOgFGUBRM= X-Received: by 2002:a1c:648b:: with SMTP id y133mr2720322wmb.173.1585725840905; Wed, 01 Apr 2020 00:24:00 -0700 (PDT) Received: from localhost (ip-37-188-180-223.eurotel.cz. [37.188.180.223]) by smtp.gmail.com with ESMTPSA id k15sm1740199wrm.55.2020.04.01.00.23.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Apr 2020 00:24:00 -0700 (PDT) Date: Wed, 1 Apr 2020 09:23:59 +0200 From: Michal Hocko To: Joel Fernandes Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, rcu@vger.kernel.org, willy@infradead.org, peterz@infradead.org, neilb@suse.com, vbabka@suse.cz, mgorman@suse.de, Andrew Morton , Josh Triplett , Lai Jiangshan , Mathieu Desnoyers , "Paul E. McKenney" , Steven Rostedt Subject: Re: [PATCH RFC] rcu/tree: Use GFP_MEMALLOC for alloc memory to free memory pattern Message-ID: <20200401072359.GC22681@dhcp22.suse.cz> References: <20200331131628.153118-1-joel@joelfernandes.org> <20200331145806.GB236678@google.com> <20200331153450.GM30449@dhcp22.suse.cz> <20200331160117.GA170994@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200331160117.GA170994@google.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 31-03-20 12:01:17, Joel Fernandes wrote: > On Tue, Mar 31, 2020 at 05:34:50PM +0200, Michal Hocko wrote: > > On Tue 31-03-20 10:58:06, Joel Fernandes wrote: > > [...] > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index 4be763355c9fb..965deefffdd58 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -3149,7 +3149,7 @@ static inline struct rcu_head *attach_rcu_head_to_object(void *obj) > > > > > > > > if (!ptr) > > > > ptr = kmalloc(sizeof(unsigned long *) + > > > > - sizeof(struct rcu_head), GFP_ATOMIC | __GFP_NOWARN); > > > > + sizeof(struct rcu_head), GFP_MEMALLOC); > > > > > > Just to add, the main requirements here are: > > > 1. Allocation should be bounded in time. > > > 2. Allocation should try hard (possibly tapping into reserves) > > > 3. Sleeping is Ok but should not affect the time bound. > > > > > > __GFP_ATOMIC | __GFP_HIGH is the way to get an additional access to > > memory reserves regarless of the sleeping status. > > > > Using __GFP_MEMALLOC is quite dangerous because it can deplete _all_ the > > memory. What does prevent the above code path to do that? Neil has provided a nice explanation down the email thread. But let me clarify few things here. > Can you suggest what prevents other users of GFP_MEMALLOC from doing that > also? There is no explicit mechanism which is indeed unfortunate. The only user real user of the flag is Swap over NFS AFAIK. I have never dared to look into details on how the complete reserves depletion is prevented. Mel would be much better fit here. > That's the whole point of having a reserve, in normal usage no one will > use it, but some times you need to use it. Keep in mind this is not a common > case in this code here, this is triggered only if earlier allocation attempts > failed. Only *then* we try with GFP_MEMALLOC with promises to free additional > memory soon. You are right that this is the usecase for the flag. But this should be done with an extreme care because the core MM relies on those reserves so any other users should better make sure they do not consume a lot from reserves as well. > > If a partial access to reserves is sufficient then why the existing > > modifiers (mentioned above are not sufficient? > > The point with using GFP_MEMALLOC is it is useful for situations where you > are about to free memory and needed some memory temporarily, to free that. It > depletes it a bit temporarily to free even more. Is that not the point of > PF_MEMALLOC? > * %__GFP_MEMALLOC allows access to all memory. This should only be used when > * the caller guarantees the allocation will allow more memory to be freed > * very shortly e.g. process exiting or swapping. Users either should > * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). > > I was just recommending usage of this flag here because it fits the > requirement of allocating some memory to free some memory. I am also Ok with > GFP_ATOMIC with the GFP_NOWARN removed, if you are Ok with that. Maybe we need to refine this documentation to be more explicit about an extreme care to be taken when using the flag. diff --git a/include/linux/gfp.h b/include/linux/gfp.h index e5b817cb86e7..e436a7e28392 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -110,6 +110,9 @@ struct vm_area_struct; * the caller guarantees the allocation will allow more memory to be freed * very shortly e.g. process exiting or swapping. Users either should * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). + * Users of this flag have to be extremely careful to not deplete the reserve + * completely and implement a throttling mechanism which controls the consumption + * based on the amount of freed memory. * * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. * This takes precedence over the %__GFP_MEMALLOC flag if both are set. -- Michal Hocko SUSE Labs