Date: Tue, 7 Apr 2020 09:24:39 +0200
From: Michal Hocko
To: NeilBrown
Cc: John Hubbard, David Rientjes, Andrew Morton, Joel Fernandes,
    "Paul E. McKenney", linux-mm@kvack.org, LKML
Subject: Re: [PATCH 1/2] mm: clarify __GFP_MEMALLOC usage
Message-ID: <20200407072439.GG18914@dhcp22.suse.cz>
References: <20200403083543.11552-1-mhocko@kernel.org>
 <20200403083543.11552-2-mhocko@kernel.org>
 <87blo8xnz2.fsf@notabene.neil.brown.name>
 <20200406070137.GC19426@dhcp22.suse.cz>
 <4f861f07-4b47-8ddc-f783-10201ea302d3@nvidia.com>
 <875zecw1n6.fsf@notabene.neil.brown.name>
In-Reply-To: <875zecw1n6.fsf@notabene.neil.brown.name>
List-ID: <linux-kernel.vger.kernel.org>

On Tue 07-04-20 11:00:29, Neil Brown wrote:
> On Mon, Apr 06 2020, John Hubbard wrote:
>
> > On 4/6/20 12:01 AM, Michal Hocko wrote:
[...]
> >> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >> index e5b817cb86e7..9cacef1a3ee0 100644
> >> --- a/include/linux/gfp.h
> >> +++ b/include/linux/gfp.h
> >> @@ -110,6 +110,11 @@ struct vm_area_struct;
> >>   * the caller guarantees the allocation will allow more memory to be freed
> >>   * very shortly e.g. process exiting or swapping. Users either should
> >>   * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
> >> + * Users of this flag have to be extremely careful to not deplete the reserve
> >> + * completely and implement a throttling mechanism which controls the consumption
> >> + * of the reserve based on the amount of freed memory.
> >> + * Usage of a pre-allocated pool (e.g. mempool) should be always considered before
> >> + * using this flag.
>
> > I think this version is pretty good.  Thanks!
I will stick with it then.

[...]

> I think it is hard to write rules because the rules are a bit spongey.

Exactly! And the more specific we are, the more likely people are to
follow the text literally. And we do not want that. We want people to
be aware of the limitation but we want them to think hard before using
the flag.

> With mempools, we have a nice clear rule.  When you allocate from a
> mempool you must have a clear path to freeing that allocation which will
> not block on memory allocation except from a subordinate mempool.  This
> implies a partial ordering between mempools.  When you have layered
> block devices the path through the layers from filesystem down to
> hardware defines the order.  It isn't enforced, but it is quite easy to
> reason about.
>
> GFP_MEMALLOC effectively provides multiple mempools.  So it could
> theoretically deadlock if multiple long dependency chains
> happened.  i.e. if 1000 threads each make a GFP_MEMALLOC allocation and
> then need to make another one before the first can be freed - then you
> hit problems.  There is no formal way to guarantee that this doesn't
> happen.  We just say "be gentle" and minimize the users of this flag,
> and keep more memory in reserve than we really need.
>
> Note that 'threads' here might not be Linux tasks.  If you have an IO
> request that proceeds asynchronously, moving from queue to queue and
> being handled by different tasks, then each one is a "thread" for the
> purpose of understanding the mem-alloc dependency.
>
> So maybe what I really should focus on is not how quickly things happen,
> but how many happen concurrently.  The idea of throttling is to allow
> previous requests to complete before we start too many more.
>
> With Swap-over-NFS, some of the things that might need to be allocated
> are routing table entries.  These scale with the number of NFS servers
> rather than the number of IO requests, so they are not going to cause
> concurrency problems.
> We also need memory to store replies, but these never exceed the number
> of pending requests, so there is limited concurrency there.
>
> NFS can send a lot of requests in parallel, but the main limit is the
> RPC "slot table" and while that grows dynamically, it does so with
> GFP_NOFS, so it can block or fail (I wonder if that should explicitly
> disable the use of the reserves).
>
> So there is a limit on concurrency imposed by non-GFP_MEMALLOC
> allocations.

This really makes sense to mention in the allocation manual
(Documentation/core-api/memory-allocation.rst) as suggested by John.
Care to make it into a patch?
-- 
Michal Hocko
SUSE Labs