Date: Mon, 9 Apr 2018 09:34:07 +0200
From: Michal Hocko
To: Matthew Wilcox
Cc: LKML, linux-mm@kvack.org, Vlastimil Babka
Subject: Re: __GFP_LOW
Message-ID: <20180409073407.GD21835@dhcp22.suse.cz>
In-Reply-To: <20180408042709.GC32632@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat 07-04-18 21:27:09,
Matthew Wilcox wrote:
> On Fri, Apr 06, 2018 at 08:09:53AM +0200, Michal Hocko wrote:
> > OK, we already split the documentation into these categories. So we got
> > at least the structure right ;)
>
> Yes, this part of the documentation makes sense to me :-)
>
> > > - What kind of memory to allocate (DMA, NORMAL, HIGHMEM)
> > > - Where to get the pages from
> > >   - Local node only (THISNODE)
> > >   - Only in compliance with cpuset policy (HARDWALL)
> > >   - Spread the pages between zones (WRITE)
> > >   - The movable zone (MOVABLE)
> > >   - The reclaimable zone (RECLAIMABLE)
> > > - What you are willing to do if no free memory is available:
> > >   - Nothing at all (NOWAIT)
> > >   - Use my own time to free memory (DIRECT_RECLAIM)
> > >     - But only try once (NORETRY)
> > >     - Can call into filesystems (FS)
> > >     - Can start I/O (IO)
> > >     - Can sleep (!ATOMIC)
> > >   - Steal time from other processes to free memory (KSWAPD_RECLAIM)
> >
> > What does that mean? If I drop the flag, do not steal? Well I do because
> > they will hit direct reclaim sooner...
>
> If they allocate memory, sure. A process which stays in its working
> set won't, unless it's preempted by kswapd.

Well, I was probably not clear here. KSWAPD_RECLAIM is not something you
want to drop because this is a cooperative flag. If you do not use it
then you are effectively pushing others into direct reclaim because
kswapd won't be woken up and won't do the background work. Your wording
makes it sound like a good thing to drop.

> > >   - Kill other processes to get their memory (!RETRY_MAYFAIL)
> >
> > Not really for costly orders.
>
> Yes, need to be more precise there.
>
> > >   - All of the above, and wait forever (NOFAIL)
> > >   - Take from emergency reserves (HIGH)
> > >   - ... but not the last parts of the regular reserves (LOW)
> >
> > What does that mean and how it is different from NOWAIT?
> > Is this about the low watermark and if yes do we want to teach users
> > about this and make the whole thing even more complicated? Does it wake
> > kswapd? What is the eagerness ordering? LOW, NOWAIT, NORETRY,
> > RETRY_MAYFAIL, NOFAIL?
>
> LOW doesn't quite fit into the eagerness scale with the other flags;
> instead it's composable with them. So you can specify NOWAIT | LOW,
> NORETRY | LOW, NOFAIL | LOW, etc. All I have in mind is something
> like this:
>
>       if (alloc_flags & ALLOC_HIGH)
>               min -= min / 2;
> +     if (alloc_flags & ALLOC_LOW)
> +             min += min / 2;
>
> The idea is that a GFP_KERNEL | __GFP_LOW allocation cannot force a
> GFP_KERNEL allocation into an OOM situation because it cannot take
> the last pages of memory before the watermark.

So what are we going to do if the LOW watermark cannot succeed?

> It can still make a
> GFP_KERNEL allocation *more likely* to hit OOM (just like any other kind
> of allocation can), but it can't do it by itself.

So who would be a user of __GFP_LOW?

> ---
>
> I've been wondering about combining the DIRECT_RECLAIM, NORETRY,
> RETRY_MAYFAIL and NOFAIL flags together into a single field:
> 0 => RECLAIM_NEVER,     /* !DIRECT_RECLAIM */
> 1 => RECLAIM_ONCE,      /* NORETRY */
> 2 => RECLAIM_PROGRESS,  /* RETRY_MAYFAIL */
> 3 => RECLAIM_FOREVER,   /* NOFAIL */
>
> The existence of __GFP_RECLAIM makes this a bit tricky. I honestly don't
> know what this code is asking for:

I am not sure I follow here. Is RECLAIM_* an internal thing to the
allocator?

> kernel/power/swap.c: __get_free_page(__GFP_RECLAIM | __GFP_HIGH);
> but I suspect I'll have to find out. There's about 60 places to look at.

Well, it would be more understandable if this was written as
(GFP_KERNEL | __GFP_HIGH) & ~(__GFP_FS|__GFP_IO)

> I also want to add __GFP_KILL (to be part of the GFP_KERNEL definition).

What does __GFP_KILL mean?

> That way, each bit that you set in the GFP mask increases the things the
> page allocator can do to get memory for you.
> At the moment, RETRY_MAYFAIL
> subtracts the ability to kill other tasks, which is unusual.

Well, it is not all that great because some flags add capabilities while
some remove them but, well, life is hard when you try to extend an
interface which was not all that great from the very beginning.

> For example,
> this test in kvmalloc_node:
>
>       WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
>
> doesn't catch RETRY_MAYFAIL being set.

It doesn't really have to. We want to catch obviously broken gfp flags
here. That means mostly GFP_NO{FS,IO} because those might simply
deadlock. RETRY_MAYFAIL is even supported to some extent.
-- 
Michal Hocko
SUSE Labs
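
For readers following the thread, the watermark adjustment quoted above
can be tried out as a small standalone sketch. This is only an
illustration under stated assumptions: ALLOC_LOW is the flag proposed in
this thread and is hypothetical, the bit values are arbitrary, and
adjusted_min() and watermark_ok() are made-up helper names that greatly
simplify what the page allocator's watermark check really does.

```c
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define ALLOC_HIGH 0x1u
#define ALLOC_LOW  0x2u /* hypothetical: the flag proposed in this thread */

/*
 * Apply the adjustment from the quoted diff: ALLOC_HIGH lowers the
 * effective watermark by half, the proposed ALLOC_LOW raises it by half.
 */
static long adjusted_min(long min, unsigned int alloc_flags)
{
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_LOW)
		min += min / 2;
	return min;
}

/*
 * Simplified check (assumption): the allocation may proceed only if the
 * free page count is above the adjusted watermark.
 */
static bool watermark_ok(long free_pages, long min, unsigned int alloc_flags)
{
	return free_pages > adjusted_min(min, alloc_flags);
}
```

With min = 100 and 120 free pages, a plain request passes this check
while an ALLOC_LOW request does not, which is the "cannot take the last
pages before the watermark" behaviour described in the quoted proposal.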