Date: Thu, 16 Sep 2021 08:52:23 +0200
From: Michal Hocko
To: NeilBrown
Cc: Mel Gorman, Andrew Morton, Theodore Ts'o, Andreas Dilger,
    "Darrick J. Wong", Jan Kara, Matthew Wilcox, linux-xfs@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops.
In-Reply-To: <163174534006.3992.15394603624652359629@noble.neil.brown.name>
References: <163157808321.13293.486682642188075090.stgit@noble.brown>
    <163157838437.13293.14244628630141187199.stgit@noble.brown>
    <20210914163432.GR3828@suse.com>
    <163165609100.3992.1570739756456048657@noble.neil.brown.name>
    <163174534006.3992.15394603624652359629@noble.neil.brown.name>

On Thu 16-09-21 08:35:40, Neil Brown wrote:
> On Wed, 15 Sep 2021, Michal Hocko wrote:
> > On Wed 15-09-21 07:48:11, Neil Brown wrote:
> > >
> > > Why does __GFP_NOFAIL access the reserves? Why not require that the
> > > relevant "Try harder" flag (__GFP_ATOMIC or __GFP_MEMALLOC) be included
> > > with __GFP_NOFAIL if that is justified?
> >
> > Does 5020e285856c ("mm, oom: give __GFP_NOFAIL allocations access to
> > memory reserves") help?
>
> Yes, that helps. A bit.
>
> I'm not fond of the clause "the allocation request might have come with
> some locks held". What if it doesn't? Does it still have to pay the
> price?
>
> Should we not require that the caller indicate if any locks are held?

I do not think this would help much TBH. What if the lock in question
doesn't impose any dependency on the allocation in the first place?
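
As a minimal sketch of the kind of dependency in question (the names are
invented for illustration and are not from this series): whether the
__GFP_NOFAIL allocation below is problematic depends on whether
"journal_lock" can ever be needed by reclaim or writeback, not merely on
the fact that some lock is held.

#include <linux/mutex.h>
#include <linux/slab.h>

static DEFINE_MUTEX(journal_lock);	/* hypothetical lock */

static void *prepare_record(size_t size)
{
	void *rec;

	mutex_lock(&journal_lock);
	/*
	 * Must not fail by contract.  If journal_lock is also taken on the
	 * reclaim/writeback path, reclaim cannot make progress on behalf of
	 * this allocation while we hold it, and memory reserves are the only
	 * way forward.  If the lock is never taken under reclaim, it imposes
	 * no such dependency, so "a lock is held" alone tells the allocator
	 * very little.
	 */
	rec = kzalloc(size, GFP_NOFS | __GFP_NOFAIL);
	mutex_unlock(&journal_lock);
	return rec;
}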

> That way callers which don't hold locks can use __GFP_NOFAIL without
> worrying about imposing on other code.
>
> Or is it so rare that __GFP_NOFAIL would be used without holding a lock
> that it doesn't matter?
>
> The other commit of interest is
>
> Commit: 6c18ba7a1899 ("mm: help __GFP_NOFAIL allocations which do not
> trigger OOM killer")
>
> I don't find the reasoning convincing. It is a bit like "Robbing Peter
> to pay Paul". It takes from the reserves to allow a __GFP_NOFAIL to
> proceed, without any reason to think this particular allocation has any
> more 'right' to the reserves than anything else.

I do agree that this is not really optimal. I do not remember the exact
details, but these changes were mostly based on, or inspired by, extreme
memory pressure testing by Tetsuo, who managed to trigger quite a few
corner cases. Especially the cases where NOFS was involved were
problematic.

> While I don't like the reasoning in either of these, they do make it
> clear (to me) that the use of reserves is entirely an internal policy
> decision. They should *not* be seen as part of the API and callers
> should not have to be concerned about it when deciding whether to use
> __GFP_NOFAIL or not.

Yes. The bar for using NOFAIL should be high enough - essentially that
there is no other way than to use it - that memory reserves shouldn't be
a roadblock. If we learn that existing users can seriously deplete
memory reserves then we might need to reconsider the existing logic. So
far there are no indications that NOFAIL really causes any problems in
that area.

> The use of these reserves is, at most, a hypothetical problem. If it
> ever looks like becoming a real practical problem, it needs to be fixed
> internally to the page allocator. Maybe an extra watermark which isn't
> quite as permissive as ALLOC_HIGH...
>
> I'm inclined to drop all references to reserves from the documentation
> for __GFP_NOFAIL.

I have found your additions to the documentation useful.

> I think there are enough users already that adding a couple more isn't
> going to make problems substantially more likely. And more will be
> added anyway that the mm/ team won't have the opportunity or bandwidth
> to review.
>
> Meanwhile I'll see if I can understand the intricacies of alloc_page so
> that I can contribute to making it more predictable.
>
> Question: In those cases where an open-coded loop is appropriate, such
> as when you want to handle signals or can drop locks, how bad would it
> be to have a tight loop without any sleep?
>
> should_reclaim_retry() will sleep 100ms (sometimes...). Is that enough?
> __GFP_NOFAIL doesn't add any sleep when looping.

Yeah, NOFAIL doesn't add any explicit sleep points. In general there is
no guarantee that a sleepable allocation will sleep. We do cond_resched
in general, but sleeping is enforced only for worker contexts because WQ
concurrency depends on explicit sleeping. So to answer your question: if
you really need to sleep between retries then you should do it manually;
only a cond_resched can be assumed.
-- 
Michal Hocko
SUSE Labs
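
For reference, an open-coded retry along the lines discussed above might
look roughly like the following sketch (illustrative only: the helper
name is made up, and the HZ/50 back-off merely mirrors the interval the
old congestion_wait(BLK_RW_ASYNC, HZ/50) loops in ext4 used; it is not a
recommendation):

#include <linux/jiffies.h>
#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>

/*
 * Unlike __GFP_NOFAIL, an open-coded loop lets the caller bail out on a
 * fatal signal and choose its own explicit sleep between attempts.
 */
static void *alloc_retry_killable(size_t size)
{
	void *p;

	for (;;) {
		p = kzalloc(size, GFP_NOFS);
		if (p)
			return p;
		if (fatal_signal_pending(current))
			return NULL;	/* let the caller return an error */
		/*
		 * Explicit back-off between attempts; the allocator itself
		 * only guarantees a cond_resched(), not a real sleep.
		 */
		schedule_timeout_uninterruptible(HZ / 50);
	}
}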