Date: Wed, 8 Mar 2017 13:54:31 +0100
From: Michal Hocko
To: Tetsuo Handa
Cc: linux-mm@kvack.org, Vlastimil Babka, Johannes Weiner, Mel Gorman,
	Andrew Morton, LKML, "Darrick J. Wong"
Subject: Re: [RFC PATCH 3/4] xfs: map KM_MAYFAIL to __GFP_RETRY_MAYFAIL
Message-ID: <20170308125431.GI11028@dhcp22.suse.cz>
References: <20170307154843.32516-1-mhocko@kernel.org>
	<20170307154843.32516-4-mhocko@kernel.org>

On Wed 08-03-17 20:23:37, Tetsuo Handa wrote:
> On 2017/03/08 0:48, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > KM_MAYFAIL didn't have any suitable GFP_FOO counterpart until recently,
> > so it relied on the default page allocator behavior for the given set
> > of flags. This means that small allocations actually never failed.
> >
> > Now that we have the __GFP_RETRY_MAYFAIL flag, which works independently
> > of the allocation request size, we can map KM_MAYFAIL to it. The
> > allocator will try as hard as it can to fulfill the request but will
> > eventually fail if no progress can be made.
> >
> > Cc: Darrick J. Wong
> > Signed-off-by: Michal Hocko
> > ---
> >  fs/xfs/kmem.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> > index ae08cfd9552a..ac80a4855c83 100644
> > --- a/fs/xfs/kmem.h
> > +++ b/fs/xfs/kmem.h
> > @@ -54,6 +54,16 @@ kmem_flags_convert(xfs_km_flags_t flags)
> >  			lflags &= ~__GFP_FS;
> >  	}
> >
> > +	/*
> > +	 * Default page/slab allocator behavior is to retry for ever
> > +	 * for small allocations. We can override this behavior by using
> > +	 * __GFP_RETRY_MAYFAIL which will tell the allocator to retry as long
> > +	 * as it is feasible but rather fail than retry for ever for all
> > +	 * request sizes.
> > +	 */
> > +	if (flags & KM_MAYFAIL)
> > +		lflags |= __GFP_RETRY_MAYFAIL;
>
> I don't see the advantage of supporting both __GFP_NORETRY and
> __GFP_RETRY_MAYFAIL. kmem_flags_convert() can always set __GFP_NORETRY,
> because the callers use an open-coded __GFP_NOFAIL loop (with a possible
> allocation lockup warning) unless KM_MAYFAIL is set.

The behavior would be different (e.g. the OOM killer handling).

[...]

> line, which is likely always true); but this is off-topic for this thread.

yes

[...]

> where both __GFP_NORETRY and __GFP_RETRY_MAYFAIL are checked after
> direct reclaim and compaction have failed. __GFP_RETRY_MAYFAIL
> optimistically retries as long as one of should_reclaim_retry(),
> should_compact_retry() or read_mems_allowed_retry() returns true, or
> mutex_trylock(&oom_lock) in __alloc_pages_may_oom() returns 0. If
> !__GFP_FS allocation requests keep holding oom_lock one after another,
> __GFP_RETRY_MAYFAIL allocation requests (which are likely !__GFP_FS
> allocation requests, since __GFP_FS allocation requests are blocked on
> direct reclaim) can be blocked for an uncontrollable duration without
> making progress. It seems to me that the difference between
> __GFP_NORETRY and __GFP_RETRY_MAYFAIL is not useful.
> Rather, the caller can set __GFP_NORETRY and retry under its own control
> (e.g. set __GFP_HIGH after the first timeout, give up after the second
> timeout).

You are drowning in implementation details here. Try to step back and
think about the high level semantic I would like to achieve - which is
essentially a middle ground between __GFP_NORETRY, which doesn't retry,
and __GFP_NOFAIL, which retries for ever. There are users who could
benefit from such a semantic, I believe (the most prominent example is
kvmalloc, which has different modes of how hard to try kmalloc before
giving up and falling back to vmalloc).
--
Michal Hocko
SUSE Labs
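
For illustration, a minimal sketch of what the mapping means for an XFS
call site: with KM_MAYFAIL translated to __GFP_RETRY_MAYFAIL, kmem_zalloc()
can return NULL even for small requests, so the error path is genuinely
reachable. xfs_foo_alloc() and struct xfs_foo are invented names for this
sketch; kmem_zalloc() and KM_MAYFAIL are the existing interfaces from
fs/xfs/kmem.h quoted above.

/*
 * Hypothetical XFS-style call site, for illustration only.
 * xfs_foo_alloc() and struct xfs_foo are invented; kmem_zalloc() and
 * KM_MAYFAIL come from fs/xfs/kmem.[ch].  With KM_MAYFAIL mapped to
 * __GFP_RETRY_MAYFAIL the allocation can fail even for small sizes,
 * so the -ENOMEM path must be handled instead of relying on the
 * allocator retrying for ever.
 */
#include "kmem.h"

struct xfs_foo {
	int		foo_dummy;	/* placeholder member */
};

static int
xfs_foo_alloc(
	struct xfs_foo	**foop)
{
	struct xfs_foo	*foo;

	foo = kmem_zalloc(sizeof(*foo), KM_MAYFAIL);
	if (!foo)
		return -ENOMEM;		/* the allocator gave up */

	*foop = foo;
	return 0;
}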
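
And a rough sketch of the kvmalloc-style fallback mentioned above as the
kind of user that benefits from the middle-ground semantic: try the
physically contiguous allocation with bounded effort, then fall back to
vmalloc. kvmalloc_sketch() is an invented name and this is only an
approximation of the pattern, not the actual mm/util.c implementation,
whose flag handling is more involved.

/*
 * Rough approximation of the kvmalloc() fallback pattern, not the real
 * implementation.  __GFP_RETRY_MAYFAIL is the middle ground: the
 * allocator tries hard, but fails rather than looping for ever, which
 * is what gives the vmalloc fallback a chance to run at all.
 */
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *kvmalloc_sketch(size_t size, gfp_t flags)
{
	void *p;

	/* Physically contiguous attempt with bounded effort. */
	p = kmalloc(size, flags | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	if (p || size <= PAGE_SIZE)
		return p;

	/* Fall back to virtually contiguous memory. */
	return vmalloc(size);
}

The real kvmalloc() additionally restricts the fallback to
GFP_KERNEL-compatible contexts, because vmalloc cannot honor GFP_NOFS or
GFP_NOIO constraints.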