Date: Tue, 15 Sep 2020 16:04:06 +0200
From: Michal Hocko
To: Mateusz Nosek
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [RFC PATCH] mm/page_alloc.c: micro-optimization reduce oom critical section size
Message-ID: <20200915140406.GE3736@dhcp22.suse.cz>
References: <20200914100654.21746-1-mateusznosek0@gmail.com> <20200914142233.GT16999@dhcp22.suse.cz>

On Tue 15-09-20 15:09:59, Mateusz Nosek wrote:
>
>
> On 9/14/2020 4:22 PM, Michal Hocko wrote:
> > On Mon 14-09-20 12:06:54, mateusznosek0@gmail.com wrote:
> > > From: Mateusz Nosek
> > >
> > > Most operations from '__alloc_pages_may_oom' do not require oom_mutex hold.
> > > Exception is 'out_of_memory'. The patch refactors '__alloc_pages_may_oom'
> > > to reduce critical section size and improve overall system performance.
> >
> > This is a real slow path. What is the point of optimizing it? Do you
> > have any numbers?
> >
>
> I agree that, as this is the slow path, hard and complicated optimizations
> are not recommended. In my humble opinion the introduced patch is not
> complex and does not decrease code readability or maintainability. In a
> nutshell I see no drawbacks of applying it.

This is clearly a matter of taste. I do not see a good reason to apply
it TBH. It is a claimed optimization without any numbers to back that
claim. It is also a tricky area, so I am usually very careful about
touching this code without a strong reason. Others might feel
differently of course.

[...]

Anyway, I have only now looked closer at the patch...

> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index b9bd75cacf02..b07f950a5825 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >  		.order = order,
> > >  	};
> > >  	struct page *page;
> > > -
> > > -	*did_some_progress = 0;
> > > -
> > > -	/*
> > > -	 * Acquire the oom lock. If that fails, somebody else is
> > > -	 * making progress for us.
> > > -	 */
> > > -	if (!mutex_trylock(&oom_lock)) {
> > > -		*did_some_progress = 1;
> > > -		schedule_timeout_uninterruptible(1);
> > > -		return NULL;
> > > -	}
> > > +	bool success;
> > >
> > >  	/*
> > >  	 * Go through the zonelist yet one more time, keep very high watermark
> > > @@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >  					~__GFP_DIRECT_RECLAIM, order,
> > >  					ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
> > >  	if (page)
> > > -		goto out;
> > > +		return page;
> > > +
> > > +	/* Check if somebody else is making progress for us. */
> > > +	*did_some_progress = mutex_is_locked(&oom_lock);

This is not only quite ugly but wrong as well. In general, checking the
state of a lock is racy unless the lock is taken somewhere up the call
chain. In this particular case it wouldn't be a big deal because an
additional retry (did_some_progress = 1) is not really critical. Still,
it would likely be nicer to be deterministic here and not retry on any
of the early bailouts, regardless of the lock state.
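Something along these lines - completely untested and not part of the
posted patch, just to sketch what I mean, with the unchanged parts
elided - would keep all the early bailouts deterministic and only
consult oom_lock right before calling out_of_memory():

	*did_some_progress = 0;

	/* ... last ALLOC_WMARK_HIGH pass over the zonelist, unchanged ... */

	/* Early bailouts never ask for a retry, whatever the lock state. */
	if (current->flags & PF_DUMPCORE)
		return NULL;
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return NULL;
	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
		return NULL;
	if (ac->highest_zoneidx < ZONE_NORMAL)
		return NULL;
	if (pm_suspended_storage())
		return NULL;

	/*
	 * Only here does the lock matter. If somebody else holds it they
	 * are making progress for us, so a retry is worthwhile.
	 */
	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}
	success = out_of_memory(&oc);
	mutex_unlock(&oom_lock);

	/* Exhausted what can be done so it's blame time */
	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
		*did_some_progress = 1;
		/* ... __GFP_NOFAIL fallback to memory reserves as before ... */
	}
	return page;

That way whether we retry does not depend on what some other task
happens to be doing with oom_lock at the moment we peek at it.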

> > >  	/* Coredumps can quickly deplete all memory reserves */
> > >  	if (current->flags & PF_DUMPCORE)
> > > -		goto out;
> > > +		return NULL;
> > >  	/* The OOM killer will not help higher order allocs */
> > >  	if (order > PAGE_ALLOC_COSTLY_ORDER)
> > > -		goto out;
> > > +		return NULL;
> > >  	/*
> > >  	 * We have already exhausted all our reclaim opportunities without any
> > >  	 * success so it is time to admit defeat. We will skip the OOM killer
> > > @@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >  	 * The OOM killer may not free memory on a specific node.
> > >  	 */
> > >  	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
> > > -		goto out;
> > > +		return NULL;
> > >  	/* The OOM killer does not needlessly kill tasks for lowmem */
> > >  	if (ac->highest_zoneidx < ZONE_NORMAL)
> > > -		goto out;
> > > +		return NULL;
> > >  	if (pm_suspended_storage())
> > > -		goto out;
> > > +		return NULL;
> > >  	/*
> > >  	 * XXX: GFP_NOFS allocations should rather fail than rely on
> > >  	 * other request to make a forward progress.
> > > @@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >  	 * failures more gracefully we should just bail out here.
> > >  	 */
> > > +	/*
> > > +	 * Acquire the oom lock. If that fails, somebody else is
> > > +	 * making progress for us.
> > > +	 */
> > > +	if (!mutex_trylock(&oom_lock)) {
> > > +		*did_some_progress = 1;
> > > +		schedule_timeout_uninterruptible(1);
> > > +		return NULL;
> > > +	}
> > > +	success = out_of_memory(&oc);
> > > +	mutex_unlock(&oom_lock);
> > > +
> > >  	/* Exhausted what can be done so it's blame time */
> > > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > > +	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > >  		*did_some_progress = 1;
> > >
> > >  		/*
> > > @@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >  		page = __alloc_pages_cpuset_fallback(gfp_mask, order,
> > >  					ALLOC_NO_WATERMARKS, ac);
> > >  	}
> > > -out:
> > > -	mutex_unlock(&oom_lock);
> > > +
> > >  	return page;
> > >  }
> > > --
> > > 2.20.1
> > >
>
> Sincerely yours,
> Mateusz Nosek

-- 
Michal Hocko
SUSE Labs