Date: Wed, 26 Sep 2018 16:17:08 +0200
From: Michal Hocko
To: "Kirill A. Shutemov"
Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, David Rientjes,
	Andrea Arcangeli, Zi Yan, Stefan Priebe - Profihost AG,
	linux-mm@kvack.org, LKML
Subject: Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask
Message-ID: <20180926141708.GX6278@dhcp22.suse.cz>
References: <20180925120326.24392-1-mhocko@kernel.org>
	<20180925120326.24392-3-mhocko@kernel.org>
	<20180926133039.y7o5x4nafovxzh2s@kshutemo-mobl1>
In-Reply-To: <20180926133039.y7o5x4nafovxzh2s@kshutemo-mobl1>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Wed 26-09-18 16:30:39, Kirill A.
Shutemov wrote:
> On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote:
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index c3bc7e9c9a2a..c0bcede31930 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -629,21 +629,40 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> >   *	      available
> >   * never: never stall for any thp allocation
> >   */
> > -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
> > +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
> >  {
> >  	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
> > +	gfp_t this_node = 0;
> > +
> > +#ifdef CONFIG_NUMA
> > +	struct mempolicy *pol;
> > +	/*
> > +	 * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
> > +	 * specified, to express a general desire to stay on the current
> > +	 * node for optimistic allocation attempts. If the defrag mode
> > +	 * and/or madvise hint requires the direct reclaim then we prefer
> > +	 * to fallback to other node rather than node reclaim because that
> > +	 * can lead to excessive reclaim even though there is free memory
> > +	 * on other nodes. We expect that NUMA preferences are specified
> > +	 * by memory policies.
> > +	 */
> > +	pol = get_vma_policy(vma, addr);
> > +	if (pol->mode != MPOL_BIND)
> > +		this_node = __GFP_THISNODE;
> > +	mpol_cond_put(pol);
> > +#endif
>
> I'm not very good with NUMA policies. Could you explain in more detail how
> the code above is equivalent to the code below?

MPOL_PREFERRED is handled by policy_node() before we call
__alloc_pages_nodemask. __GFP_THISNODE is applied only when we are not
using __GFP_DIRECT_RECLAIM, which is handled in
alloc_hugepage_direct_gfpmask now. Lastly, MPOL_BIND was not handled
explicitly before, but in the end the removed late check would have
dropped __GFP_THISNODE for it as well.
So in the end we are doing the same thing, unless I am missing something.

> > @@ -2026,60 +2025,6 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
> >  		goto out;
> >  	}
> >
> > -	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
> > -		int hpage_node = node;
> > -
> > -		/*
> > -		 * For hugepage allocation and non-interleave policy which
> > -		 * allows the current node (or other explicitly preferred
> > -		 * node) we only try to allocate from the current/preferred
> > -		 * node and don't fall back to other nodes, as the cost of
> > -		 * remote accesses would likely offset THP benefits.
> > -		 *
> > -		 * If the policy is interleave, or does not allow the current
> > -		 * node in its nodemask, we allocate the standard way.
> > -		 */
> > -		if (pol->mode == MPOL_PREFERRED &&
> > -					!(pol->flags & MPOL_F_LOCAL))
> > -			hpage_node = pol->v.preferred_node;
> > -
> > -		nmask = policy_nodemask(gfp, pol);
> > -		if (!nmask || node_isset(hpage_node, *nmask)) {
> > -			mpol_cond_put(pol);
> > -			/*
> > -			 * We cannot invoke reclaim if __GFP_THISNODE
> > -			 * is set. Invoking reclaim with
> > -			 * __GFP_THISNODE set, would cause THP
> > -			 * allocations to trigger heavy swapping
> > -			 * despite there may be tons of free memory
> > -			 * (including potentially plenty of THP
> > -			 * already available in the buddy) on all the
> > -			 * other NUMA nodes.
> > -			 *
> > -			 * At most we could invoke compaction when
> > -			 * __GFP_THISNODE is set (but we would need to
> > -			 * refrain from invoking reclaim even if
> > -			 * compaction returned COMPACT_SKIPPED because
> > -			 * there wasn't not enough memory to succeed
> > -			 * compaction). For now just avoid
> > -			 * __GFP_THISNODE instead of limiting the
> > -			 * allocation path to a strict and single
> > -			 * compaction invocation.
> > -			 *
> > -			 * Supposedly if direct reclaim was enabled by
> > -			 * the caller, the app prefers THP regardless
> > -			 * of the node it comes from so this would be
> > -			 * more desiderable behavior than only
> > -			 * providing THP originated from the local
> > -			 * node in such case.
> > -			 */
> > -			if (!(gfp & __GFP_DIRECT_RECLAIM))
> > -				gfp |= __GFP_THISNODE;
> > -			page = __alloc_pages_node(hpage_node, gfp, order);
> > -			goto out;
> > -		}
> > -	}
> > -
> >  	nmask = policy_nodemask(gfp, pol);
> >  	preferred_nid = policy_node(gfp, pol, node);
> >  	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
>
> --
>  Kirill A. Shutemov

--
Michal Hocko
SUSE Labs