Message-ID: <54EDA96C.4000609@suse.cz>
Date: Wed, 25 Feb 2015 11:52:28 +0100
From: Vlastimil Babka
To: David Rientjes, Andrew Morton
CC: Greg Thelen, "Aneesh Kumar K.V", Linus Torvalds, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v2 for-4.0] mm, thp: really limit transparent hugepage allocation to local node

On 02/25/2015 12:24 AM, David Rientjes wrote:
> From: Greg Thelen
>
> Commit 077fcf116c8c ("mm/thp: allocate transparent hugepages on local
> node") restructured alloc_hugepage_vma() with the intent of only
> allocating transparent hugepages locally when there was not an effective
> interleave mempolicy.
>
> alloc_pages_exact_node() does not limit the allocation to the single
> node, however, but rather prefers it. This is because __GFP_THISNODE is
> not set which would cause the node-local nodemask to be passed. Without
> it, only a nodemask that prefers the local node is passed.

Oops, good catch.
But I believe we have the same problem with khugepaged_alloc_page(),
rendering the recent node determination and zone_reclaim strictness
patches partially useless.

Then I start to wonder about other alloc_pages_exact_node() users. Some
do pass __GFP_THISNODE, others not - are they also mistaken?
I guess the function is a misnomer - when I see "exact_node", I expect
the __GFP_THISNODE behavior.

I think to avoid such hidden catches, we should create an
alloc_pages_preferred_node() variant, change the exact_node() variant to
pass __GFP_THISNODE, and audit and adjust all callers accordingly.

Also, you pass __GFP_NOWARN but that should be covered by GFP_TRANSHUGE
already. Of course, nothing guarantees that hugepage == true implies
that gfp == GFP_TRANSHUGE... but current in-tree callers conform to that.

> Fix this by passing __GFP_THISNODE and falling back to small pages when
> the allocation fails.
>
> Fixes: 077fcf116c8c ("mm/thp: allocate transparent hugepages on local node")
> Signed-off-by: Greg Thelen
> Signed-off-by: David Rientjes
> ---
>  v2: GFP_THISNODE actually defers compaction and reclaim entirely based on
>      the combination of gfp flags. We want to try compaction and reclaim,
>      so only set __GFP_THISNODE. We still set __GFP_NOWARN to suppress
>      oom warnings in the kernel log when we can simply fallback to small
>      pages.
>
>  mm/mempolicy.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1985,7 +1985,10 @@ retry_cpuset:
>  		nmask = policy_nodemask(gfp, pol);
>  		if (!nmask || node_isset(node, *nmask)) {
>  			mpol_cond_put(pol);
> -			page = alloc_pages_exact_node(node, gfp, order);
> +			page = alloc_pages_exact_node(node, gfp |
> +							    __GFP_THISNODE |
> +							    __GFP_NOWARN,
> +						      order);
>  			goto out;
>  		}
>  	}
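
To make the "preferred" versus "exact" distinction concrete, here is a toy
userspace model in C. It is only a sketch, not kernel code: every toy_/TOY_
name is hypothetical, the round-robin fallback order merely stands in for a
real zonelist, and TOY_THISNODE only plays the role of __GFP_THISNODE. The
two wrappers mirror the split suggested above, where the exact variant
always forces the strict flag and the preferred variant passes the caller's
flags through unchanged.

```c
#include <assert.h>

/* Toy userspace model; nothing here is a real kernel interface. */
#define TOY_NR_NODES	4
#define TOY_THISNODE	0x1u	/* stands in for __GFP_THISNODE */

struct toy_numa {
	unsigned long free_pages[TOY_NR_NODES];	/* free base pages per node */
};

/*
 * Try to take (1 << order) pages, starting at 'node'. Without TOY_THISNODE
 * the request falls back to the remaining nodes in round-robin order, so
 * 'node' is only a preference. Returns the node that satisfied the request,
 * or -1 if no eligible node had enough free pages.
 */
int toy_alloc_pages_node(struct toy_numa *numa, int node,
			 unsigned int flags, unsigned int order)
{
	unsigned long want = 1UL << order;
	int i;

	assert(node >= 0 && node < TOY_NR_NODES);

	for (i = 0; i < TOY_NR_NODES; i++) {
		int nid = (node + i) % TOY_NR_NODES;

		if (numa->free_pages[nid] >= want) {
			numa->free_pages[nid] -= want;
			return nid;
		}
		if (flags & TOY_THISNODE)
			break;	/* strict: never look past the first node */
	}
	return -1;
}

/* Hypothetical "preferred" variant: flags pass through, fallback allowed. */
int toy_alloc_pages_preferred_node(struct toy_numa *numa, int node,
				   unsigned int flags, unsigned int order)
{
	return toy_alloc_pages_node(numa, node, flags, order);
}

/* Hypothetical "exact" variant: always strict, whatever the caller passed. */
int toy_alloc_pages_exact_node(struct toy_numa *numa, int node,
			       unsigned int flags, unsigned int order)
{
	return toy_alloc_pages_node(numa, node, flags | TOY_THISNODE, order);
}
```

With free_pages = { 100, 1000, 0, 0 }, an order-9 request for node 0 through
the preferred variant is served from node 1, while the same request through
the exact variant returns -1 and leaves it to the caller to fall back to
small pages.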