Date: Wed, 12 Sep 2018 14:05:04 +0200
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Andrea Arcangeli, Zi Yan, "Kirill A. Shutemov", linux-mm@kvack.org, LKML, Stefan Priebe
Subject: Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings
Message-ID: <20180912120504.GE10951@dhcp22.suse.cz>
References: <20180907130550.11885-1-mhocko@kernel.org> <20180911115613.GR10951@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 11-09-18 13:30:20, David Rientjes wrote:
> On Tue, 11 Sep 2018, Michal Hocko wrote:
[...]
> > hugepage specific MPOL flags sounds like yet another step into even more
> > cluttered API and semantic, I am afraid. Why should this be any
> > different from regular page allocations? You are getting off-node memory
> > once your local node is full. You have to use an explicit binding to
> > disallow that. THP should be similar in that regards. Once you have said
> > that you _really_ want THP then you are closer to what we do for regular
> > pages IMHO.
> >
>
> Saying that we really want THP isn't an all-or-nothing decision. We
> certainly want to try hard to fault hugepages locally especially at task
> startup when remapping our .text segment to thp, and MADV_HUGEPAGE works
> very well for that. Remote hugepages would be a regression that we now
> have no way to avoid because the kernel doesn't provide for it, if we were
> to remove __GFP_THISNODE that this patch introduces.

Why can't you use a mempolicy to bind to local nodes if you really care
about the locality?

> On Broadwell, for example, we find 7% slower access to remote hugepages
> than local native pages. On Naples, that becomes worse: 14% slower access
> latency for intrasocket hugepages compared to local native pages and 39%
> slower for intersocket.

So, again, how does this compare to regular 4k pages? You are going to
pay for the same remote access as well. From what you have said so far
it sounds like you would like to have something like the zone/node
reclaim mode, fine grained for a specific mapping. If we really want to
support something like that then it should be a generic policy rather
than a THP specific thing IMHO.

As I've said, it is hard to come up with a solution that would satisfy
everybody, but considering that the existing reports are seeing this as
a regression, and considering that their NUMA requirements are not as
strict as yours, I would tend to think that stronger NUMA requirements
should be expressed explicitly rather than as an implicit effect of a
madvise flag. We do have APIs for that.
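
For illustration only, here is a minimal userspace sketch of the explicit binding referred to above: the mapping is bound to one node with an MPOL_BIND policy and then marked with MADV_HUGEPAGE. The helper names (node_mask_for, current_node, map_local_thp) are hypothetical and not taken from the thread or the patch. The sketch assumes a Linux system, calls mbind through syscall(2) so that it builds without libnuma, and hardcodes the MPOL_BIND value from the kernel UAPI header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MPOL_BIND in the kernel UAPI <linux/mempolicy.h>; defined
 * locally (assumption of this sketch) to avoid needing <numaif.h>. */
#define SKETCH_MPOL_BIND 2

/* Hypothetical helper: nodemask with only 'node' set. Limited to node
 * numbers that fit into a single unsigned long. */
static unsigned long node_mask_for(unsigned int node)
{
	assert(node < 8 * sizeof(unsigned long));
	return 1UL << node;
}

/* Hypothetical helper: NUMA node of the CPU we are running on right
 * now, or -1 if getcpu is not available. */
static int current_node(void)
{
	unsigned int cpu, node;

	if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0)
		return -1;
	return (int)node;
}

/* Hypothetical helper: map 'len' bytes of anonymous memory, bind the
 * range to 'node' and ask for THP. Returns NULL if any step fails
 * (e.g. no NUMA or THP support, or mbind blocked by a sandbox). */
static void *map_local_thp(size_t len, unsigned int node)
{
	unsigned long mask = node_mask_for(node);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* One more than the number of mask bits is the usual convention
	 * for maxnode, so that every bit of 'mask' is considered. */
	if (syscall(SYS_mbind, p, len, SKETCH_MPOL_BIND, &mask,
		    8 * sizeof(mask) + 1, 0UL) != 0 ||
	    madvise(p, len, MADV_HUGEPAGE) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

A caller would typically do map_local_thp(len, current_node()) before first touching the memory, because the policy applies to pages faulted in after it has been set. With MPOL_BIND the allocation is restricted to the given node, so the locality requirement is stated by the application itself rather than implied by the madvise flag.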
-- 
Michal Hocko
SUSE Labs