Date: Wed, 5 Jun 2019 11:32:57 +0200
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Vlastimil Babka, Zi Yan,
    Stefan Priebe - Profihost AG, "Kirill A. Shutemov",
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"
Message-ID: <20190605093257.GC15685@dhcp22.suse.cz>
References: <20190503223146.2312-1-aarcange@redhat.com>
 <20190503223146.2312-3-aarcange@redhat.com>
 <20190520153621.GL18914@techsingularity.net>
 <20190523175737.2fb5b997df85b5d117092b5b@linux-foundation.org>
 <20190531092236.GM6896@dhcp22.suse.cz>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Fri 31-05-19 14:53:35, David Rientjes wrote:
> On Fri, 31 May 2019, Michal Hocko wrote:
> > > The problem which this patch addresses has apparently gone unreported
> > > for 4+ years since
> >
> > Can we finally stop considering the time and focus on what the most
> > reasonable behavior is in the general case, please? Preserving mistakes
> > on the argument that we have had them for many years is just not
> > productive. It is very well possible that the workloads that suffer
> > from this simply run on older distribution kernels which are moving
> > towards newer kernels very slowly.
>
> That's fine, but we also must be mindful of users who have used
> MADV_HUGEPAGE over the past four years based on its hard-coded behavior
> and who would now regress as a result.

Absolutely, I am all for helping those usecases. First of all we need to
understand what those usecases are, though. So far we have only seen very
vague claims about artificial worst-case examples where a remote access
dominates the overall cost, but that does not seem to be the case in real
life in my experience (e.g. NUMA balancing will correct things, or the
overly aggressive node reclaim tends to cause problems elsewhere, etc.).

That being said, I am pretty sure that a new memory policy, as proposed
previously, that would allow for node-reclaim behavior is a way forward
for those very specific workloads that absolutely benefit from local
access. There are, however, real-life usecases that benefit from THP even
on remote nodes, as explained by Andrea (most notably kvm), and the only
way those can express their needs is the madvise flag.
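For reference, here is a minimal userspace sketch of the two interfaces
under discussion. It is illustrative only: node 0 is an arbitrary example,
error handling is reduced to perror(), and MPOL_PREFERRED merely stands in
for the proposed node-reclaim policy, which does not exist today.

#include <numaif.h>      /* mbind, MPOL_PREFERRED -- link with -lnuma */
#include <sys/mman.h>    /* mmap, madvise, MADV_HUGEPAGE */
#include <stdio.h>

int main(void)
{
	size_t len = 512UL << 20;            /* 512MB anonymous region */
	unsigned long nodemask = 1UL << 0;   /* prefer node 0 (example only) */
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Opt this range into THP; this is the madvise flag discussed
	 * above.  It only matters when transparent_hugepage is set to
	 * "madvise" or "always".
	 */
	if (madvise(p, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* Express a node preference; the kernel may still fall back. */
	if (mbind(p, len, MPOL_PREFERRED, &nodemask,
		  sizeof(nodemask) * 8, 0))
		perror("mbind(MPOL_PREFERRED)");

	/* Fault the memory in so the policy and the advice take effect. */
	for (size_t i = 0; i < len; i += 4096)
		p[i] = 1;

	return 0;
}

Whether the kernel insists on the local node for the hugepage attempt (the
__GFP_THISNODE behavior) or falls back to remote nodes is exactly what is
being debated; nothing in the sketch above controls that.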
Not to mention that the default node-reclaim behavior might cause
excessive reclaim, as demonstrated by Mel and Andrea, and that is
certainly not desirable in itself.

[...]
> > > My goal is to reach a solution that does not cause anybody to incur
> > > performance penalties as a result of it.
> >
> > That is certainly appreciated and I can offer my help there as well.
> > But I believe we should start from a code base that cannot generate a
> > swapping storm with trivial code, as demonstrated by Mel. A general
> > idea of how to improve the situation has already been outlined for the
> > default case, and a new memory policy has been mentioned as well, but
> > we need something to start with, and neither of the two is compatible
> > with the __GFP_THISNODE behavior.
>
> Thus far, I haven't seen anybody engage in discussion on how to address
> the issue other than the proposed reverts, which readily acknowledge
> that they cause other users to regress. If all nodes are fragmented,
> the swap storms that are currently reported for the local node would
> only be made worse by the revert -- if remote hugepages cannot be
> faulted quickly, that only compounds the problem.

Andrea has outlined the strategy to go with, IIRC. There has also been
general agreement that we should not be over-eager to fall back to remote
nodes if a base-page-size allocation could be satisfied from the local
node.
--
Michal Hocko
SUSE Labs