Date: Mon, 9 Sep 2019 21:30:20 +0200
From: Michal Hocko
To: David Rientjes
Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Mel Gorman,
    Vlastimil Babka, "Kirill A. Shutemov", linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Message-ID: <20190909193020.GD2063@dhcp22.suse.cz>
References: <20190904205522.GA9871@redhat.com>

On Thu 05-09-19 14:06:28, David Rientjes wrote:
> On Wed, 4 Sep 2019, Andrea Arcangeli wrote:
>
> > > This is an admittedly hacky solution that shouldn't cause anybody to
> > > regress based on NUMA and the semantics of MADV_HUGEPAGE for the past
> > > 4 1/2 years for users whose workload does fit within a socket.
> >
> > How can you live with the below if you can't live with 5.3-rc6? Here
> > you allocate remote THP if the local THP allocation fails.
> >
> > >		page = __alloc_pages_node(hpage_node,
> > >						gfp | __GFP_THISNODE, order);
> > > +
> > > +		/*
> > > +		 * If hugepage allocations are configured to always
> > > +		 * synchronous compact or the vma has been madvised
> > > +		 * to prefer hugepage backing, retry allowing remote
> > > +		 * memory as well.
> > > +		 */
> > > +		if (!page && (gfp & __GFP_DIRECT_RECLAIM))
> > > +			page = __alloc_pages_node(hpage_node,
> > > +							gfp | __GFP_NORETRY, order);
> > > +
> >
> > You're still going to get THP allocated remote _before_ you have a
> > chance to allocate 4k local this way. __GFP_NORETRY won't make any
> > difference when there's THP immediately available in the remote nodes.
>
> This is incorrect: the fallback allocation here happens only if the
> initial allocation with __GFP_THISNODE fails. In that case, we were
> unable to compact memory to make a local hugepage available without
> incurring excessive swap, based on the RFC patch that appears as patch
> 3 in this series.

That patch is quite obscure, is specific to pageblock_order+ sizes, and
for some reason requires __GFP_IO without any explanation why. The
problem is not THP-specific, right? Any other high-order allocation has
the same problem AFAICS. So it is just a hack, and that is why it is
hard to reason about.

I believe it would be best to start by explaining why we do not see the
same problem with order-0 requests. We do not enter the slow path, and
thus memory reclaim, as long as some other node still passes the
watermark check, right? So essentially we are relying on kswapd to keep
nodes balanced so that an allocation request can be satisfied from the
local node.

We do have kcompactd to do background compaction. Why do we want to
rely on direct compaction instead? What is the fundamental difference?
Your changelog goes to some length about problems in compaction, but I
really do not see a description of the underlying problem. We cannot do
any sensible fix/heuristic without capturing that, IMHO. Either there
is some fundamental difference between direct and background
compaction, in which case the former is necessary and we should be
doing it by default for all higher-order requests that are sleepable
(aka __GFP_DIRECT_RECLAIM), or there is something to fix in background
compaction to make it more proactive.
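To keep the discussion concrete, the two-phase policy quoted above
boils down to something like the following sketch (my paraphrase, not
the actual patch; the helper name alloc_thp_local_first is made up).
The first attempt is pinned to the local node via __GFP_THISNODE and
may compact/reclaim there; only when that fails, and only in sleepable
contexts, are remote nodes allowed, with __GFP_NORETRY so the fallback
gives up early instead of reclaiming hard:

	#include <linux/gfp.h>
	#include <linux/topology.h>

	/* Hypothetical helper illustrating the local-first hugepage policy. */
	static struct page *alloc_thp_local_first(gfp_t gfp, unsigned int order)
	{
		int nid = numa_node_id();
		struct page *page;

		/* Phase 1: local node only; may trigger local compaction. */
		page = __alloc_pages_node(nid, gfp | __GFP_THISNODE, order);

		/*
		 * Phase 2: the local node is still preferred, but remote
		 * nodes are now acceptable; __GFP_NORETRY bounds the extra
		 * reclaim/compaction effort spent on the fallback.
		 */
		if (!page && (gfp & __GFP_DIRECT_RECLAIM))
			page = __alloc_pages_node(nid, gfp | __GFP_NORETRY, order);

		return page;
	}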
> > I said one good thing about this patch series: that it fixes the swap
> > storms. But upstream 5.3 fixes the swap storms too, and what you sent
> > is not nearly equivalent to the mempolicy that Michal was willing to
> > provide you, and that we thought you needed to get bigger guarantees
> > of getting only local 2m or local 4k pages.
>
> I haven't seen such a patch series, is there a link?

Not yet, unfortunately. So far I haven't heard that you are even
interested in that policy; you have never commented on it, IIRC.

-- 
Michal Hocko
SUSE Labs