Date: Wed, 5 Dec 2018 16:45:42 -0500
From: Andrea Arcangeli
To: David Rientjes
Cc: Michal Hocko, Vlastimil Babka, Linus Torvalds, ying.huang@intel.com,
    s.priebe@profihost.ag, mgorman@techsingularity.net,
    Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org,
    kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu
Subject: Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions
Message-ID: <20181205214542.GC11899@redhat.com>
References: <20181205090554.GX1286@dhcp22.suse.cz>

On Wed, Dec 05, 2018 at 11:49:26AM -0800, David Rientjes wrote:
> High thp utilization is not always better, especially when those
> hugepages are accessed remotely and introduce the regressions that
> I've reported. Seeking high thp utilization at all costs is not the
> goal if it causes workloads to regress.

Is it possible that what you need is a defrag=compactonly_thisnode
setting instead of the default defrag=madvise? The fact that you seem
concerned about page fault latencies doesn't make your workload an
obvious candidate for MADV_HUGEPAGE to begin with.

At least, unless you decide to smooth the MADV_HUGEPAGE behavior with
an mbind that will simply add __GFP_THISNODE to the allocations,
perhaps you'll be even faster if you invoke reclaim in the local node
for 4k allocations too.

It looks like for your workload THP is a nice-to-have add-on, which is
practically true of all workloads (with a few corner cases that must
use MADV_NOHUGEPAGE), and that is what the defrag= default is about.

Is it possible that you just don't want to shut compaction off
completely in the page fault path? If you're ok with that for your
library, you may be ok with it for all other apps too. That's a
different stance from other MADV_HUGEPAGE users, because you don't
seem to mind a severely crippled THP utilization in your app. With
your patch the utilization will go down a lot compared to the previous
swap-storm-capable __GFP_THISNODE behavior, and you're still very fine
with that. The fact you're fine with that points in the direction of
changing the default tuning for defrag= to something stronger than
madvise (madvise being precisely the default setting that forces you
to use MADV_HUGEPAGE to get a chance of getting some THP once in a
while during the page fault, after some uptime).

Considering that mbind surprisingly isn't privileged (so I suppose it
may cause swap storms equivalent to __GFP_THISNODE if maliciously used
after all), you could even consider a defrag=thisnode to force
compaction+defrag local to the node, to retain your THP+NUMA dynamic
partitioning behavior that ends up swapping heavily in the local node.
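
To make the mbind idea concrete, here's a rough, untested userspace
sketch of what I mean (node 0 stands in for whatever the local node
is, and the 1G region size is purely illustrative): MADV_HUGEPAGE opts
the range into THP, and MPOL_BIND then constrains its page faults to
one node, which gets you roughly the __GFP_THISNODE behavior from
userland, local swap storms included:

#include <numaif.h>	/* mbind(), MPOL_BIND; link with -lnuma */
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
	size_t len = 1UL << 30;		/* 1G region, purely illustrative */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Opt the range into THP, as the library under discussion does. */
	if (madvise(p, len, MADV_HUGEPAGE))
		perror("madvise");

	/*
	 * Bind the range to node 0 (assumed local): page faults in
	 * [p, p+len) may then only allocate (and reclaim/swap) on node 0
	 * instead of falling back to a remote node, much like
	 * __GFP_THISNODE does in-kernel.
	 */
	unsigned long nodemask = 1UL << 0;
	if (mbind(p, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0))
		perror("mbind");

	return 0;
}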
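
And for reference, since I'm proposing new values for it: the knob in
question is /sys/kernel/mm/transparent_hugepage/defrag, whose current
choices are always, defer, defer+madvise, madvise and never; the
compactonly_thisnode/thisnode values above are hypothetical, they
don't exist in any tree. Checking the active setting is just a file
read:

#include <stdio.h>

int main(void)
{
	char buf[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/defrag", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/*
	 * The active value is printed in brackets, e.g.:
	 * "always defer defer+madvise [madvise] never"
	 */
	if (fgets(buf, sizeof(buf), f))
		fputs(buf, stdout);
	fclose(f);
	return 0;
}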