Date: Wed, 5 Dec 2018 11:41:31 -0800 (PST)
From: David Rientjes
To: Mel Gorman
cc: Michal Hocko, Linus Torvalds, Andrea Arcangeli, ying.huang@intel.com,
    s.priebe@profihost.ag, Linux List Kernel Mailing,
    alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
    Andrew Morton, zi.yan@cs.rutgers.edu, Vlastimil Babka
Subject: Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions
In-Reply-To: <20181205101512.GY23260@techsingularity.net>
References: <20181204073850.GW31738@dhcp22.suse.cz> <20181205101512.GY23260@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 5 Dec 2018, Mel Gorman wrote:

> > This is a single MADV_HUGEPAGE usecase, there is nothing special about it.
> > It would be the same as if you did mmap(), madvise(MADV_HUGEPAGE), and
> > faulted the memory with a fragmented local node and then measured the
> > remote access latency to the remote hugepage that occurs without setting
> > __GFP_THISNODE. You can also measure the remote allocation latency by
> > fragmenting the entire system and then faulting.
> >
>
> I'll make the same point as before, the form the fragmentation takes
> matters as well as the types of pages that are resident and whether
> they are active or not. It affects the level of work the system does
> as well as the overall success rate of operations (be it reclaim, THP
> allocation, compaction, whatever). This is why a reproduction case that is
> representative of the problem you're facing on the real workload matters
> would have been helpful because then any alternative proposal could have
> taken your workload into account during testing.
>

We know from Andrea's report that compaction is failing, and failing
repeatedly, because otherwise we would not need excessive swapping to make
it work. That can mean one of two things: (1) a general low-on-memory
situation that repeatedly leaves us below the watermarks needed to deem
compaction suitable (isolate_freepages() will be too painful) or (2)
compaction has the memory that it needs but is failing to make a hugepage
available because all pages from a pageblock cannot be migrated.

If (1), perhaps in the presence of an antagonist that is quickly
allocating the memory before compaction can pass watermark checks, further
reclaim is not beneficial: the allocation is becoming too expensive and
there is no guarantee that compaction can find this reclaimed memory in
isolate_freepages().

I chose to duplicate (2) by synthetically introducing fragmentation
(high-order slab, free every other one) locally to test the patch that
does not set __GFP_THISNODE.
The result is a remote transparent hugepage, but we do not even need to
get to the point of local compaction for that fallback to happen. And this
is where I measure the 13.9% access latency regression for the lifetime of
the binary as a result of this patch.

If local compaction works the first time, great! But that is not what is
happening in Andrea's report and as a result of not setting __GFP_THISNODE
we are *guaranteed* worse access latency and may encounter even worse
allocation latency if the remote memory is fragmented as well.

So while I'm only testing the functional behavior of the patch itself, I
cannot speak to the nature of the local fragmentation on Andrea's systems.
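
For anyone who wants to try the userspace half of the sequence quoted
above (mmap(), madvise(MADV_HUGEPAGE), fault, then time accesses), here is
a rough sketch. It is not the test program behind the 13.9% figure. The
helper names map_thp_region() and access_latency_ns(), the 2 MiB hugepage
size and the 64-byte stride are assumptions made for illustration, and the
kernel-side fragmentation step is not reproduced here at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Assumed THP size: 2 MiB, as on x86-64. Other architectures differ. */
#define HPAGE_SIZE (2UL << 20)

/*
 * Hypothetical helper: map len bytes of anonymous memory aligned to
 * HPAGE_SIZE, ask for transparent hugepages, and fault the range in.
 * Returns NULL if mmap() fails. The alignment slack is left mapped,
 * which is acceptable for a short-lived test binary.
 */
static void *map_thp_region(size_t len)
{
    assert(len > 0 && len % HPAGE_SIZE == 0);

    /* Over-allocate so an aligned start exists inside the mapping. */
    char *raw = mmap(NULL, len + HPAGE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t mask = (uintptr_t)HPAGE_SIZE - 1;
    char *aligned = (char *)(((uintptr_t)raw + mask) & ~mask);

    /* Advisory only: this fails if THP is not configured. */
    if (madvise(aligned, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* Touch every byte so the pages are actually allocated now. */
    memset(aligned, 1, len);
    return aligned;
}

/*
 * Hypothetical helper: average time per read, in nanoseconds, when
 * stepping through the buffer at the given stride. This is a crude
 * figure; hardware prefetching and caching are not defeated.
 */
static double access_latency_ns(const volatile char *buf, size_t len,
                                size_t stride)
{
    struct timespec t0, t1;
    unsigned long sum = 0;
    size_t n = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < len; off += stride, n++)
        sum += (unsigned long)buf[off];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)n;
}
```

Calling map_thp_region(HPAGE_SIZE) followed by
access_latency_ns(p, HPAGE_SIZE, 64) gives one number per run. Comparing a
run whose hugepage is local with one whose hugepage is remote would also
require checking which node backs the mapping, for example through
/proc/self/numa_maps, and this sketch does not do that.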