Date: Mon, 22 Oct 2018 13:54:33 -0700 (PDT)
From: David Rientjes
To: Andrea Arcangeli
Cc: Andrew Morton, Michal Hocko, Mel Gorman, Vlastimil Babka, Andrea Argangeli, Zi Yan, Stefan Priebe - Profihost AG, "Kirill A. Shutemov", linux-mm@kvack.org, LKML, Stable tree
Subject: Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

On Mon, 15 Oct 2018, Andrea Arcangeli wrote:

> > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes wrote:
> > > Would it be possible to test with my
> > > patch[*] that does not try reclaim to address the thrashing issue?
> >
> > Yes please.
>
> It'd also be great if a testcase reproducing the 40% higher access
> latency (with the one liner original fix) was available.
>

I never said 40% higher access latency, I said 40% higher fault latency.
The higher access latency is 13.9% as measured on Haswell.

The test case is rather trivial: fragment all memory with order-4 memory
to replicate a fragmented local zone, use sched_setaffinity() to bind to
that node, and fault a reasonable number of hugepages (128MB, 256,
whatever).  The cost of faulting remotely in this case was measured to be
40% higher than falling back to local small pages.  This occurs quite
obviously because you are thrashing the remote node trying to allocate
thp.

> We don't have a testcase for David's 40% latency increase problem, but
> that's likely to only happen when the system is somewhat low on memory
> globally.

Well, yes, but that's most of our systems.  We can't keep around
gigabytes of memory free just to work around this patch.
Removing __GFP_THISNODE to avoid thrashing the local node obviously will
incur a substantial performance degradation if you thrash the remote node
as well.  This should be rather straightforward.

> When there's 75% or more of the RAM free (not even allocated as easily
> reclaimable pagecache) globally, you don't expect to hit heavy
> swapping.
>

I agree there is no regression introduced by your patch when 75% of memory
is free.

> The 40% THP allocation latency increase if you use MADV_HUGEPAGE in
> such window where all remote zones are fully fragmented is somehow
> lesser of a concern in my view (plus there's the compact deferred
> logic that should mitigate that scenario). Furthermore it is only a
> concern for page faults in MADV_HUGEPAGE ranges. If MADV_HUGEPAGE is
> set the userland allocation is long lived, so such higher allocation
> latency won't risk to hit short lived allocations that don't set
> MADV_HUGEPAGE (unless madvise=always, but that's not the default
> precisely because not all allocations are long lived).
>
> If the MADV_HUGEPAGE using library was freely available it'd also be
> nice.
>

You scan your mappings for .text segments, map a hugepage-aligned region
sufficient in size, mremap() to that region, and do MADV_HUGEPAGE.
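
For illustration only, here is a minimal userspace sketch of the fault
timing and of the hugepage-aligned mapping step described above.  It is
not the test that produced the 40% figure and not the library in
question.  The helper names (pin_to_cpu, map_hpage_aligned,
fault_region_ns, run_fault_test) are hypothetical, a 2MB hugepage size is
assumed, and the steps that need more than a few lines are left out:
fragmenting the local zone, scanning /proc/self/maps for .text, and the
mremap() itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)	/* assumption: 2MB THP, as on x86-64 */

/* Pin the calling thread to one CPU so first-touch faults target that
 * CPU's NUMA node. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Map len bytes (a multiple of HPAGE_SIZE) at a hugepage-aligned address
 * by over-mapping one extra hugepage and trimming both ends. */
static char *map_hpage_aligned(size_t len)
{
	size_t span = len + HPAGE_SIZE;
	char *raw, *aligned;
	size_t head, tail;

	assert(len % HPAGE_SIZE == 0);
	raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	aligned = (char *)(((uintptr_t)raw + HPAGE_SIZE - 1) &
			   ~(uintptr_t)(HPAGE_SIZE - 1));
	head = aligned - raw;
	tail = span - head - len;
	if (head)
		munmap(raw, head);
	if (tail)
		munmap(aligned + len, tail);
	return aligned;
}

/* Touch every base page so both the THP and the small-page fallback
 * paths are fully faulted; return the elapsed time in nanoseconds. */
static long long fault_region_ns(volatile char *p, size_t len)
{
	size_t step = (size_t)sysconf(_SC_PAGESIZE);
	struct timespec a, b;
	size_t off;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (off = 0; off < len; off += step)
		p[off] = 1;
	clock_gettime(CLOCK_MONOTONIC, &b);
	return (b.tv_sec - a.tv_sec) * 1000000000LL +
	       (b.tv_nsec - a.tv_nsec);
}

/* Bind, map, madvise, fault.  Returns nanoseconds, or -1 on any error. */
static long long run_fault_test(int cpu, size_t len)
{
	long long ns;
	char *p;

	if (pin_to_cpu(cpu) != 0)
		return -1;
	p = map_hpage_aligned(len);
	if (!p)
		return -1;
	if (madvise(p, len, MADV_HUGEPAGE) != 0) {
		munmap(p, len);
		return -1;
	}
	ns = fault_region_ns(p, len);
	munmap(p, len);
	return ns;
}
```

Something like run_fault_test(sched_getcpu(), 128UL << 20) would then be
run once with the local node fragmented and once without, and the two
timings compared.  The numbers only mean anything if the fragmentation
setup matches the scenario above.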