Date: Tue, 17 Sep 2019 13:26:52 -0700 (PDT)
From: David Rientjes
To: John Hubbard
cc: Nitin Gupta, akpm@linux-foundation.org, vbabka@suse.cz,
    mgorman@techsingularity.net, mhocko@suse.com, dan.j.williams@intel.com,
    Yu Zhao, Matthew Wilcox, Qian Cai, Andrey Ryabinin, Roman Gushchin,
    Greg Kroah-Hartman, Kees Cook, Jann Horn, Johannes Weiner, Arun KS,
    Janne Huttunen, Konstantin Khlebnikov, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [RFC] mm: Proactive compaction
References: <20190816214413.15006-1-nigupta@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 17 Sep 2019, John Hubbard wrote:

> > We've had good success with periodically compacting memory on a regular
> > cadence on systems with hugepages enabled.
> > The cadence itself is defined by the admin but it causes khugepaged[*] to
> > periodically wakeup and invoke compaction in an attempt to keep zones as
> > defragmented as possible
>
> That's an important data point, thanks for reporting it.
>
> And given that we have at least one data point validating it, I think we
> should feel fairly comfortable with this approach. Because the sys admin
> probably knows when are the best times to steal cpu cycles and recover
> some huge pages. Unlike the kernel, the sys admin can actually see the
> future sometimes, because he/she may know what is going to be run.
>
> It's still sounding like we can expect excellent results from simply
> defragmenting from user space, via a chron job and/or before running
> important tests, rather than trying to have the kernel guess whether
> it's a performance win to defragment at some particular time.
>
> Are you using existing interfaces, or did you need to add something? How
> exactly are you triggering compaction?
>

It's possible to do this through a cron job but there are a few reasons
that we preferred to do it through khugepaged:

 - we use a lighter mode of compaction, MIGRATE_SYNC_LIGHT, than the
   per-node trigger provides: compact_node() forces MIGRATE_SYNC, which
   can stall for minutes and become disruptive under some circumstances,

 - we honor the pageblock skip hint, which compact_node() is hardcoded
   to ignore, and

 - we didn't want to do this in process context, so that the cpu time is
   not charged to any user cgroup, since the work is done on behalf of
   the system as a whole.

It seems much better to do this on a per-node basis, which partitions the
work, rather than through the sysctl that compacts the whole system.
Extending the per-node interface to use MIGRATE_SYNC_LIGHT and to honor the
pageblock skip hint is possible, but the work would still be done in
process context, so if it is triggered from userspace the triggering
process would need to be attached to a cgroup that is not charged for
usage done on behalf of the entire system.

Again, we're using khugepaged and allowing the period to be defined
through /sys/kernel/mm/transparent_hugepage/khugepaged, but that is because
we only want to do this on systems where we want to dynamically allocate
hugepages on a regular basis.
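
For comparison, the userspace alternative discussed above (a cron job
driving the existing per-node trigger) could look roughly like the sketch
below. It assumes a kernel that exposes /sys/devices/system/node/nodeN/compact
and a caller with permission to write to it; the function names and the
"root" parameter are made up for illustration. It carries all three
drawbacks listed above: it goes through compact_node(), so it runs with
MIGRATE_SYNC, ignores the pageblock skip hint, and is charged to whatever
cgroup the calling process belongs to.

```python
import glob
import os
import re

# Standard sysfs location of the per-node directories.
NODE_ROOT = "/sys/devices/system/node"


def node_compact_files(root=NODE_ROOT):
    """Return the per-node 'compact' trigger files under root, ordered by node id."""
    found = []
    for path in glob.glob(os.path.join(root, "node[0-9]*", "compact")):
        node_dir = os.path.basename(os.path.dirname(path))
        match = re.fullmatch(r"node(\d+)", node_dir)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def compact_all_nodes(root=NODE_ROOT):
    """Write to each node's trigger in turn and return the paths written.

    Each write blocks until that node has been compacted, so the nodes are
    processed one at a time rather than all at once.
    """
    written = []
    for path in node_compact_files(root):
        with open(path, "w") as trigger:
            trigger.write("1\n")
        written.append(path)
    return written
```

A cron entry would call compact_all_nodes() with no arguments. Going node by
node keeps the work partitioned, unlike a single write to
/proc/sys/vm/compact_memory, which compacts every zone in the system.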