Date: Mon, 23 Jul 2018 14:33:08 -0700 (PDT)
From: David Rientjes
To: Matthew Wilcox
cc: Yang Shi, Andrew Morton, kirill@shutemov.name, hughd@google.com, aaron.lu@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Mon, 23 Jul 2018, David Rientjes wrote:

> > > The huge zero page can be reclaimed under memory pressure and, if it is,
> > > it is attempted to be allocated again with gfp flags that attempt memory
> > > compaction that can become expensive.  If we are constantly under memory
> > > pressure, it gets freed and reallocated millions of times always trying to
> > > compact memory both directly and by kicking kcompactd in the background.
> > >
> > > It likely should also be per node.
> >
> > Have you benchmarked making the non-huge zero page per-node?
> >
>
> Not since we disable it :)  I will, though.  The more concerning issue for
> us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
> compacting memory for allocating the hzp in real time after it has been
> reclaimed.  We've observed this happening tens or hundreds of thousands
> of times on some systems.  It will be 2MB per node on x86 if the data
> suggests we should make it NUMA aware; I don't think the cost is too high
> to leave it persistently available even under memory pressure if
> use_zero_page is enabled.
>

Measuring access latency to 4GB of memory on Naples, I observe ~6.7%
slower access latency intrasocket and ~14% slower intersocket.

use_zero_page is currently a simple thp flag, meaning it rejects writes
where val != !!val, so perhaps it would be best to overload it with
additional options?  I can imagine 0x2 defining persistent allocation so
that the hzp is not freed when the refcount goes to 0, and 0x4 defining
whether the hzp should be per node.  Implementing persistent allocation
fixes our concern with it, so I'd like to start there.

Comments?
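
To make the difference between the two write semantics concrete, here is a rough userspace sketch, not kernel code and not a patch.  The macro and function names (HZP_ENABLE, HZP_PERSISTENT, HZP_PER_NODE, hzp_check_bool, hzp_check_mask) are hypothetical; only the val != !!val check and the 0x2/0x4 bit values come from the text above.  Treating 0x1 as "enabled" is an assumption, as is accepting the modifier bits without it.

```c
#include <errno.h>

/* Hypothetical names; only the values 0x2 and 0x4 come from the proposal. */
#define HZP_ENABLE      0x1UL   /* assumed: today's on/off meaning */
#define HZP_PERSISTENT  0x2UL   /* keep the hzp when the refcount hits 0 */
#define HZP_PER_NODE    0x4UL   /* one hzp per NUMA node */
#define HZP_ALL         (HZP_ENABLE | HZP_PERSISTENT | HZP_PER_NODE)

/* Current semantics: a plain flag, so only 0 and 1 are accepted. */
static int hzp_check_bool(unsigned long val)
{
	return (val != !!val) ? -EINVAL : 0;
}

/*
 * Possible overloaded semantics: accept any combination of known bits
 * and reject everything else.  Whether 0x2 or 0x4 should be allowed
 * without 0x1 is left open here.
 */
static int hzp_check_mask(unsigned long val)
{
	return (val & ~HZP_ALL) ? -EINVAL : 0;
}
```

With the first check a write of 2 fails with -EINVAL; with the second, 0x3 (enabled and persistent) would be accepted, which is the case I'd like to start with.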