Received: by 2002:a05:7412:b995:b0:f9:9502:5bb8 with SMTP id it21csp594425rdb; Thu, 21 Dec 2023 21:32:48 -0800 (PST) X-Google-Smtp-Source: AGHT+IFqq5Q8tAviMEVfNyeLVfW1WKI/IgchQ+LXZZPkCf7aj/ztV82dPC1hGqSp3IGbe7DqzbRe X-Received: by 2002:a92:cda4:0:b0:35f:8e55:29cd with SMTP id g4-20020a92cda4000000b0035f8e5529cdmr966568ild.83.1703223168308; Thu, 21 Dec 2023 21:32:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1703223168; cv=none; d=google.com; s=arc-20160816; b=gM7tc8UzUTc60hhD8/o1Gj+Uj1J2nbltx02uGHRm8RdJX5AB1/BEHoS1W8/Vf08Ymn 1sEB3JxDdtslhxkIlK9FyEoAz9hxZVoiKibdPY7WszUuKCtmw+2v0XcgtuoiZIx4eRYo NUBBU2rJRgvB2e6vXVzEzbA1oXmxKVp15QUrQ0NgeBpsoaFvcs1n7ta1UFNf3vmgYh+l 7tq2u10x0+1c43TCC280JnZh0iQhsq8tFbz9y4mY3MGAgHqZtbgEpE5QblWjMI/QNs1B TDnR6z9y9r8pI6Zp33Mgxm/hskGmLQ3/eqehxx9bEytCu7O2ISTyBvL/tKIkOAJaM00U rpRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=aei49JaNshkuJJzrb21/FHbkROcvggHQaKMtSVNdNfI=; fh=xwUHAcblIzhNTankEvej/3kLZeeLU5LfFrnrmtZZ3+o=; b=zzwH0L7OvIvpqal1oTlXGJqLZ2Twwixt3sdmqUUmQ3MsppL4NJCRk1rPIT4KOVa/0r lJNzEI4phJFPaOAXLssg+d7Yc1rzf+jy7AANed+EuiUh+W7JQxEcUo9pJWGE3J693fMD IAIQ5CqL0k0aX9Q4a2Vd+lKLb/zK+1/e4PjdiM25q00jyND3EZS8ydLLseHbrDCflrnQ IseKdgPAXu9567XnGxGbqluApwHOwZ76F9+l+mGzL2gBhrtNcJTU7KnNp/iQYcPBwXF5 7JB3bZwvhenPGcWBWhyC+JUs9aUVVaWaFZIHe7kuLkNoKkd2G5QAtUSwqHSAEJMixbSG /iHA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=3csyLWMQ; spf=pass (google.com: domain of linux-kernel+bounces-9368-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-9368-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id c13-20020a65420d000000b005c622a6f3d8si2713041pgq.765.2023.12.21.21.32.48 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Dec 2023 21:32:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-9368-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=3csyLWMQ; spf=pass (google.com: domain of linux-kernel+bounces-9368-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-9368-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 8616D287642 for ; Fri, 22 Dec 2023 05:32:47 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4E3266126; Fri, 22 Dec 2023 05:32:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="3csyLWMQ" X-Original-To: linux-kernel@vger.kernel.org Received: from mail-wm1-f53.google.com (mail-wm1-f53.google.com [209.85.128.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E49E4539A for ; Fri, 22 Dec 2023 05:32:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-wm1-f53.google.com with SMTP id 5b1f17b1804b1-40d3102d5d6so34885e9.0 for ; Thu, 21 Dec 2023 21:32:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1703223157; x=1703827957; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=aei49JaNshkuJJzrb21/FHbkROcvggHQaKMtSVNdNfI=; b=3csyLWMQHZj8+S7+DD/W2J95An+k5Mco4O/Vwiz3LRe647IDsjMFk5sU5sqWVqc8jT n2obgHvv/AH/YuK/dRKrmQ8Lvv0H1zHcHJl0MyaLjYqQZ2laNvpD5BhMm/5IaS6CtZ2c RZp2PAWEs0Vje5nGKoxfweFUERgTsqwFVFj7lxzDFka21fILAuFHj36t6FzIlMLcUgP5 Q+FrdjCs80pPjrZJxTto4PeNWx8UZYfAV9HtL+DUfORYQTdgJyY5zpxrkHh1sGF7jPdH kicJQ/8GCoLfOm6goEmdWsefJgRuhvmjDz/e0f+ctXOiSJV1v9+LzFv/H1TflhncdWfZ XQTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703223157; x=1703827957; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=aei49JaNshkuJJzrb21/FHbkROcvggHQaKMtSVNdNfI=; b=rmanBbBNF/BhfwidV3CziZGcEk/6X1KkEU/F9xjNNgB6xU7olrOOXBj5+Pg/209VZ/ aKEUIYO5k/Nn+gcDSctMJtMVXKg7blcqm2Bx/kz5O/8hwADWoa1Q12AQQhJotFQStllL hgD++kS7X9z/0uNrw/A4AXaIU1CAq962Wce0vMEzB+pwSZbIFah61hEf8ZD5voFWQBok 1wNw7kIrurSJisx6X34zTQNlSWXFGBiYZEeXznJ2TS+KMeLSwNbbrDpfEd8zxPf14v+a BkgU+VT0yBU30wPa8n0k6u6nadGtKnxCLgvBaG14UHgYnSTYJvan08T2pjLxdeuLbHMN Xhow== X-Gm-Message-State: AOJu0YxcmMfPrAgko8cTqY0QNRAAVvyW9+tRvUwwq2/DE0QXixMs94ys MFMUXCy60qiwxYsB6uSTbvXRkkVX34FGYkaff7kad7jbZeW+ X-Received: by 2002:a05:600c:1c85:b0:40b:4355:a04b with SMTP id k5-20020a05600c1c8500b0040b4355a04bmr70112wms.6.1703223156969; Thu, 21 Dec 2023 21:32:36 -0800 (PST) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20231220152653.3273778-1-schatzberg.dan@gmail.com> <20231220152653.3273778-3-schatzberg.dan@gmail.com> In-Reply-To: <20231220152653.3273778-3-schatzberg.dan@gmail.com> From: Yu Zhao Date: Thu, 21 Dec 2023 22:31:59 -0700 Message-ID: Subject: Re: [PATCH v5 2/2] mm: add swapiness= arg to memory.reclaim To: Dan Schatzberg Cc: Johannes Weiner , Roman Gushchin , Yosry Ahmed , Huan Yang , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Tejun Heo , Zefan Li , Jonathan Corbet , Michal Hocko , Shakeel Butt , Muchun Song , Andrew Morton , Kefeng Wang , SeongJae Park , "Vishal Moola (Oracle)" , Nhat Pham , Yue Zhao Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Dec 20, 2023 at 8:27=E2=80=AFAM Dan Schatzberg wrote: > > Allow proactive reclaimers to submit an additional swappiness=3D > argument to memory.reclaim. This overrides the global or per-memcg > swappiness setting for that reclaim attempt. > > For example: > > echo "2M swappiness=3D0" > /sys/fs/cgroup/memory.reclaim > > will perform reclaim on the rootcg with a swappiness setting of 0 (no > swap) regardless of the vm.swappiness sysctl setting. > > Userspace proactive reclaimers use the memory.reclaim interface to > trigger reclaim. The memory.reclaim interface does not allow for any way > to effect the balance of file vs anon during proactive reclaim. The only > approach is to adjust the vm.swappiness setting. However, there are a > few reasons we look to control the balance of file vs anon during > proactive reclaim, separately from reactive reclaim: > > * Swapout should be limited to manage SSD write endurance. In near-OOM > situations we are fine with lots of swap-out to avoid OOMs. As these are > typically rare events, they have relatively little impact on write > endurance. However, proactive reclaim runs continuously and so its > impact on SSD write endurance is more significant. Therefore it is > desireable to control swap-out for proactive reclaim separately from > reactive reclaim > > * Some userspace OOM killers like systemd-oomd[1] support OOM killing on > swap exhaustion. This makes sense if the swap exhaustion is triggered > due to reactive reclaim but less so if it is triggered due to proactive > reclaim (e.g. one could see OOMs when free memory is ample but anon is > just particularly cold). Therefore, it's desireable to have proactive > reclaim reduce or stop swap-out before the threshold at which OOM > killing occurs. > > In the case of Meta's Senpai proactive reclaimer, we adjust > vm.swappiness before writes to memory.reclaim[2]. This has been in > production for nearly two years and has addressed our needs to control > proactive vs reactive reclaim behavior but is still not ideal for a > number of reasons: > > * vm.swappiness is a global setting, adjusting it can race/interfere > with other system administration that wishes to control vm.swappiness. > In our case, we need to disable Senpai before adjusting vm.swappiness. > > * vm.swappiness is stateful - so a crash or restart of Senpai can leave > a misconfigured setting. This requires some additional management to > record the "desired" setting and ensure Senpai always adjusts to it. > > With this patch, we avoid these downsides of adjusting vm.swappiness > globally. > > [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.s= ervice.html > [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/S= enpai.cpp#L585-L598 > > Signed-off-by: Dan Schatzberg The cover letter says: "Previously, this exact interface addition was proposed by Yosry[3]." So I think it should be acknowledged with a Suggested-by, based on: "A Suggested-by: tag indicates that the patch idea is suggested by the person named and ensures credit to the person for the idea." from https://docs.kernel.org/process/submitting-patches.html#using-reported-by-t= ested-by-reviewed-by-suggested-by-and-fixes > Documentation/admin-guide/cgroup-v2.rst | 18 ++++---- > include/linux/swap.h | 3 +- > mm/memcontrol.c | 56 ++++++++++++++++++++----- > mm/vmscan.c | 13 +++++- > 4 files changed, 69 insertions(+), 21 deletions(-) ... > diff --git a/mm/vmscan.c b/mm/vmscan.c > index d91963e2d47f..aa5666842c49 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -92,6 +92,9 @@ struct scan_control { > unsigned long anon_cost; > unsigned long file_cost; > > + /* Swappiness value for reclaim. NULL will fall back to per-memcg= /global value */ > + int *swappiness; Using a pointer to indicate whether the type it points to is overridden isn't really a good practice. A better alternative was suggested during the v2: "Perhaps the negative to avoid unnecessary dereferences." https://lore.kernel.org/linux-mm/dhhjw4h22q4ngwtxmhuyifv32zjd6z2relrcjgnxsw= 6zys3mod@o6dh5dy53ae3/ Since only proactive reclaim can override swappiness, meaning it only happens if sc->proactive is true, I think the best way to make it work without spending much effort is create a helper as Michal suggest but it should look like: sc_swappiness() { return sc->proactive ? sc->swappiness : mem_cgroup_swappiness(memcg); } In this patchset, sc->swappiness really means sc->proactive_swappiness. So it should be renamed accordingly.