Date: Thu, 21 Dec 2023 10:29:59 +0100
From: Michal Hocko
To: Dan Schatzberg
Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Tejun Heo, Zefan Li, Jonathan Corbet,
	Shakeel Butt, Muchun Song, Andrew Morton, Kefeng Wang,
	SeongJae Park, "Vishal Moola (Oracle)", Nhat Pham, Yue Zhao
Subject: Re: [PATCH v5 2/2] mm: add swapiness= arg to memory.reclaim
In-Reply-To: <20231220152653.3273778-3-schatzberg.dan@gmail.com>

On Wed 20-12-23 07:26:51, Dan Schatzberg wrote:
> Allow proactive reclaimers to submit an additional swappiness=
> argument to memory.reclaim. This overrides the global or per-memcg
> swappiness setting for that reclaim attempt.
>
> For example:
>
>   echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
>
> will perform reclaim on the rootcg with a swappiness setting of 0 (no
> swap) regardless of the vm.swappiness sysctl setting.
>
> Userspace proactive reclaimers use the memory.reclaim interface to
> trigger reclaim. The memory.reclaim interface does not allow for any way
> to affect the balance of file vs anon during proactive reclaim. The only
> approach is to adjust the vm.swappiness setting. However, there are a
> few reasons we look to control the balance of file vs anon during
> proactive reclaim, separately from reactive reclaim:
>
> * Swap-out should be limited to manage SSD write endurance. In near-OOM
> situations we are fine with lots of swap-out to avoid OOMs. As these are
> typically rare events, they have relatively little impact on write
> endurance. However, proactive reclaim runs continuously and so its
> impact on SSD write endurance is more significant. Therefore it is
> desirable to control swap-out for proactive reclaim separately from
> reactive reclaim.
>
> * Some userspace OOM killers like systemd-oomd[1] support OOM killing on
> swap exhaustion. This makes sense if the swap exhaustion is triggered
> due to reactive reclaim but less so if it is triggered due to proactive
> reclaim (e.g. one could see OOMs when free memory is ample but anon is
> just particularly cold). Therefore, it's desirable to have proactive
> reclaim reduce or stop swap-out before the threshold at which OOM
> killing occurs.
>
> In the case of Meta's Senpai proactive reclaimer, we adjust
> vm.swappiness before writes to memory.reclaim[2].
> This has been in production for nearly two years and has addressed our
> needs to control proactive vs reactive reclaim behavior but is still
> not ideal for a number of reasons:
>
> * vm.swappiness is a global setting; adjusting it can race or interfere
> with other system administration that wishes to control vm.swappiness.
> In our case, we need to disable Senpai before adjusting vm.swappiness.
>
> * vm.swappiness is stateful, so a crash or restart of Senpai can leave
> a misconfigured setting. This requires some additional management to
> record the "desired" setting and ensure Senpai always adjusts to it.
>
> With this patch, we avoid these downsides of adjusting vm.swappiness
> globally.

Thank you for extending the changelog with use cases!

> [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
> [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
>
> Signed-off-by: Dan Schatzberg
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 18 ++++----
>  include/linux/swap.h                    |  3 +-
>  mm/memcontrol.c                         | 56 ++++++++++++++++++++-----
>  mm/vmscan.c                             | 13 +++++-
>  4 files changed, 69 insertions(+), 21 deletions(-)

LGTM
Acked-by: Michal Hocko

swappiness)
+		return *sc->swappiness;
+	return mem_cgroup_swappiness(memcg);
+}
 #else
 static bool cgroup_reclaim(struct scan_control *sc)
 {
@@ -245,6 +254,10 @@ static bool writeback_throttling_sane(struct scan_control *sc)
 {
 	return true;
 }
+static int sc_swappiness(struct scan_control *sc, struct mem_cgroup *memcg)
+{
+	return READ_ONCE(vm_swappiness);
+}
 #endif
 
 static void set_task_reclaim_state(struct task_struct *task,
@@ -2330,8 +2343,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
-	int swappiness = sc->swappiness ?
-		*sc->swappiness : mem_cgroup_swappiness(memcg);
+	int swappiness = sc_swappiness(sc, memcg);
 	u64 fraction[ANON_AND_FILE];
 	u64 denominator = 0;	/* gcc */
 	enum scan_balance scan_balance;
@@ -2612,10 +2624,7 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
 	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
 		return 0;
 
-	if (sc->swappiness)
-		return *sc->swappiness;
-
-	return mem_cgroup_swappiness(memcg);
+	return sc_swappiness(sc, memcg);
 }
 
 static int get_nr_gens(struct lruvec *lruvec, int type)
-- 
Michal Hocko
SUSE Labs
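A minimal userspace sketch of why the per-reclaim override is carried as a pointer (as in the sc_swappiness() helper above): an explicit "swappiness=0" has to win over the per-memcg/global value, so "not set" must be NULL rather than 0. It assumes a simplified input grammar of "<size>[K|M|G] [swappiness=N]" and the 0..200 range accepted by vm.swappiness. The names parse_reclaim() and effective_swappiness() are hypothetical; the kernel parses memory.reclaim with its own helpers, which this does not reproduce.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed bounds, matching the 0..200 range accepted by vm.swappiness. */
#define SKETCH_MIN_SWAPPINESS 0
#define SKETCH_MAX_SWAPPINESS 200

/*
 * Hypothetical parser for "<size>[K|M|G] [swappiness=N]".
 * Returns 0 on success or -EINVAL. *has_swappiness reports whether the
 * optional argument was present. No overflow check on the size shift.
 */
int parse_reclaim(const char *buf, unsigned long long *nr_bytes,
		  int *swappiness, bool *has_swappiness)
{
	char *end;
	unsigned long long n;

	/* Reject a leading sign or whitespace, which strtoull would accept. */
	if (!isdigit((unsigned char)*buf))
		return -EINVAL;
	errno = 0;
	n = strtoull(buf, &end, 10);
	if (errno)
		return -EINVAL;

	/* Binary suffixes: each case shifts by 10 bits and falls through. */
	switch (*end) {
	case 'G': case 'g':
		n <<= 10;
		/* fall through */
	case 'M': case 'm':
		n <<= 10;
		/* fall through */
	case 'K': case 'k':
		n <<= 10;
		end++;
		break;
	default:
		break;
	}
	/* The size must be followed by whitespace or the end of input. */
	if (*end && !isspace((unsigned char)*end))
		return -EINVAL;

	*has_swappiness = false;
	while (*end) {
		char *vend;
		long v;

		if (isspace((unsigned char)*end)) {
			end++;
			continue;
		}
		if (strncmp(end, "swappiness=", 11) != 0)
			return -EINVAL;
		end += 11;
		if (!isdigit((unsigned char)*end))
			return -EINVAL;
		errno = 0;
		v = strtol(end, &vend, 10);
		if (errno || v < SKETCH_MIN_SWAPPINESS || v > SKETCH_MAX_SWAPPINESS)
			return -EINVAL;
		if (*vend && !isspace((unsigned char)*vend))
			return -EINVAL;
		*swappiness = (int)v;
		*has_swappiness = true;
		end = vend;
	}
	*nr_bytes = n;
	return 0;
}

/*
 * Same precedence as sc_swappiness(): a non-NULL override wins, even when
 * it points at 0; otherwise use the fallback (per-memcg or global value).
 */
int effective_swappiness(const int *override, int fallback)
{
	return override ? *override : fallback;
}
```

With "2M swappiness=0\n" (the trailing newline that echo adds) the sketch yields 2097152 bytes, and effective_swappiness(&swp, 60) returns 0 regardless of the fallback. With "1G" alone, no override is recorded, so passing NULL returns the fallback. Encoding "unset" as 0 in a plain int would make the first case indistinguishable from the second.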