Date: Thu, 30 Nov 2023 11:56:42 -0500
From: Johannes Weiner
To: Michal Hocko
Cc: Dan Schatzberg, Roman Gushchin, Yosry Ahmed, Huan Yang,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand,
    Matthew Wilcox, Huang Ying, Kefeng Wang, Peter Xu,
    "Vishal Moola (Oracle)", Yue Zhao, Hugh Dickins
Subject: Re: [PATCH 0/1] Add swappiness argument to memory.reclaim
Message-ID: <20231130165642.GA386439@cmpxchg.org>
References: <20231130153658.527556-1-schatzberg.dan@gmail.com>

On Thu, Nov 30, 2023 at 04:57:41PM +0100, Michal Hocko wrote:
> On Thu 30-11-23 07:36:53, Dan Schatzberg wrote:
> [...]
> > In contrast, I argue in favor of a swappiness setting not as a way to implement
> > custom reclaim algorithms but rather to bias the balance of anon vs file due to
> > differences of proactive vs reactive reclaim. In this context, swappiness is the
> > existing interface for controlling this balance and this patch simply allows for
> > it to be configured differently for proactive vs reactive reclaim.
>
> I do agree that swappiness is a better interface than explicit anon/file
> but the problem with swappiness is that it is more of a hint for the reclaim
> rather than a real control. Just look at get_scan_count and its history.
> Not only its range has been extended also the extent when it is actually
> used has been changing all the time and I think it is not a stretch to
> assume that trend to continue.

Right, we did tweak the edge behavior of e.g. swappiness=0. And we
extended the range to express "anon is cheaper than file", which
wasn't possible before, to support the compressed memory case.

However, its meaning and impact have been remarkably stable over the
years: it allows userspace to specify the relative cost of paging IO
between file and anon pages.

This comment is from 2.6.28:

	/*
	 * With swappiness at 100, anonymous and file have the same priority.
	 * This scanning priority is essentially the inverse of IO cost.
	 */
	anon_prio = sc->swappiness;
	file_prio = 200 - sc->swappiness;

And this is it today:

	/*
	 * Calculate the pressure balance between anon and file pages.
	 *
	 * The amount of pressure we put on each LRU is inversely
	 * proportional to the cost of reclaiming each list, as
	 * determined by the share of pages that are refaulting, times
	 * the relative IO cost of bringing back a swapped out
	 * anonymous page vs reloading a filesystem page (swappiness).
	 *
	 * Although we limit that influence to ensure no list gets
	 * left behind completely: at least a third of the pressure is
	 * applied, before swappiness.
	 *
	 * With swappiness at 100, anon and file have equal IO cost.
	 */
	total_cost = sc->anon_cost + sc->file_cost;
	anon_cost = total_cost + sc->anon_cost;
	file_cost = total_cost + sc->file_cost;
	total_cost = anon_cost + file_cost;

	ap = swappiness * (total_cost + 1);
	ap /= anon_cost + 1;

	fp = (200 - swappiness) * (total_cost + 1);
	fp /= file_cost + 1;

So swappiness still means the same thing it did 15 years ago.
We haven't changed the default swappiness setting, and we haven't
broken any existing swappiness configurations through VM changes in
that time.

There are a few scenarios where swappiness doesn't apply:

- No swap. Oh well, that seems reasonable.

- Priority=0. This applies to near-OOM situations where the MM system
  tries to save itself. This isn't a range in which proactive
  reclaimers (should) operate.

- sc->file_is_tiny. This doesn't apply to cgroup reclaim and thus
  proactive reclaim.

- sc->cache_trim_mode. This implements clean cache dropbehind, and
  applies in the presence of large, non-refaulting inactive cache. The
  assumption there is that this data is reclaimable without involving
  IO to evict, and without the expectation of refault IO in the
  future. Without IO involvement, the relative IO cost isn't a
  factor. This will back off when refaults are observed, and the IO
  cost setting is then taken into account again as expected.

  If you consider swappiness to mean "reclaim what I ask you to", then
  this would override that, yes. But in the definition of relative IO
  cost, this decision making is permissible.

  Note that this applies to the global swappiness setting as well, and
  nobody has complained about it.

So I wouldn't say it's merely a reclaim hint. It controls a very
concrete and influential factor in VM decision making. And since the
global swappiness is long-established ABI, I don't expect its meaning
to change significantly any time soon.
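
For anyone who wants to play with the numbers, the ap/fp arithmetic
quoted above can be lifted into a small userspace sketch. This is not
kernel code: the struct, the function names and the sample costs are
made up for illustration, and only the six arithmetic statements
mirror the snippet from the current code. It assumes swappiness is in
the 0..200 range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two pressure values (not a kernel type). */
struct scan_balance {
	uint64_t ap;	/* pressure applied to the anon LRU */
	uint64_t fp;	/* pressure applied to the file LRU */
};

/*
 * Hypothetical helper mirroring the arithmetic quoted above.
 * lru_anon_cost and lru_file_cost stand in for sc->anon_cost and
 * sc->file_cost; swappiness is assumed to be in 0..200.
 */
static struct scan_balance calc_scan_balance(unsigned int swappiness,
					     uint64_t lru_anon_cost,
					     uint64_t lru_file_cost)
{
	struct scan_balance b;
	uint64_t total_cost, anon_cost, file_cost;

	/* Adding the total to each side limits how lopsided the split gets. */
	total_cost = lru_anon_cost + lru_file_cost;
	anon_cost = total_cost + lru_anon_cost;
	file_cost = total_cost + lru_file_cost;
	total_cost = anon_cost + file_cost;

	b.ap = swappiness * (total_cost + 1);
	b.ap /= anon_cost + 1;

	b.fp = (200 - swappiness) * (total_cost + 1);
	b.fp /= file_cost + 1;

	return b;
}

/*
 * With no recorded cost on either list, the result collapses to the
 * 2.6.28 priorities: anon_prio = swappiness, file_prio = 200 - swappiness.
 */
static void check_zero_cost_matches_legacy(unsigned int swappiness)
{
	struct scan_balance b = calc_scan_balance(swappiness, 0, 0);

	assert(b.ap == swappiness);
	assert(b.fp == 200 - swappiness);
}
```

With swappiness=100 and all of the cost on the anon side (say 300 vs
0), this gives ap=149 and fp=299, so anon still receives roughly a
third of the pressure, which is the floor the comment describes.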