From: Chris Li
Date: Fri, 8 Dec 2023 15:55:59 -0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Nhat Pham
Cc: akpm@linux-foundation.org, tj@kernel.org, lizefan.x@bytedance.com,
    hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, hughd@google.com, corbet@lwn.net,
    konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
    linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, david@ixit.cz, Kairui Song, Minchan Kim,
    Zhongkun He
References: <20231207192406.3809579-1-nphamcs@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Nhat,

On Thu, Dec 7, 2023 at 5:03 PM Nhat Pham wrote:
>
> On Thu, Dec 7, 2023 at 4:19 PM Chris Li wrote:
> >
> > Hi Nhat,
> >
> >
> > On Thu, Dec 7, 2023 at 11:24 AM Nhat Pham wrote:
> > >
> > > During our experiment with zswap, we sometimes observe swap IOs due to
> > > occasional zswap store failures and writebacks-to-swap. These swapping
> > > IOs prevent many users who cannot tolerate swapping from adopting zswap
> > > to save memory and improve performance where possible.
> > >
> > > This patch adds the option to disable this behavior entirely: do not
> > > writeback to backing swapping device when a zswap store attempt fail,
> > > and do not write pages in the zswap pool back to the backing swap
> > > device (both when the pool is full, and when the new zswap shrinker is
> > > called).
> > >
> > > This new behavior can be opted-in/out on a per-cgroup basis via a new
> > > cgroup file. By default, writebacks to swap device is enabled, which is
> > > the previous behavior. Initially, writeback is enabled for the root
> > > cgroup, and a newly created cgroup will inherit the current setting of
> > > its parent.
> > >
> > > Note that this is subtly different from setting memory.swap.max to 0, as
> > > it still allows for pages to be stored in the zswap pool (which itself
> > > consumes swap space in its current form).
> > >
> > > This patch should be applied on top of the zswap shrinker series:
> > >
> > > https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@gmail.com/
> > >
> > > as it also disables the zswap shrinker, a major source of zswap
> > > writebacks.
> >
> > I am wondering about the status of "memory.swap.tiers" proof of concept patch?
> > Are we still on board to have this two patch merge together somehow so
> > we can have
> > "memory.swap.tiers" == "all" and "memory.swap.tiers" == "zswap" cover the
> > memory.zswap.writeback == 1 and memory.zswap.writeback == 0 case?
> >
> > Thanks
> >
> > Chris
> >
>
> Hi Chris,
>
> I briefly summarized my recent discussion with Johannes here:
>
> https://lore.kernel.org/all/CAKEwX=NwGGRAtXoNPfq63YnNLBCF0ZDOdLVRsvzUmYhK4jxzHA@mail.gmail.com/

Sorry, I am traveling in a different time zone, so I was not able to get
to that email sooner. That email was sent out less than one day before
the V6 patch, right?

>
> TL;DR is we acknowledge the potential usefulness of swap.tiers
> interface, but the use case is not quite there yet, so it does not

I disagree that there is no use case. No use case for Meta != no use
case for the rest of the Linux kernel community. That mindset really
needs to shift for Linux kernel development. Respect others' use cases.
It is not just Meta's Linux kernel. It is everybody's Linux kernel.

I can give you three use cases right now:

1) Google's production kernel uses SSD-only swap; it is currently in
pilot. This is not expressible with memory.zswap.writeback. You can set
memory.zswap.max = 0 and memory.zswap.writeback = 1, then use an
SSD-backed swapfile. But the whole thing feels very clunky: when what you
really want is SSD-only swap, you still need to do this whole zswap
config dance. Google has an internal memory.swapfile feature that selects
the swap file type per cgroup: "zswap only", "real swap file only",
"both", or "none" (the exact keywords might be different). It has been
running in production for almost 10 years. The need for more than
zswap-type per-cgroup control is really there.

2) As indicated by this discussion, Tencent has a use case for SSD swap
with hard disk swap as overflow.
https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
+Kairui

3) Android has some fancy swap ideas, led by these patches:
https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
It got shot down due to the removal of frontswap. But the use case and
product requirement are there.
+Minchan

> make too much sense to build up that heavy machinery now.

Does my minimal memory.swap.tiers patch to support "zswap" and "all"
sound like heavy machinery to you? It is the same implementation under
the hood. I am only trying to avoid introducing something that will
foreseeably become obsolete.

> zswap.writeback is a more urgent need, and does not prevent swap.tiers
> if we do decide to implement it.

I respect that urgent need; that is why I Acked the V5 patch, under the
understanding that zswap.writeback is not carved in stone. When a better
interface comes along, zswap.writeback can be obsoleted. Frankly
speaking, I would much prefer not introducing a cgroup API which will be
obsolete soon.

If you think zswap.writeback is not removable when another better
alternative is available, please voice it now.

If you squash my minimal memory.swap.tiers patch, it will also address
your urgent need for merging "zswap.writeback", no?

Chris
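
For readers following the thread, the correspondence between the proposed
"memory.swap.tiers" values and the existing knobs can be sketched roughly
as below. This is an illustration only, not code from any posted patch:
the mode names ("all", "zswap", "ssd_only") and the helper settings_for()
are hypothetical, and "memory.swap.tiers" itself is a proposal. The file
names on the right-hand side are the cgroup files named in the messages
above.

```python
# Hypothetical illustration: the mode names and settings_for() are not
# kernel APIs. The dictionary values are the cgroup files named in the thread.
SWAP_MODES = {
    # proposed memory.swap.tiers == "all": zswap plus writeback to the swap device
    "all": {"memory.zswap.writeback": "1"},
    # proposed memory.swap.tiers == "zswap": pages stay in zswap, no writeback
    "zswap": {"memory.zswap.writeback": "0"},
    # SSD-only swap expressed with today's knobs (the "config dance")
    "ssd_only": {"memory.zswap.max": "0", "memory.zswap.writeback": "1"},
}


def settings_for(mode: str) -> dict[str, str]:
    """Return the cgroup file -> value pairs that would express `mode`."""
    try:
        # Copy so callers cannot mutate the shared table.
        return dict(SWAP_MODES[mode])
    except KeyError:
        raise ValueError(f"unknown swap mode: {mode!r}") from None


if __name__ == "__main__":
    for mode in SWAP_MODES:
        print(mode, settings_for(mode))
```

The point of the sketch is that the "ssd_only" case needs two separate
zswap knobs to express something that has nothing to do with zswap, which
is the clunkiness described in use case 1.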