References: <20231207192406.3809579-1-nphamcs@gmail.com> <20231209034229.GA1001962@cmpxchg.org> <20231214171137.GA261942@cmpxchg.org>
In-Reply-To: <20231214171137.GA261942@cmpxchg.org>
From: Yu Zhao
Date: Thu, 14 Dec 2023 10:23:35 -0700
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Johannes Weiner
Cc: Minchan Kim, Chris Li, Nhat Pham, akpm@linux-foundation.org,
 tj@kernel.org, lizefan.x@bytedance.com, cerasuolodomenico@gmail.com,
 yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
 vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeelb@google.com, muchun.song@linux.dev, hughd@google.com,
 corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org,
 rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz,
 Kairui Song, Zhongkun He, Fabian Deutsch

On Thu, Dec 14, 2023 at 10:11 AM Johannes Weiner wrote:
>
> On Mon, Dec 11, 2023 at 02:55:43PM -0800, Minchan Kim wrote:
> > On Fri, Dec 08, 2023 at 10:42:29PM -0500, Johannes Weiner wrote:
> > > On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > > > I can give you three usage cases right now:
> > > > 1) Google producting kernel uses SSD only swap, it is currently on
> > > > pilot. This is not expressible by the memory.zswap.writeback. You can
> > > > set the memory.zswap.max = 0 and memory.zswap.writeback = 1, then SSD
> > > > backed swapfile. But the whole thing feels very clunky, especially
> > > > what you really want is SSD only swap, you need to do all this zswap
> > > > config dance. Google has an internal memory.swapfile feature
> > > > implemented per cgroup swap file type by "zswap only", "real swap file
> > > > only", "both", "none" (the exact keyword might be different). running
> > > > in the production for almost 10 years. The need for more than zswap
> > > > type of per cgroup control is really there.
> > > >
> > > We use regular swap on SSD without zswap just fine. Of course it's
> > > expressible.
> > >
> > > On dedicated systems, zswap is disabled in sysfs. On shared hosts
> > > where it's determined based on which workload is scheduled, zswap is
> > > generally enabled through sysfs, and individual cgroup access is
> > > controlled via memory.zswap.max - which is what this knob is for.
> > >
> > > This is analogous to enabling swap globally, and then opting
> > > individual cgroups in and out with memory.swap.max.
> > >
> > > So this usecase is very much already supported, and it's expressed in
> > > a way that's pretty natural for how cgroups express access and lack of
> > > access to certain resources.
> > >
> > > I don't see how memory.swap.type or memory.swap.tiers would improve
> > > this in any way. On the contrary, it would overlap and conflict with
> > > existing controls to manage swap and zswap on a per-cgroup basis.
> > >
> > > > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > > > and hard disk swap as overflow.
> > > > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > > > +Kairui
> > >
> > > Multiple swap devices for round robin or with different priorities
> > > aren't new, they have been supported for a very, very long time. So
> > > far nobody has proposed to control the exact behavior on a per-cgroup
> > > basis, and I didn't see anybody in this thread asking for it either.
> > >
> > > So I don't see how this counts as an obvious and automatic usecase for
> > > memory.swap.tiers.
> > >
> > > > 3) Android has some fancy swap ideas led by those patches.
> > > > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > > > It got shot down due to removal of frontswap. But the usage case and
> > > > product requirement is there.
> > > > +Minchan
> > >
> > > This looks like an optimization for zram to bypass the block layer and
> > > hook directly into the swap code. Correct me if I'm wrong, but this
> > > doesn't appear to have anything to do with per-cgroup backend control.
> > Hi Johannes,
> >
> > I haven't been following the thread closely, but I noticed the discussion
> > about potential use cases for zram with memcg.
> >
> > One interesting idea I have is to implement a swap controller per cgroup.
> > This would allow us to tailor the zram swap behavior to the specific needs of
> > different groups.
> >
> > For example, Group A, which is sensitive to swap latency, could use zram swap
> > with a fast compression setting, even if it sacrifices some compression ratio.
> > This would prioritize quick access to swapped data, even if it takes up more space.
> >
> > On the other hand, Group B, which can tolerate higher swap latency, could benefit
> > from a slower compression setting that achieves a higher compression ratio.
> > This would maximize memory efficiency at the cost of slightly slower data access.
> >
> > This approach could provide a more nuanced and flexible way to manage swap usage
> > within different cgroups.
>
> That makes sense to me.
>
> It sounds to me like per-cgroup swapfiles would be the easiest
> solution to this.

Someone posted it about 10 years ago :)
https://lwn.net/Articles/592923/

+fdeutsch@redhat.com

Fabian recently asked me about its status.