Date: Thu, 14 Dec 2023 12:11:37 -0500
From: Johannes Weiner
To: Minchan Kim
Cc: Chris Li, Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
    lizefan.x@bytedance.com, cerasuolodomenico@gmail.com,
    yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
    vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev,
    shakeelb@google.com, muchun.song@linux.dev, hughd@google.com,
    corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org,
    rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    david@ixit.cz, Kairui Song, Zhongkun He
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Message-ID: <20231214171137.GA261942@cmpxchg.org>
References: <20231207192406.3809579-1-nphamcs@gmail.com>
 <20231209034229.GA1001962@cmpxchg.org>

On Mon, Dec 11, 2023 at 02:55:43PM -0800, Minchan Kim wrote:
> On Fri, Dec 08, 2023 at 10:42:29PM -0500, Johannes Weiner wrote:
> > On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > > I can give you three use cases right now:
> > > 1) Google's production kernel uses SSD-only swap; it is currently
> > > in pilot. This is not expressible by memory.zswap.writeback. You
> > > can set memory.zswap.max = 0 and memory.zswap.writeback = 1 to get
> > > an SSD-backed swapfile, but the whole thing feels very clunky:
> > > when what you really want is SSD-only swap, you need to do all
> > > this zswap config dance. Google has an internal memory.swapfile
> > > feature that selects the per-cgroup swap file type from "zswap
> > > only", "real swap file only", "both", or "none" (the exact
> > > keywords might be different). It has been running in production
> > > for almost 10 years. The need for more than zswap-type per-cgroup
> > > control is really there.
> >
> > We use regular swap on SSD without zswap just fine. Of course it's
> > expressible.
> >
> > On dedicated systems, zswap is disabled in sysfs. On shared hosts
> > where it's determined based on which workload is scheduled, zswap
> > is generally enabled through sysfs, and individual cgroup access is
> > controlled via memory.zswap.max - which is what this knob is for.
> >
> > This is analogous to enabling swap globally, and then opting
> > individual cgroups in and out with memory.swap.max.
> >
> > So this usecase is very much already supported, and it's expressed
> > in a way that's pretty natural for how cgroups express access and
> > lack of access to certain resources.
> >
> > I don't see how memory.swap.type or memory.swap.tiers would improve
> > this in any way. On the contrary, it would overlap and conflict
> > with existing controls to manage swap and zswap on a per-cgroup
> > basis.
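
To make that concrete: with zswap enabled globally, steering one group
to SSD-only swap comes down to a couple of writes to the existing
cgroup files. A rough sketch only, assuming cgroup v2 is mounted at
/sys/fs/cgroup and a hypothetical group named "workload":

/*
 * Not part of the patch - just an illustration of the existing knobs.
 * memory.zswap.max = 0 keeps this group's pages out of zswap entirely,
 * so with swap on an SSD it gets SSD-only swapping.
 */
#include <stdio.h>

static void cg_write(const char *file, const char *val)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/workload/%s", file);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return;
	}
	fprintf(f, "%s\n", val);
	fclose(f);
}

int main(void)
{
	cg_write("memory.zswap.max", "0");	/* no zswap for this group */
	cg_write("memory.swap.max", "max");	/* unrestricted disk swap */
	return 0;
}
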
> > > 2) As indicated by this discussion, Tencent has a use case for
> > > SSD and hard disk swap as overflow.
> > > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > > +Kairui
> >
> > Multiple swap devices for round robin or with different priorities
> > aren't new, they have been supported for a very, very long time. So
> > far nobody has proposed to control the exact behavior on a
> > per-cgroup basis, and I didn't see anybody in this thread asking
> > for it either.
> >
> > So I don't see how this counts as an obvious and automatic usecase
> > for memory.swap.tiers.
> >
> > > 3) Android has some fancy swap ideas led by those patches.
> > > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > > It got shot down due to the removal of frontswap, but the use
> > > case and product requirement are there.
> > > +Minchan
> >
> > This looks like an optimization for zram to bypass the block layer
> > and hook directly into the swap code. Correct me if I'm wrong, but
> > this doesn't appear to have anything to do with per-cgroup backend
> > control.
>
> Hi Johannes,
>
> I haven't been following the thread closely, but I noticed the
> discussion about potential use cases for zram with memcg.
>
> One interesting idea I have is to implement a swap controller per
> cgroup. This would allow us to tailor the zram swap behavior to the
> specific needs of different groups.
>
> For example, Group A, which is sensitive to swap latency, could use
> zram swap with a fast compression setting, even if it sacrifices some
> compression ratio. This would prioritize quick access to swapped
> data, even if it takes up more space.
>
> On the other hand, Group B, which can tolerate higher swap latency,
> could benefit from a slower compression setting that achieves a
> higher compression ratio. This would maximize memory efficiency at
> the cost of slightly slower data access.
>
> This approach could provide a more nuanced and flexible way to manage
> swap usage within different cgroups.

That makes sense to me.

It sounds to me like per-cgroup swapfiles would be the easiest
solution to this. Then you can create zram devices with different
configurations and assign them to individual cgroups.

This would also apply to Kairui's usecase: assign zrams and hdd
backups as needed on a per-cgroup basis.

In addition, it would naturally solve scalability and isolation
problems when multiple containers would otherwise be hammering on the
same swap backends and locks.

It would also only require one, relatively simple new interface, such
as a cgroup parameter to swapon().

That's highly preferable over a complex configuration file like
memory.swap.tiers that needs to solve all sorts of visibility and
namespace issues and duplicate the full configuration interface of
every backend in some new, custom syntax.
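
To illustrate (a rough sketch, not a concrete interface proposal): the
zram side of this can already be expressed today; the missing piece is
attaching each device to a cgroup instead of the global swap pool. The
sketch assumes the zram module is loaded with at least two devices
(e.g. modprobe zram num_devices=2); the attach step at the end is the
hypothetical new interface.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0)
		perror(path);
	if (fd >= 0)
		close(fd);
}

int main(void)
{
	/* zram0: fast, lower-ratio compressor for the latency-sensitive group */
	sysfs_write("/sys/block/zram0/comp_algorithm", "lz4");
	sysfs_write("/sys/block/zram0/disksize", "4G");

	/* zram1: slower, higher-ratio compressor for the latency-tolerant group */
	sysfs_write("/sys/block/zram1/comp_algorithm", "zstd");
	sysfs_write("/sys/block/zram1/disksize", "4G");

	/* today: both devices join the same global swap pool */
	system("mkswap /dev/zram0 && swapon /dev/zram0");
	system("mkswap /dev/zram1 && swapon /dev/zram1");

	/*
	 * Hypothetical: with a cgroup-aware swapon, each device would be
	 * attached to one group instead, e.g.
	 *   swapon_cgroup("/dev/zram0", "/sys/fs/cgroup/latency-sensitive");
	 *   swapon_cgroup("/dev/zram1", "/sys/fs/cgroup/batch");
	 * so each cgroup reclaims into its own backend.
	 */
	return 0;
}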