From: Fabian Deutsch
Date: Thu, 14 Dec 2023 19:03:28 +0100
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Yu Zhao
Cc: Johannes Weiner, Minchan Kim, Chris Li, Nhat Pham, akpm@linux-foundation.org, tj@kernel.org, lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz, Kairui Song, Zhongkun He
References: <20231207192406.3809579-1-nphamcs@gmail.com> <20231209034229.GA1001962@cmpxchg.org> <20231214171137.GA261942@cmpxchg.org>

On Thu, Dec 14, 2023 at 6:24 PM Yu Zhao wrote:
>
> On Thu, Dec 14, 2023 at 10:11 AM Johannes Weiner wrote:
> >
> > On Mon, Dec 11, 2023 at 02:55:43PM -0800, Minchan Kim wrote:
> > > On Fri, Dec 08, 2023 at 10:42:29PM -0500, Johannes Weiner wrote:
> > > > On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > > > > I can give you three usage cases right now:
> > > > > 1) Google's production kernel uses SSD-only swap; it is currently on
> > > > > pilot. This is not expressible by memory.zswap.writeback. You can
> > > > > set memory.zswap.max = 0 and memory.zswap.writeback = 1, and then
> > > > > you get an SSD-backed swapfile.
> > > > > But the whole thing feels very clunky, especially when what you
> > > > > really want is SSD-only swap: you need to do all this zswap config
> > > > > dance. Google has an internal memory.swapfile feature that
> > > > > implements a per-cgroup swap file type of "zswap only", "real
> > > > > swapfile only", "both", or "none" (the exact keywords might be
> > > > > different), running in production for almost 10 years. The need for
> > > > > more than zswap-type per-cgroup control is really there.
> > > >
> > > > We use regular swap on SSD without zswap just fine. Of course it's
> > > > expressible.
> > > >
> > > > On dedicated systems, zswap is disabled in sysfs. On shared hosts
> > > > where it's determined based on which workload is scheduled, zswap is
> > > > generally enabled through sysfs, and individual cgroup access is
> > > > controlled via memory.zswap.max - which is what this knob is for.
> > > >
> > > > This is analogous to enabling swap globally, and then opting
> > > > individual cgroups in and out with memory.swap.max.
> > > >
> > > > So this usecase is very much already supported, and it's expressed in
> > > > a way that's pretty natural for how cgroups express access and lack of
> > > > access to certain resources.
> > > >
> > > > I don't see how memory.swap.type or memory.swap.tiers would improve
> > > > this in any way. On the contrary, it would overlap and conflict with
> > > > existing controls to manage swap and zswap on a per-cgroup basis.
> > > >
> > > > > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > > > > and hard disk swap as overflow.
> > > > > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > > > > +Kairui
> > > >
> > > > Multiple swap devices for round robin or with different priorities
> > > > aren't new; they have been supported for a very, very long time.
> > > > So far nobody has proposed to control the exact behavior on a
> > > > per-cgroup basis, and I didn't see anybody in this thread asking for
> > > > it either.
> > > >
> > > > So I don't see how this counts as an obvious and automatic usecase for
> > > > memory.swap.tiers.
> > > >
> > > > > 3) Android has some fancy swap ideas led by those patches.
> > > > > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > > > > It got shot down due to removal of frontswap. But the usage case and
> > > > > product requirement is there.
> > > > > +Minchan
> > > >
> > > > This looks like an optimization for zram to bypass the block layer and
> > > > hook directly into the swap code. Correct me if I'm wrong, but this
> > > > doesn't appear to have anything to do with per-cgroup backend control.
> > >
> > > Hi Johannes,
> > >
> > > I haven't been following the thread closely, but I noticed the
> > > discussion about potential use cases for zram with memcg.
> > >
> > > One interesting idea I have is to implement a swap controller per
> > > cgroup. This would allow us to tailor the zram swap behavior to the
> > > specific needs of different groups.
> > >
> > > For example, Group A, which is sensitive to swap latency, could use
> > > zram swap with a fast compression setting, even if it sacrifices some
> > > compression ratio. This would prioritize quick access to swapped data,
> > > even if it takes up more space.
> > >
> > > On the other hand, Group B, which can tolerate higher swap latency,
> > > could benefit from a slower compression setting that achieves a higher
> > > compression ratio. This would maximize memory efficiency at the cost
> > > of slightly slower data access.
> > >
> > > This approach could provide a more nuanced and flexible way to manage
> > > swap usage within different cgroups.
> >
> > That makes sense to me.
> >
> > It sounds to me like per-cgroup swapfiles would be the easiest
> > solution to this.
>
> Someone posted it about 10 years ago :)
> https://lwn.net/Articles/592923/
>
> +fdeutsch@redhat.com
> Fabian recently asked me about its status.

Thanks Yu! Yes, I was interested due to container use-cases.

A few thoughts in this direction:

- With swap per cgroup you lose the big "statistical" benefit of having
  swap at the node level. Well, it depends on the size of the cgroup
  (i.e. system.slice is quite large).
- With today's node-level swap, setting memory.swap.max=0 for all cgroups
  lets you achieve a similar behavior (only opt-in cgroups will get swap).
- The above approach, however, will still have a shared swap backend for
  all cgroups.
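[Editor's note: the knob combinations debated in this thread can be summarized in a small sketch. The helper below is hypothetical (not from the thread or the kernel); it just classifies which of the four per-cgroup modes Chris Li names ("none", "zswap only", "real swapfile only", "both") a cgroup effectively gets from today's memory.swap.max, memory.zswap.max, and memory.zswap.writeback knobs, assuming swap and zswap are enabled system-wide.]

```python
def effective_swap_backend(swap_max: int, zswap_max: int,
                           zswap_writeback: bool) -> str:
    """Classify a cgroup's effective swap backend from its cgroup-v2 knobs.

    Hypothetical illustration only; assumes a swapfile exists and zswap is
    enabled globally in sysfs.
    """
    if swap_max == 0:
        # memory.swap.max=0: the cgroup is opted out of swap entirely.
        return "none"
    if zswap_max == 0:
        # memory.zswap.max=0 with writeback allowed: pages bypass zswap
        # and go straight to the backing swapfile (the Google SSD-only case).
        return "real swapfile only"
    if not zswap_writeback:
        # memory.zswap.writeback=0: pages stay in the compressed pool
        # and are never written back to disk.
        return "zswap only"
    # zswap in front of the swapfile, with writeback on memory pressure.
    return "both"


# The SSD-only configuration Chris Li describes:
print(effective_swap_backend(swap_max=1 << 30, zswap_max=0,
                             zswap_writeback=True))
```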