From: Chris Li
Date: Sun, 19 Nov 2023 00:23:49 -0800
Subject: Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios
To: Zhongkun He
Cc: Yosry Ahmed, Andrew Morton, Johannes Weiner, Nhat Pham, Seth Jennings,
 Dan Streetman, Vitaly Wool, linux-mm, LKML, Ying
References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com>
List-ID: linux-kernel@vger.kernel.org

Hi Zhongkun,

On Fri, Nov 17, 2023 at 5:46 PM Zhongkun He wrote:
> > Can you help me understand how much memory you can free from this
> > patch? For example, are we talking about a few pages or a few GB?
> >
> > Where does the freed memory come from?
> >
> > If the memory comes from the zswap entry struct.
> > Due to slab allocator fragmentation, it would take a lot of zswap
> > entries to have meaningful memory reclaimed from the slab allocator.
> >
> > If the memory comes from the swap cached pages, that would be much
> > more meaningful. But that is not what this patch is doing, right?
> >
> > Chris
>
> It's my bad for putting two cases together. The memory released in both
> cases comes from the zswap entry struct and the zswap compressed page.

Thanks for the clarification. Keep in mind that freeing a zswap entry
and its zpool memory does not directly translate into freed pages. If
the underlying page still holds other live zswap entries or zsmalloc
allocations, it will not be returned to the system. That is the
fragmentation cost I was talking about. With this consideration, do you
know how many extra pages this patch can release back to the system in
your use case? If the difference is very small, it might not be worth
the extra complexity to release them.

> The original intention of this patch is to solve the problem that
> shrink_work() fails to reclaim memory in two situations.
>
> For case (1), zswap_writeback_entry() will fail because
> __read_swap_cache_async() returns NULL: the swap slot has been freed
> but is still cached in swap_slots_cache, so the memory comes from the
> zswap entry struct and the compressed page.

In those cases, if we drop the swap_slots_cache, it will also free
those zswap entries and compressed pages (zpool), right?

> Count = SWAP_BATCH * ncpu.

That is the upper limit. Not all CPUs have their swap batches fully
loaded.

> Solution: move the zswap_invalidate() out of the batches and free the
> entry once the swap count reaches 0.

Per the previous discussion, this will have an impact on the
swap_slots_cache behavior. We need some data points for a cost/benefit
analysis.
> For case (2), zswap_writeback_entry() will fail on
> !page_was_allocated, because with zswap_exclusive_loads disabled,
> zswap_load() leaves two copies of the same page in memory (compressed
> and uncompressed) after faulting in a page from zswap. The amount of
> memory is greater, but it depends on the usage.

That basically disables the future swap-out page IO write optimization,
which skips the write if the page hasn't changed. If the system is low
on memory, that is justifiable. Again, it seems we could have a pass
that drops the compressed memory if the swap count is zero (and marks
the page dirty).

> > Why do we need to release them?
>
> Consider this scenario: there is a lot of data cached in memory and
> zswap, we hit the limit, and shrink_worker() will fail. The newly
> incoming data will be written

Yes, the shrink_worker will need to allocate a page to store the
uncompressed data for write back.

> directly to swap due to the zswap_store() failure. Should we free the
> last one to store the latest one in zswap?

By "the last one" you mean the most recent zswap entry written into
zswap? Well, you need to allocate a page to write it out, and that is an
async process. Shrinking the zpool at that point is kind of too late
already.

> According to the previous discussion, the writeback is inevitable.
> So I want to make zswap_exclusive_loads_enabled the default behavior,
> or make it the only way to do zswap loads. It only makes sense when

We need some data points on how often we swap the page out to zswap
again, where the zswap write-out can be saved by reusing the existing
compressed data. It is entirely possible this page IO write-out
optimization is not worthwhile for zswap. We need some data to support
that claim.

> the page is read and no longer dirty. If the page is read frequently,
> it should stay in the page cache rather than zswap. The benefit of
> doing this is very small, i.e. two copies of the same page in memory.
If the benefit of doing this is very small, that seems to be an
argument against this patch? Again, we need some data points for a cost
and benefit analysis.

Chris