From: Nhat Pham
Date: Wed, 6 Dec 2023 08:56:43 -0800
Subject: Re: [PATCH v8 6/6] zswap: shrinks zswap pool based on memory pressure
To: Yosry Ahmed
Cc: Chengming Zhou, akpm@linux-foundation.org, hannes@cmpxchg.org,
 cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org,
 vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeelb@google.com, muchun.song@linux.dev, chrisl@kernel.org,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, shuah@kernel.org

On Tue, Dec 5, 2023 at 10:00 PM Yosry Ahmed wrote:
>
> [..]
> > > @@ -526,6 +582,102 @@ static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
> > >  	return entry;
> > >  }
> > >
> > > +/*********************************
> > > +* shrinker functions
> > > +**********************************/
> > > +static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_one *l,
> > > +				       spinlock_t *lock, void *arg);
> > > +
> > > +static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
> > > +		struct shrink_control *sc)
> > > +{
> > > +	struct lruvec *lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
> > > +	unsigned long shrink_ret, nr_protected, lru_size;
> > > +	struct zswap_pool *pool = shrinker->private_data;
> > > +	bool encountered_page_in_swapcache = false;
> > > +
> > > +	nr_protected =
> > > +		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > +	lru_size = list_lru_shrink_count(&pool->list_lru, sc);
> > > +
> > > +	/*
> > > +	 * Abort if the shrinker is disabled or if we are shrinking into the
> > > +	 * protected region.
> > > +	 *
> > > +	 * This short-circuiting is necessary because if we have too many multiple
> > > +	 * concurrent reclaimers getting the freeable zswap object counts at the
> > > +	 * same time (before any of them made reasonable progress), the total
> > > +	 * number of reclaimed objects might be more than the number of unprotected
> > > +	 * objects (i.e the reclaimers will reclaim into the protected area of the
> > > +	 * zswap LRU).
> > > +	 */
> > > +	if (!zswap_shrinker_enabled || nr_protected >= lru_size - sc->nr_to_scan) {
> > > +		sc->nr_scanned = 0;
> > > +		return SHRINK_STOP;
> > > +	}
> > > +
> > > +	shrink_ret = list_lru_shrink_walk(&pool->list_lru, sc, &shrink_memcg_cb,
> > > +		&encountered_page_in_swapcache);
> > > +
> > > +	if (encountered_page_in_swapcache)
> > > +		return SHRINK_STOP;
> > > +
> > > +	return shrink_ret ? shrink_ret : SHRINK_STOP;
> > > +}
> > > +
> > > +static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
> > > +		struct shrink_control *sc)
> > > +{
> > > +	struct zswap_pool *pool = shrinker->private_data;
> > > +	struct mem_cgroup *memcg = sc->memcg;
> > > +	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(sc->nid));
> > > +	unsigned long nr_backing, nr_stored, nr_freeable, nr_protected;
> > > +
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	cgroup_rstat_flush(memcg->css.cgroup);
> > > +	nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > > +	nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> > > +#else
> > > +	/* use pool stats instead of memcg stats */
> > > +	nr_backing = get_zswap_pool_size(pool) >> PAGE_SHIFT;
> > > +	nr_stored = atomic_read(&pool->nr_stored);
> > > +#endif
> > > +
> > > +	if (!zswap_shrinker_enabled || !nr_stored)
> > When I tested with this series, with !zswap_shrinker_enabled in the default case,
> > I found the performance is much worse than that without this patch.
> >
> > Testcase: memory.max=2G, zswap enabled, kernel build -j32 in a tmpfs directory.
> >
> > The reason seems the above cgroup_rstat_flush(), caused much rstat lock contention
> > to the zswap_store() path. And if I put the "zswap_shrinker_enabled" check above
> > the cgroup_rstat_flush(), the performance become much better.
> >
> > Maybe we can put the "zswap_shrinker_enabled" check above cgroup_rstat_flush()?
>
> Yes, we should do nothing if !zswap_shrinker_enabled. We should also
> use mem_cgroup_flush_stats() here like other places unless accuracy is
> crucial, which I doubt given that reclaim uses
> mem_cgroup_flush_stats().

Ah, good points on both suggestions. We should not do extra work for
non-users of the shrinker. And this is a best-effort approximation of the
memory saving factor, so as long as it is not *too* far off, I think it's
acceptable.

>
> mem_cgroup_flush_stats() has some thresholding to make sure we don't
> do flushes unnecessarily, and I have a pending series in mm-unstable
> that makes that thresholding per-memcg. Keep in mind that adding a
> call to mem_cgroup_flush_stats() will cause a conflict in mm-unstable,
> because the series there adds a memcg argument to
> mem_cgroup_flush_stats(). That should be easily amenable though, I can
> post a fixlet for my series to add the memcg argument there on top of
> users if needed.

Hmm, so how should we proceed from here? How about this:

a) I can send a fixlet to move the enablement check above the stats
   flushing and use mem_cgroup_flush_stats().
b) Then maybe you can send a fixlet to add the memcg argument at this
   new call site?

Does that sound reasonable?

>
> >
> > Thanks!
> >
> > > +		return 0;
> > > +
> > > +	nr_protected =
> > > +		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > +	nr_freeable = list_lru_shrink_count(&pool->list_lru, sc);
> > > +	/*
> > > +	 * Subtract the lru size by an estimate of the number of pages
> > > +	 * that should be protected.
> > > +	 */
> > > +	nr_freeable = nr_freeable > nr_protected ? nr_freeable - nr_protected : 0;
> > > +
> > > +	/*
> > > +	 * Scale the number of freeable pages by the memory saving factor.
> > > +	 * This ensures that the better zswap compresses memory, the fewer
> > > +	 * pages we will evict to swap (as it will otherwise incur IO for
> > > +	 * relatively small memory saving).
> > > +	 */
> > > +	return mult_frac(nr_freeable, nr_backing, nr_stored);
> > > +}
> > > +
> > > +static void zswap_alloc_shrinker(struct zswap_pool *pool)
> > > +{
> > > +	pool->shrinker =
> > > +		shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, "mm-zswap");
> > > +	if (!pool->shrinker)
> > > +		return;
> > > +
> > > +	pool->shrinker->private_data = pool;
> > > +	pool->shrinker->scan_objects = zswap_shrinker_scan;
> > > +	pool->shrinker->count_objects = zswap_shrinker_count;
> > > +	pool->shrinker->batch = 0;
> > > +	pool->shrinker->seeks = DEFAULT_SEEKS;
> > > +}
> > > +
> > >  /*********************************
> > >  * per-cpu code
> > >  **********************************/
> [..]
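The two decisions discussed in this thread, the protected-region guard in zswap_shrinker_scan() and the count estimate in zswap_shrinker_count() with the enablement check moved ahead of the stats flush as proposed in (a), can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the patch itself: every sketch_* name is hypothetical, sketch_flush_stats() is only a counter standing in for cgroup_rstat_flush()/mem_cgroup_flush_stats(), and sketch_mult_frac() reproduces the quotient/remainder split of the kernel's mult_frac() macro rather than calling it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: x * n / d without overflowing the x * n product,
 * using the same quotient/remainder split as the kernel's mult_frac().
 */
static unsigned long sketch_mult_frac(unsigned long x, unsigned long n,
				      unsigned long d)
{
	assert(d != 0);
	unsigned long q = x / d;
	unsigned long r = x % d;

	return q * n + r * n / d;
}

/*
 * Hypothetical mirror of the guard in zswap_shrinker_scan(): stop when the
 * shrinker is disabled or when scanning nr_to_scan objects would reach into
 * the protected region. All operands are unsigned long, so if nr_to_scan
 * exceeds lru_size the subtraction wraps to a huge value and the guard
 * does not fire.
 */
static bool sketch_scan_should_stop(bool enabled, unsigned long nr_protected,
				    unsigned long lru_size,
				    unsigned long nr_to_scan)
{
	return !enabled || nr_protected >= lru_size - nr_to_scan;
}

/* Hypothetical stand-in for the cost of a stats flush: just count calls. */
static unsigned long sketch_flush_calls;

static void sketch_flush_stats(void)
{
	sketch_flush_calls++;
}

/*
 * Hypothetical mirror of zswap_shrinker_count() with the enablement check
 * moved above the flush, so a disabled shrinker does no flushing at all.
 * nr_backing is the compressed size in pages, nr_stored the number of
 * pages stored in zswap.
 */
static unsigned long sketch_count(bool enabled, unsigned long lru_size,
				  unsigned long nr_protected,
				  unsigned long nr_backing,
				  unsigned long nr_stored)
{
	if (!enabled)
		return 0;

	sketch_flush_stats();

	if (!nr_stored)
		return 0;

	/* Exclude the protected estimate, clamping at zero. */
	unsigned long nr_freeable =
		lru_size > nr_protected ? lru_size - nr_protected : 0;

	/* Better compression (smaller nr_backing) means fewer objects reported. */
	return sketch_mult_frac(nr_freeable, nr_backing, nr_stored);
}
```

For example, with 1000 objects on the LRU, 200 of them protected, and 1000 stored pages compressed into 250 pages of backing memory, sketch_count() reports 800 * 250 / 1000 = 200 freeable objects, while a disabled shrinker returns 0 without touching the flush counter.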