From: Chris Li
Date: Thu, 23 Nov 2023 09:19:00 -0800
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
To: Liu Shixin
Cc: Yu Zhao, Andrew Morton, Yosry Ahmed, Huang Ying, Sachin Sant,
    Michal Hocko, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
In-Reply-To: <20231121090624.1814733-1-liushixin2@huawei.com>

Hi Shixin,

On Tue, Nov 21, 2023 at 12:08 AM Liu Shixin wrote:
>
> When spaces of swap devices are exhausted, only file pages can be
> reclaimed. But there are still some swapcache pages in anon lru list.
> This can lead to a premature out-of-memory.
>
> The problem is found with such step:
>
> Firstly, set a 9MB disk swap space, then create a cgroup with 10MB
> memory limit, then runs an program to allocates about 15MB memory.
>
> The problem occurs occasionally, which may need about 100 times [1].

Just out of curiosity: in your use case, how much additional memory, in
pages or MB, does this patch free compared with the current code as the
baseline?

For the swap cache pages reclaimed in swapcache_only mode, have all their
swap counts dropped to zero, so that the only reason they stay in the
swap cache is to avoid the page IO write if we need to swap the page out
again?

> Fix it by checking number of swapcache pages in can_reclaim_anon_pages().
> If the number is not zero, return true and set swapcache_only to 1.
> When scan anon lru list in swapcache_only mode, non-swapcache pages will
> be skipped to isolate in order to accelerate reclaim efficiency.

Here you say non-swapcache pages will be skipped if swapcache_only == 1.

>
> However, in swapcache_only mode, the scan count still increased when scan
> non-swapcache pages because there are large number of non-swapcache pages
> and rare swapcache pages in swapcache_only mode, and if the non-swapcache

Here you suggest non-swapcache pages will also be scanned even when
swapcache_only == 1. That seems to contradict what you said above. I feel
that I am missing something here.

> is skipped and do not count, the scan of pages in isolate_lru_folios() can

Can you clarify which "scan of pages" you mean? Are those pages swapcache
pages or non-swapcache pages?

> eventually lead to hung task, just as Sachin reported [2].
>
> By the way, since there are enough times of memory reclaim before OOM, it
> is not need to isolate too much swapcache pages in one times.
>
> [1]. https://lore.kernel.org/lkml/CAJD7tkZAfgncV+KbKr36=eDzMnT=9dZOT0dpMWcurHLr6Do+GA@mail.gmail.com/
> [2]. https://lore.kernel.org/linux-mm/CAJD7tkafz_2XAuqE8tGLPEcpLngewhUo=5US14PAtSM9tLBUQg@mail.gmail.com/
>
> Signed-off-by: Liu Shixin
> Tested-by: Yosry Ahmed
> Reviewed-by: "Huang, Ying"
> Reviewed-by: Yosry Ahmed
> ---
> v9->v10: Use per-node swapcache suggested by Yu Zhao.
> v8->v9: Move the swapcache check after can_demote() and refector
>         can_reclaim_anon_pages() a bit.
> v7->v8: Reset swapcache_only at the beginning of can_reclaim_anon_pages().
> v6->v7: Reset swapcache_only to zero after there are swap spaces.
> v5->v6: Fix NULL pointing derefence and hung task problem reported by Sachin.
>
>  mm/vmscan.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 506f8220c5fe..1fcc94717370 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -136,6 +136,9 @@ struct scan_control {
>         /* Always discard instead of demoting to lower tier memory */
>         unsigned int no_demotion:1;
>
> +       /* Swap space is exhausted, only reclaim swapcache for anon LRU */
> +       unsigned int swapcache_only:1;
> +
>         /* Allocation order */
>         s8 order;
>
> @@ -308,10 +311,36 @@ static bool can_demote(int nid, struct scan_control *sc)
>         return true;
>  }
>
> +#ifdef CONFIG_SWAP
> +static bool can_reclaim_swapcache(struct mem_cgroup *memcg, int nid)
> +{
> +       struct pglist_data *pgdat = NODE_DATA(nid);
> +       unsigned long nr_swapcache;
> +
> +       if (!memcg) {
> +               nr_swapcache = node_page_state(pgdat, NR_SWAPCACHE);
> +       } else {
> +               struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
> +
> +               nr_swapcache = lruvec_page_state_local(lruvec, NR_SWAPCACHE);
> +       }
> +
> +       return nr_swapcache > 0;
> +}
> +#else
> +static bool can_reclaim_swapcache(struct mem_cgroup *memcg, int nid)
> +{
> +       return false;
> +}
> +#endif
> +
>  static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>                                           int nid,
>                                           struct scan_control *sc)
>  {
> +       if (sc)
> +               sc->swapcache_only = 0;
> +

Minor nitpick: sc->swapcache_only is first set to 0 and then later set
to 1. It would be better to use a local variable and write to
sc->swapcache_only in one go. If more than one thread accessed the
scan_control, they could see the value flicker from 0 to 1. I don't think
that is the case in our current code, since sc is created on the stack.
There are other minor benefits as well: the "if (sc)" test only needs to
be done once, and there is only one store instruction.

Chris

>         if (memcg == NULL) {
>                 /*
>                  * For non-memcg reclaim, is there
> @@ -330,7 +359,17 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>          *
>          * Can it be reclaimed from this node via demotion?
>          */
> -       return can_demote(nid, sc);
> +       if (can_demote(nid, sc))
> +               return true;
> +
> +       /* Is there any swapcache pages to reclaim in this node? */
> +       if (can_reclaim_swapcache(memcg, nid)) {
> +               if (sc)
> +                       sc->swapcache_only = 1;
> +               return true;
> +       }
> +
> +       return false;
>  }
>
>  /*
> @@ -1642,6 +1681,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>                  */
>                 scan += nr_pages;
>
> +               /*
> +                * Count non-swapcache too because the swapcache pages may
> +                * be rare and it takes too much times here if not count
> +                * the non-swapcache pages.
> +                */
> +               if (unlikely(sc->swapcache_only && !is_file_lru(lru) &&
> +                   !folio_test_swapcache(folio)))
> +                       goto move;
> +
>                 if (!folio_test_lru(folio))
>                         goto move;
>                 if (!sc->may_unmap && folio_mapped(folio))
> --
> 2.25.1
>
>
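
For reference, the nitpick about sc->swapcache_only can be sketched roughly
as below. This is only a user-space illustration, not a proposed hunk: the
one-field struct scan_control and the fake_has_swap, fake_can_demote and
fake_has_swapcache flags are hypothetical stand-ins for the real
scan_control, the swap-space checks, can_demote() and
can_reclaim_swapcache(), and the function takes only sc to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-field stand-in for the kernel's struct scan_control. */
struct scan_control {
	unsigned int swapcache_only:1;
};

/* Hypothetical knobs standing in for the real kernel checks. */
static bool fake_has_swap;      /* free swap space is available */
static bool fake_can_demote;    /* stands in for can_demote() */
static bool fake_has_swapcache; /* stands in for can_reclaim_swapcache() */

static bool can_reclaim_anon_pages(struct scan_control *sc)
{
	/* Decide everything in locals first. */
	bool swapcache_only = false;
	bool ret;

	if (fake_has_swap || fake_can_demote) {
		ret = true;
	} else if (fake_has_swapcache) {
		swapcache_only = true;
		ret = true;
	} else {
		ret = false;
	}

	/* One "if (sc)" test and one store, whichever branch was taken. */
	if (sc)
		sc->swapcache_only = swapcache_only;

	return ret;
}
```

With this shape the flag is never observed in an intermediate state, and
callers that pass a NULL sc still get the same return value.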