Subject: Re: [PATCH v7] mm: vmscan: try to reclaim swapcache pages if no swap space
To: "Huang, Ying"
CC: Andrew Morton, Yosry Ahmed, Sachin Sant, Michal Hocko, Johannes Weiner,
    Kefeng Wang
From: Liu Shixin
Date: Mon, 6 Nov 2023 14:43:39 +0800
In-Reply-To: <87h6lzy68z.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20231104140313.3418001-1-liushixin2@huawei.com>
 <87h6lzy68z.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 2023/11/6 10:18, Huang, Ying wrote:
> Liu Shixin writes:
>
>> When the space of swap devices is exhausted, only file pages can be
>> reclaimed. But there are still some swapcache pages on the anon LRU
>> list. This can lead to a premature out-of-memory.
>>
>> The problem can be reproduced with the following steps:
>>
>> First, set up a 9MB disk swap space, then create a cgroup with a 10MB
>> memory limit, then run a program that allocates about 15MB of memory.
>>
>> The problem occurs only occasionally and may take about 100 attempts
>> to reproduce [1].
>>
>> Fix it by checking the number of swapcache pages in
>> can_reclaim_anon_pages(). If the number is not zero, return true and
>> set swapcache_only to 1. When scanning the anon LRU list in
>> swapcache_only mode, non-swapcache pages are skipped during isolation
>> in order to improve reclaim efficiency.
>>
>> However, in swapcache_only mode, the scan count is still increased
>> when scanning non-swapcache pages, because non-swapcache pages are
>> plentiful and swapcache pages are rare in this mode; if skipped
>> non-swapcache pages were not counted, the page scan in
>> isolate_lru_folios() could eventually lead to a hung task, just as
>> Sachin reported [2].
>>
>> By the way, since there are enough rounds of memory reclaim before
>> OOM, there is no need to isolate too many swapcache pages in a single
>> pass.
>>
>> [1]. https://lore.kernel.org/lkml/CAJD7tkZAfgncV+KbKr36=eDzMnT=9dZOT0dpMWcurHLr6Do+GA@mail.gmail.com/
>> [2]. https://lore.kernel.org/linux-mm/CAJD7tkafz_2XAuqE8tGLPEcpLngewhUo=5US14PAtSM9tLBUQg@mail.gmail.com/
>>
>> Signed-off-by: Liu Shixin
>> Tested-by: Yosry Ahmed
>> Reviewed-by: "Huang, Ying"
>> Reviewed-by: Yosry Ahmed
>> ---
>> v6->v7: Reset swapcache_only to zero once there is swap space again.
>> v5->v6: Fix NULL pointer dereference and hung task problem reported by Sachin.
>>
>>  include/linux/swap.h |  6 ++++++
>>  mm/memcontrol.c      |  8 ++++++++
>>  mm/vmscan.c          | 36 ++++++++++++++++++++++++++++++++++--
>>  3 files changed, 48 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index f6dd6575b905..3ba146ae7cf5 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -659,6 +659,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p
>>  }
>>
>>  extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
>> +extern long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg);
>>  extern bool mem_cgroup_swap_full(struct folio *folio);
>>  #else
>>  static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
>> @@ -681,6 +682,11 @@ static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>>  	return get_nr_swap_pages();
>>  }
>>
>> +static inline long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
>> +{
>> +	return total_swapcache_pages();
>> +}
>> +
>>  static inline bool mem_cgroup_swap_full(struct folio *folio)
>>  {
>>  	return vm_swap_full();
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 5b009b233ab8..29e34c06ca83 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -7584,6 +7584,14 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>>  	return nr_swap_pages;
>>  }
>>
>> +long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
>> +{
>> +	if (mem_cgroup_disabled())
>> +		return total_swapcache_pages();
>> +
>> +	return memcg_page_state(memcg, NR_SWAPCACHE);
>> +}
>> +
>>  bool mem_cgroup_swap_full(struct folio *folio)
>>  {
>>  	struct mem_cgroup *memcg;
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 6f13394b112e..a5e04291662f 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -137,6 +137,9 @@ struct scan_control {
>>  	/* Always discard instead of demoting to lower tier memory */
>>  	unsigned int no_demotion:1;
>>
>> +	/* Swap space is exhausted, only reclaim swapcache for anon LRU */
>> +	unsigned int swapcache_only:1;
>> +
>>  	/* Allocation order */
>>  	s8 order;
>>
>> @@ -602,6 +605,12 @@ static bool can_demote(int nid, struct scan_control *sc)
>>  	return true;
>>  }
>>
>> +static void set_swapcache_mode(struct scan_control *sc, bool swapcache_only)
>> +{
>> +	if (sc)
>> +		sc->swapcache_only = swapcache_only;
>> +}
>> +
> I think that it's unnecessary to introduce a new function. I understand
> that you want to reduce the code duplication. We can add
>
>   sc->swapcache_only = false;
>
> at the beginning of can_reclaim_anon_pages() to reduce code duplication.
> That can cover even more cases IIUC.

OK, it's more appropriate, I will resend v8, thank you.

>>  static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>>  					  int nid,
>>  					  struct scan_control *sc)
>> @@ -611,12 +620,26 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>>  		 * For non-memcg reclaim, is there
>>  		 * space in any swap device?
>>  		 */
>> -		if (get_nr_swap_pages() > 0)
>> +		if (get_nr_swap_pages() > 0) {
>> +			set_swapcache_mode(sc, false);
>>  			return true;
>> +		}
>> +		/* Is there any swapcache pages to reclaim? */
>> +		if (total_swapcache_pages() > 0) {
>> +			set_swapcache_mode(sc, true);
>> +			return true;
>> +		}
>>  	} else {
>>  		/* Is the memcg below its swap limit? */
>> -		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
>> +		if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
>> +			set_swapcache_mode(sc, false);
>>  			return true;
>> +		}
>> +		/* Is there any swapcache pages in memcg to reclaim? */
>> +		if (mem_cgroup_get_nr_swapcache_pages(memcg) > 0) {
>> +			set_swapcache_mode(sc, true);
>> +			return true;
>> +		}
>>  	}
> If can_demote() returns true, we shouldn't scan swapcache only.
>
> --
> Best Regards,
> Huang, Ying
>
>>  	/*
>> @@ -2342,6 +2365,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>>  		 */
>>  		scan += nr_pages;
>>
>> +		/*
>> +		 * Count non-swapcache too because the swapcache pages may
>> +		 * be rare and it takes too much times here if not count
>> +		 * the non-swapcache pages.
>> +		 */
>> +		if (unlikely(sc->swapcache_only && !is_file_lru(lru) &&
>> +			     !folio_test_swapcache(folio)))
>> +			goto move;
>> +
>>  		if (!folio_test_lru(folio))
>>  			goto move;
>>  		if (!sc->may_unmap && folio_mapped(folio))
> .
>
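
For reference, here is a rough sketch of what can_reclaim_anon_pages() could
look like with both review comments folded in: sc->swapcache_only is cleared
at the top of the function instead of through the set_swapcache_mode() helper,
and the swapcache-only fallback is taken only when neither swap space nor
demotion (can_demote()) is available. This is only an illustration of the
suggested direction, not the actual v8 patch; the final structure may differ.

static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
					  int nid,
					  struct scan_control *sc)
{
	/* Illustrative sketch only, not the real v8 change. */
	if (sc)
		sc->swapcache_only = 0;

	if (memcg == NULL) {
		/* For non-memcg reclaim, is there space in any swap device? */
		if (get_nr_swap_pages() > 0)
			return true;
	} else {
		/* Is the memcg below its swap limit? */
		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
			return true;
	}

	/* Prefer demotion over swapcache-only reclaim when it is possible. */
	if (can_demote(nid, sc))
		return true;

	/*
	 * No swap space and no demotion target: fall back to reclaiming
	 * only the pages that are already in the swapcache.
	 */
	if ((memcg == NULL && total_swapcache_pages() > 0) ||
	    (memcg != NULL && mem_cgroup_get_nr_swapcache_pages(memcg) > 0)) {
		if (sc)
			sc->swapcache_only = 1;
		return true;
	}

	return false;
}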