From: "Huang, Ying"
To: Liu Shixin
Cc: Andrew Morton, Yosry Ahmed, Sachin Sant, Michal Hocko, Johannes Weiner, Kefeng Wang
Subject: Re: [PATCH v8] mm: vmscan: try to reclaim swapcache pages if no swap space
In-Reply-To: <20231106074452.2581835-1-liushixin2@huawei.com> (Liu Shixin's message of "Mon, 6 Nov 2023 15:44:52 +0800")
References: <20231106074452.2581835-1-liushixin2@huawei.com>
Date: Mon, 06 Nov 2023 14:57:49 +0800
Message-ID: <87pm0nv05u.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Liu Shixin writes:

> When spaces of swap devices are exhausted, only file pages can be
> reclaimed. But there are still some swapcache pages in anon lru list.
> This can lead to a premature out-of-memory.
>
> The problem is found with such step:
>
> Firstly, set a 9MB disk swap space, then create a cgroup with 10MB
> memory limit, then runs an program to allocates about 15MB memory.
>
> The problem occurs occasionally, which may need about 100 times [1].
>
> Fix it by checking number of swapcache pages in can_reclaim_anon_pages().
> If the number is not zero, return true and set swapcache_only to 1.
> When scan anon lru list in swapcache_only mode, non-swapcache pages will
> be skipped to isolate in order to accelerate reclaim efficiency.
>
> However, in swapcache_only mode, the scan count still increased when scan
> non-swapcache pages because there are large number of non-swapcache pages
> and rare swapcache pages in swapcache_only mode, and if the non-swapcache
> is skipped and do not count, the scan of pages in isolate_lru_folios() can
> eventually lead to hung task, just as Sachin reported [2].
>
> By the way, since there are enough times of memory reclaim before OOM, it
> is not need to isolate too much swapcache pages in one times.
>
> [1]. https://lore.kernel.org/lkml/CAJD7tkZAfgncV+KbKr36=eDzMnT=9dZOT0dpMWcurHLr6Do+GA@mail.gmail.com/
> [2]. https://lore.kernel.org/linux-mm/CAJD7tkafz_2XAuqE8tGLPEcpLngewhUo=5US14PAtSM9tLBUQg@mail.gmail.com/
>
> Signed-off-by: Liu Shixin
> Tested-by: Yosry Ahmed
> Reviewed-by: "Huang, Ying"
> Reviewed-by: Yosry Ahmed
> ---
> v7->v8: Reset swapcache_only at the beginning of can_reclaim_anon_pages().
> v6->v7: Reset swapcache_only to zero after there are swap spaces.
> v5->v6: Fix NULL pointing derefence and hung task problem reported by Sachin.
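
The "However, ..." paragraph above argues that folios skipped in
swapcache_only mode must still be charged to the scan budget. A small
userspace sketch of that argument follows. It is only a model: struct
toy_folio and toy_isolate() are hypothetical names, the LRU is a plain
array, and each folio counts as one page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only the one flag this sketch needs. */
struct toy_folio {
	bool in_swapcache;
};

/*
 * Walk at most nr_to_scan entries of a toy anon LRU.  In swapcache_only
 * mode, folios outside the swap cache are skipped but are still charged
 * to the scan budget, so the walk is bounded by nr_to_scan however rare
 * the swapcache folios are.
 *
 * Returns the number of folios that would be isolated; *scanned reports
 * how many entries were visited.
 */
size_t toy_isolate(const struct toy_folio *lru, size_t len,
		   size_t nr_to_scan, bool swapcache_only,
		   size_t *scanned)
{
	size_t scan = 0, taken = 0;

	for (size_t i = 0; i < len && scan < nr_to_scan; i++) {
		scan++;			/* counted before the skip test */
		if (swapcache_only && !lru[i].in_swapcache)
			continue;	/* skipped, but already counted */
		taken++;
	}

	*scanned = scan;
	return taken;
}
```

With a 1000-entry list whose only swapcache folio is the last one and
nr_to_scan = 32, toy_isolate() in swapcache_only mode visits 32 entries
and isolates none, then returns. If the skipped entries were not
counted, the same call would have to walk the whole list before it
could stop.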
>
>  include/linux/swap.h |  6 ++++++
>  mm/memcontrol.c      |  8 ++++++++
>  mm/vmscan.c          | 27 +++++++++++++++++++++++++++
>  3 files changed, 41 insertions(+)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index f6dd6575b905..3ba146ae7cf5 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -659,6 +659,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p
>  }
>
>  extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
> +extern long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg);
>  extern bool mem_cgroup_swap_full(struct folio *folio);
>  #else
>  static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
> @@ -681,6 +682,11 @@ static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>  	return get_nr_swap_pages();
>  }
>
> +static inline long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
> +{
> +	return total_swapcache_pages();
> +}
> +
>  static inline bool mem_cgroup_swap_full(struct folio *folio)
>  {
>  	return vm_swap_full();
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5b009b233ab8..29e34c06ca83 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7584,6 +7584,14 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>  	return nr_swap_pages;
>  }
>
> +long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
> +{
> +	if (mem_cgroup_disabled())
> +		return total_swapcache_pages();
> +
> +	return memcg_page_state(memcg, NR_SWAPCACHE);
> +}
> +
>  bool mem_cgroup_swap_full(struct folio *folio)
>  {
>  	struct mem_cgroup *memcg;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6f13394b112e..5d5a169ec98c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -137,6 +137,9 @@ struct scan_control {
>  	/* Always discard instead of demoting to lower tier memory */
>  	unsigned int no_demotion:1;
>
> +	/* Swap space is exhausted, only reclaim swapcache for anon LRU */
> +	unsigned int swapcache_only:1;
> +
>  	/* Allocation order */
>  	s8 order;
>
> @@ -606,6 +609,9 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>  					  int nid,
>  					  struct scan_control *sc)
>  {
> +	if (sc)
> +		sc->swapcache_only = 0;
> +
>  	if (memcg == NULL) {
>  		/*
>  		 * For non-memcg reclaim, is there
> @@ -613,10 +619,22 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>  		 */
>  		if (get_nr_swap_pages() > 0)
>  			return true;
> +		/* Is there any swapcache pages to reclaim? */
> +		if (total_swapcache_pages() > 0) {
> +			if (sc)
> +				sc->swapcache_only = 1;
> +			return true;
> +		}
>  	} else {
>  		/* Is the memcg below its swap limit? */
>  		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
>  			return true;
> +		/* Is there any swapcache pages in memcg to reclaim? */
> +		if (mem_cgroup_get_nr_swapcache_pages(memcg) > 0) {
> +			if (sc)
> +				sc->swapcache_only = 1;
> +			return true;
> +		}
>  	}

I understand that this is only possible in theory. But if
can_demote() == true, get_nr_swap_pages() == 0, and
total_swapcache_pages() > 0, we will demote only the anonymous pages
that are in the swapcache. I don't think that is reasonable. So the
swapcache pages should be checked after the can_demote() check.

--
Best Regards,
Huang, Ying

>  	/*
> @@ -2342,6 +2360,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>  		 */
>  		scan += nr_pages;
>
> +		/*
> +		 * Count non-swapcache too because the swapcache pages may
> +		 * be rare and it takes too much times here if not count
> +		 * the non-swapcache pages.
> +		 */
> +		if (unlikely(sc->swapcache_only && !is_file_lru(lru) &&
> +		    !folio_test_swapcache(folio)))
> +			goto move;
> +
>  		if (!folio_test_lru(folio))
>  			goto move;
>  		if (!sc->may_unmap && folio_mapped(folio))
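
To make the ordering suggested in the can_reclaim_anon_pages() comment
above concrete, here is a userspace sketch of the non-memcg path only.
It is not a replacement hunk. struct toy_state, struct
toy_scan_control and toy_can_reclaim_anon_pages() are hypothetical
names, and the three fields of toy_state are assumed stand-ins for the
values that get_nr_swap_pages(), total_swapcache_pages() and
can_demote() would return.

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for the kernel state each helper reads. */
struct toy_state {
	long nr_swap_pages;		/* models get_nr_swap_pages() */
	long nr_swapcache_pages;	/* models total_swapcache_pages() */
	bool can_demote;		/* models can_demote(nid, sc) */
};

/* Hypothetical cut-down scan_control with only the new flag. */
struct toy_scan_control {
	unsigned int swapcache_only:1;
};

/*
 * Model of the suggested ordering: free swap space first, then
 * demotion, and only then the swapcache check.
 */
bool toy_can_reclaim_anon_pages(const struct toy_state *st,
				struct toy_scan_control *sc)
{
	if (sc)
		sc->swapcache_only = 0;

	/* Is there space in any swap device? */
	if (st->nr_swap_pages > 0)
		return true;

	/* Demotion can take any anon folio, so do not restrict the scan. */
	if (st->can_demote)
		return true;

	/* No swap space and no demotion: only swapcache folios can go. */
	if (st->nr_swapcache_pages > 0) {
		if (sc)
			sc->swapcache_only = 1;
		return true;
	}

	return false;
}
```

With this ordering, swapcache_only is set only when neither swapping
nor demotion is possible, so a demotion-capable node still scans the
whole anon LRU. The memcg branch would presumably follow the same
pattern with the memcg variants of the helpers.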