From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Johannes Weiner, Shakeel Butt,
	Roman Gushchin, Michal Hocko, Andrew Morton, Linus Torvalds,
	Sasha Levin
Subject: [PATCH 5.0 061/137] mm: fix inactive list balancing between NUMA nodes and cgroups
Date: Wed, 15 May 2019 12:55:42 +0200
Message-Id: <20190515090657.915361996@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190515090651.633556783@linuxfoundation.org>
References: <20190515090651.633556783@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

[ Upstream commit 3b991208b897f52507168374033771a984b947b1 ]

During !CONFIG_CGROUP reclaim, we expand the inactive list size if it's
thrashing on the node that is about to be reclaimed.  But when cgroups
are enabled, we suddenly ignore the node scope and use the cgroup scope
only.  The result is that pressure bleeds between NUMA nodes depending
on whether cgroups are merely compiled into Linux.  This behavioral
difference is unexpected and undesirable.

When the refault adaptivity of the inactive list was first introduced,
there were no statistics at the lruvec level - the intersection of node
and memcg - so it was better than nothing.

But now that we have that infrastructure, use lruvec_page_state() to
make the list balancing decision always NUMA aware.

[hannes@cmpxchg.org: fix bisection hole]
Link: http://lkml.kernel.org/r/20190417155241.GB23013@cmpxchg.org
Link: http://lkml.kernel.org/r/20190412144438.2645-1-hannes@cmpxchg.org
Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition")
Signed-off-by: Johannes Weiner
Reviewed-by: Shakeel Butt
Cc: Roman Gushchin
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
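
Illustration (not part of the upstream patch): a minimal userspace sketch,
with invented struct and counter names, of the scope change described
above. The refault signal that disables active list protection is read per
lruvec, that is, per (node, memcg) pair, so thrashing on one NUMA node no
longer bleeds into another.

/*
 * Userspace sketch only; these types and values are made up for
 * illustration and are not the kernel's structures or API.  One
 * lruvec_sketch models one (NUMA node, memcg) pair.
 */
#include <stdbool.h>
#include <stdio.h>

struct lruvec_sketch {
	unsigned long refaults;             /* snapshot taken at last reclaim */
	unsigned long workingset_activate;  /* refaults observed since then */
};

static bool drop_active_protection(const struct lruvec_sketch *lruvec)
{
	/* A new workingset is forming on *this* node+memcg only. */
	return lruvec->workingset_activate != lruvec->refaults;
}

int main(void)
{
	struct lruvec_sketch node0 = { .refaults = 100, .workingset_activate = 100 };
	struct lruvec_sketch node1 = { .refaults = 100, .workingset_activate = 240 };

	/* Only the thrashing node gives up its active list protection. */
	printf("node0 drops protection: %d\n", drop_active_protection(&node0));
	printf("node1 drops protection: %d\n", drop_active_protection(&node1));
	return 0;
}
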
 mm/vmscan.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e979705bbf325..022afabac3f69 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2199,7 +2199,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
  *   10TB     320        32GB
  */
 static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
-				 struct mem_cgroup *memcg,
 				 struct scan_control *sc, bool actual_reclaim)
 {
 	enum lru_list active_lru = file * LRU_FILE + LRU_ACTIVE;
@@ -2220,16 +2219,12 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx);
 	active = lruvec_lru_size(lruvec, active_lru, sc->reclaim_idx);
 
-	if (memcg)
-		refaults = memcg_page_state(memcg, WORKINGSET_ACTIVATE);
-	else
-		refaults = node_page_state(pgdat, WORKINGSET_ACTIVATE);
-
 	/*
 	 * When refaults are being observed, it means a new workingset
 	 * is being established. Disable active list protection to get
 	 * rid of the stale workingset quickly.
 	 */
+	refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
 	if (file && actual_reclaim && lruvec->refaults != refaults) {
 		inactive_ratio = 0;
 	} else {
@@ -2250,12 +2245,10 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 }
 
 static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
-				 struct lruvec *lruvec, struct mem_cgroup *memcg,
-				 struct scan_control *sc)
+				 struct lruvec *lruvec, struct scan_control *sc)
 {
 	if (is_active_lru(lru)) {
-		if (inactive_list_is_low(lruvec, is_file_lru(lru),
-					 memcg, sc, true))
+		if (inactive_list_is_low(lruvec, is_file_lru(lru), sc, true))
 			shrink_active_list(nr_to_scan, lruvec, sc, lru);
 		return 0;
 	}
@@ -2355,7 +2348,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		 * anonymous pages on the LRU in eligible zones.
 		 * Otherwise, the small LRU gets thrashed.
 		 */
-		if (!inactive_list_is_low(lruvec, false, memcg, sc, false) &&
+		if (!inactive_list_is_low(lruvec, false, sc, false) &&
 		    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
 					>> sc->priority) {
 			scan_balance = SCAN_ANON;
@@ -2373,7 +2366,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	 * lruvec even if it has plenty of old anonymous pages unless the
 	 * system is under heavy pressure.
 	 */
-	if (!inactive_list_is_low(lruvec, true, memcg, sc, false) &&
+	if (!inactive_list_is_low(lruvec, true, sc, false) &&
 	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2526,7 +2519,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 				nr[lru] -= nr_to_scan;
 
 				nr_reclaimed += shrink_list(lru, nr_to_scan,
-							    lruvec, memcg, sc);
+							    lruvec, sc);
 			}
 		}
@@ -2593,7 +2586,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 	 * Even if we did not try to evict anon pages at all, we want to
 	 * rebalance the anon lru active/inactive ratio.
 	 */
-	if (inactive_list_is_low(lruvec, false, memcg, sc, true))
+	if (inactive_list_is_low(lruvec, false, sc, true))
 		shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
 				   sc, LRU_ACTIVE_ANON);
 }
@@ -2993,12 +2986,8 @@ static void snapshot_refaults(struct mem_cgroup *root_memcg, pg_data_t *pgdat)
 		unsigned long refaults;
 		struct lruvec *lruvec;
 
-		if (memcg)
-			refaults = memcg_page_state(memcg, WORKINGSET_ACTIVATE);
-		else
-			refaults = node_page_state(pgdat, WORKINGSET_ACTIVATE);
-
 		lruvec = mem_cgroup_lruvec(pgdat, memcg);
+		refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
 		lruvec->refaults = refaults;
 	} while ((memcg = mem_cgroup_iter(root_memcg, memcg, NULL)));
@@ -3363,7 +3352,7 @@ static void age_active_anon(struct pglist_data *pgdat,
 	do {
 		struct lruvec *lruvec = mem_cgroup_lruvec(pgdat, memcg);
 
-		if (inactive_list_is_low(lruvec, false, memcg, sc, true))
+		if (inactive_list_is_low(lruvec, false, sc, true))
 			shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
 					   sc, LRU_ACTIVE_ANON);

-- 
2.20.1