Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp252174pxb; Tue, 19 Oct 2021 02:04:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwq21QdrMLXooKw5ECFeYaFtXcFLAcz3BQb4iOwS/x1dvcZzCcg8Juwyr9YE+N4TAGma1Rr X-Received: by 2002:a17:90b:3b4c:: with SMTP id ot12mr5021035pjb.51.1634634262357; Tue, 19 Oct 2021 02:04:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634634262; cv=none; d=google.com; s=arc-20160816; b=SfvkstZdrHN9d3jru5hyOMWW/sUHu7fF5TI8fz8FRxFhIcafcCSa1AwslVI/FGRZ2I m0reyJO8BKeV3ys7vLqfkaqBaJF5r1ZgfE0WebN6V8S5bIAgExOFZOAjk5bkWPZBTH70 GZ2rQ+2L5RzuTaGiNnIjuFNu2EW1QHTcnIHIXvIXmpTjRy0aGiB5x1Qa8O9J3R0y83WN wAWvpoRH4IIJRK1CaW4CTxmnuAKMqYUSSgo5UC74GJXntxrPwM2eZPMeRS2mLYgCV2Dq 9oud0VY3zbUQTqG/kHMMLDxLu0lWUQJj10dCS0RQ3uG6H7BFv3QYbpjEqchnujfTqvw4 iYig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=2DDYqUJurHcWWMAMqu5PD6GYr22uVnjEfovu1ws5MOc=; b=Q7pc4BaawcgEmuAMVhb8BM5yM2EtSPmmFxdI5CbkxR411fgMXJ7QYnsQP3F6muMMlZ +Jc4bVVIPERm9W4Kqzclx5bO5QxgnTtdY6blUGKTTgYAIvEjlolQSUC0sLSJXmxgypkd j6k0KwPIgG5XI0v1DTYFU7DiBGlXEhZXLuXTu+IykmPr52Ulk74uuJdEd99uNA97a7kS LaP1CXjVbpgwDpiDTSDSi4lKTreOGs6wztlE9pXd6HjIBe6AkxlZ/x6kA4g3ew204+7+ JxHIkbCBtQgyb70pBie722sSmr22zANhPo+j8rY3cD4LgzSFgP2TSIWT/T2SRyhoO1Hu z5Xg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w19si1386434pjq.94.2021.10.19.02.04.04; Tue, 19 Oct 2021 02:04:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234961AbhJSJEg (ORCPT + 99 others); Tue, 19 Oct 2021 05:04:36 -0400 Received: from outbound-smtp44.blacknight.com ([46.22.136.52]:49873 "EHLO outbound-smtp44.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234957AbhJSJEf (ORCPT ); Tue, 19 Oct 2021 05:04:35 -0400 Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp44.blacknight.com (Postfix) with ESMTPS id 8ED68F8382 for ; Tue, 19 Oct 2021 10:02:20 +0100 (IST) Received: (qmail 9401 invoked from network); 19 Oct 2021 09:02:20 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.17.29]) by 81.17.254.9 with ESMTPA; 19 Oct 2021 09:02:20 -0000 From: Mel Gorman To: Andrew Morton Cc: NeilBrown , Theodore Ts'o , Andreas Dilger , "Darrick J . Wong" , Matthew Wilcox , Michal Hocko , Dave Chinner , Rik van Riel , Vlastimil Babka , Johannes Weiner , Jonathan Corbet , Linux-MM , Linux-fsdevel , LKML , Mel Gorman Subject: [PATCH 6/8] mm/vmscan: Centralise timeout values for reclaim_throttle Date: Tue, 19 Oct 2021 10:01:06 +0100 Message-Id: <20211019090108.25501-7-mgorman@techsingularity.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211019090108.25501-1-mgorman@techsingularity.net> References: <20211019090108.25501-1-mgorman@techsingularity.net> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Neil Brown raised concerns about callers of reclaim_throttle specifying a timeout value. The original timeout values to congestion_wait() were probably pulled out of thin air or copy&pasted from somewhere else. This patch centralises the timeout values and selects a timeout based on the reason for reclaim throttling. These figures are also pulled out of the same thin air but better values may be derived Running a workload that is throttling for inappropriate periods and tracing mm_vmscan_throttled can be used to pick a more appropriate value. Excessive throttling would pick a lower timeout where as excessive CPU usage in reclaim context would select a larger timeout. Ideally a large value would always be used and the wakeups would occur before a timeout but that requires careful testing. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 2 +- mm/internal.h | 3 +-- mm/page-writeback.c | 2 +- mm/vmscan.c | 48 +++++++++++++++++++++++++++++++++------------ 4 files changed, 38 insertions(+), 17 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 7359093d8ac0..151b04c4dab3 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -828,7 +828,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (cc->mode == MIGRATE_ASYNC) return -EAGAIN; - reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED); if (fatal_signal_pending(current)) return -EINTR; diff --git a/mm/internal.h b/mm/internal.h index 3461a1055975..63d8ebbc5a6d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -129,8 +129,7 @@ extern unsigned long highest_memmap_pfn; */ extern int isolate_lru_page(struct page *page); extern void putback_lru_page(struct page *page); -extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason, - long timeout); +extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason); /* * in mm/rmap.c: diff --git a/mm/page-writeback.c b/mm/page-writeback.c index f34f54fcd5b4..4b01a6872f9e 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2374,7 +2374,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc) * guess as any. */ reclaim_throttle(NODE_DATA(numa_node_id()), - VMSCAN_THROTTLE_WRITEBACK, HZ/50); + VMSCAN_THROTTLE_WRITEBACK); } /* * Usually few pages are written by now from those we've just submitted diff --git a/mm/vmscan.c b/mm/vmscan.c index 14127bbf2c3b..1f5c467dc83c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1006,12 +1006,10 @@ static void handle_write_error(struct address_space *mapping, unlock_page(page); } -void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason, - long timeout) +void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason) { wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason]; - long ret; - bool acct_writeback = (reason == VMSCAN_THROTTLE_WRITEBACK); + long timeout, ret; DEFINE_WAIT(wait); /* @@ -1023,17 +1021,41 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason, current->flags & (PF_IO_WORKER|PF_KTHREAD)) return; - if (acct_writeback && - atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) { - WRITE_ONCE(pgdat->nr_reclaim_start, - node_page_state(pgdat, NR_THROTTLED_WRITTEN)); + /* + * These figures are pulled out of thin air. + * VMSCAN_THROTTLE_ISOLATED is a transient condition based on too many + * parallel reclaimers which is a short-lived event so the timeout is + * short. Failing to make progress or waiting on writeback are + * potentially long-lived events so use a longer timeout. This is shaky + * logic as a failure to make progress could be due to anything from + * writeback to a slow device to excessive references pages at the tail + * of the inactive LRU. + */ + switch(reason) { + case VMSCAN_THROTTLE_NOPROGRESS: + case VMSCAN_THROTTLE_WRITEBACK: + timeout = HZ/10; + + if (atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) { + WRITE_ONCE(pgdat->nr_reclaim_start, + node_page_state(pgdat, NR_THROTTLED_WRITTEN)); + } + + break; + case VMSCAN_THROTTLE_ISOLATED: + timeout = HZ/50; + break; + default: + WARN_ON_ONCE(1); + timeout = HZ; + break; } prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); ret = schedule_timeout(timeout); finish_wait(wqh, &wait); - if (acct_writeback) + if (reason == VMSCAN_THROTTLE_ISOLATED) atomic_dec(&pgdat->nr_writeback_throttled); trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout), @@ -2319,7 +2341,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, /* wait a bit for the reclaimer. */ stalled = true; - reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED); /* We are about to die and free our memory. Return now. */ if (fatal_signal_pending(current)) @@ -3251,7 +3273,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) * until some pages complete writeback. */ if (sc->nr.immediate) - reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); } /* @@ -3275,7 +3297,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) if (!current_is_kswapd() && current_may_throttle() && !sc->hibernation_mode && test_bit(LRUVEC_CONGESTED, &target_lruvec->flags)) - reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed, sc)) @@ -3347,7 +3369,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc) /* Throttle if making no progress at high prioities. */ if (sc->priority < DEF_PRIORITY - 2) - reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS); } /* -- 2.31.1