From: Mel Gorman
To: Linux-MM
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic,
    Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, LKML,
    Mel Gorman
Subject: [PATCH 07/10] mm: vmscan: Block kswapd if it is encountering pages under writeback
Date: Sun, 17 Mar 2013 13:04:13 +0000
Message-Id: <1363525456-10448-8-git-send-email-mgorman@suse.de>
In-Reply-To: <1363525456-10448-1-git-send-email-mgorman@suse.de>
References: <1363525456-10448-1-git-send-email-mgorman@suse.de>

Historically, kswapd used to congestion_wait() at higher priorities if it
was not making forward progress. This made no sense as the failure to make
progress could be completely independent of IO. It was later replaced by
wait_iff_congested() and removed entirely by commit 258401a6 (mm: don't
wait on congested zones in balance_pgdat()) as it was duplicating logic
in shrink_inactive_list().

This is problematic. If kswapd encounters many pages under writeback and
it continues to scan until it reaches the high watermark then it will
quickly skip over the pages under writeback and reclaim clean young
pages or push applications out to swap.

The use of wait_iff_congested() is not suited to kswapd as it will only
stall if the underlying BDI is really congested or a direct reclaimer was
unable to write to the underlying BDI.
kswapd bypasses the BDI congestion as it sets PF_SWAPWRITE but even if
this was taken into account then it would cause direct reclaimers to
stall on writeback which is not desirable.

This patch sets a ZONE_WRITEBACK flag if direct reclaim or kswapd is
encountering too many pages under writeback. If this flag is set and
kswapd encounters a PageReclaim page under writeback then it'll assume
that the LRU lists are being recycled too quickly before IO can complete
and block waiting for some IO to complete.

Signed-off-by: Mel Gorman
---
 include/linux/mmzone.h |  8 ++++++++
 mm/vmscan.c            | 29 ++++++++++++++++++++++++-----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index edd6b98..c758fb7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -498,6 +498,9 @@ typedef enum {
 	ZONE_DIRTY,			/* reclaim scanning has recently found
 					 * many dirty file pages
 					 */
+	ZONE_WRITEBACK,			/* reclaim scanning has recently found
+					 * many pages under writeback
+					 */
 } zone_flags_t;
 
 static inline void zone_set_flag(struct zone *zone, zone_flags_t flag)
@@ -525,6 +528,11 @@ static inline int zone_is_reclaim_dirty(const struct zone *zone)
 	return test_bit(ZONE_DIRTY, &zone->flags);
 }
 
+static inline int zone_is_reclaim_writeback(const struct zone *zone)
+{
+	return test_bit(ZONE_WRITEBACK, &zone->flags);
+}
+
 static inline int zone_is_reclaim_locked(const struct zone *zone)
 {
 	return test_bit(ZONE_RECLAIM_LOCKED, &zone->flags);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 493728b..7d5a932 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -725,6 +725,19 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		if (PageWriteback(page)) {
 			/*
+			 * If reclaim is encountering an excessive number of
+			 * pages under writeback and this page is both under
+			 * writeback and PageReclaim then it indicates that
+			 * pages are being queued for IO but are being
+			 * recycled through the LRU before the IO can complete.
+			 * This is useless CPU work so wait on the IO to complete.
+			 */
+			if (current_is_kswapd() &&
+			    zone_is_reclaim_writeback(zone)) {
+				wait_on_page_writeback(page);
+				zone_clear_flag(zone, ZONE_WRITEBACK);
+
+			/*
 			 * memcg doesn't have any dirty pages throttling so we
 			 * could easily OOM just because too many pages are in
 			 * writeback and there is nothing else to reclaim.
@@ -741,7 +754,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * grab_cache_page_write_begin(,,AOP_FLAG_NOFS), so
 			 * testing may_enter_fs here is liable to OOM on them.
 			 */
-			if (global_reclaim(sc) ||
+			} else if (global_reclaim(sc) ||
 			    !PageReclaim(page) || !(sc->gfp_mask & __GFP_IO)) {
 				/*
 				 * This is slightly racy - end_page_writeback()
@@ -756,9 +769,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				 */
 				SetPageReclaim(page);
 				nr_writeback++;
+
 				goto keep_locked;
+			} else {
+				wait_on_page_writeback(page);
 			}
-			wait_on_page_writeback(page);
 		}
 
 		if (!force_reclaim)
@@ -1373,8 +1388,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * isolated page is PageWriteback
 	 */
 	if (nr_writeback && nr_writeback >=
-			(nr_taken >> (DEF_PRIORITY - sc->priority)))
+			(nr_taken >> (DEF_PRIORITY - sc->priority))) {
 		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
+		zone_set_flag(zone, ZONE_WRITEBACK);
+	}
 
 	/*
 	 * Similarly, if many dirty pages are encountered that are not
@@ -2639,8 +2656,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
  * kswapd shrinks the zone by the number of pages required to reach
  * the high watermark.
  *
- * Returns true if kswapd scanned at least the requested number of
- * pages to reclaim.
+ * Returns true if kswapd scanned at least the requested number of pages to
+ * reclaim or if the lack of progress was due to pages under writeback.
  */
 static bool kswapd_shrink_zone(struct zone *zone,
 			       struct scan_control *sc,
@@ -2663,6 +2680,8 @@ static bool kswapd_shrink_zone(struct zone *zone,
 	if (nr_slab == 0 && !zone_reclaimable(zone))
 		zone->all_unreclaimable = 1;
 
+	zone_clear_flag(zone, ZONE_WRITEBACK);
+
 	return sc->nr_scanned >= sc->nr_to_reclaim;
 }
 
-- 
1.8.1.4
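
For readers who want to experiment with the heuristic outside the kernel, the
following is a minimal userspace sketch of the flag handling described in the
changelog. It is not kernel code and makes several assumptions: struct
zone_model and the functions too_many_writeback(), note_scan_result() and
kswapd_should_wait() are hypothetical names invented for illustration,
DEF_PRIORITY is redefined locally as 12 to match mm/vmscan.c, and the actual
wait_on_page_writeback() call is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel's DEF_PRIORITY so the sketch builds alone. */
#define DEF_PRIORITY 12

/* Hypothetical stand-in for struct zone; only models the new flag. */
struct zone_model {
	bool writeback_flag;	/* models ZONE_WRITEBACK */
};

/*
 * Same comparison as the check at the end of shrink_inactive_list():
 * the threshold halves each time the scan priority drops by one.
 */
bool too_many_writeback(unsigned long nr_writeback,
			unsigned long nr_taken, int priority)
{
	/* The shift count must stay within 0..DEF_PRIORITY. */
	assert(priority >= 0 && priority <= DEF_PRIORITY);

	return nr_writeback &&
	       nr_writeback >= (nr_taken >> (DEF_PRIORITY - priority));
}

/* Models setting ZONE_WRITEBACK after a batch of pages has been scanned. */
void note_scan_result(struct zone_model *zone, unsigned long nr_writeback,
		      unsigned long nr_taken, int priority)
{
	if (too_many_writeback(nr_writeback, nr_taken, priority))
		zone->writeback_flag = true;
}

/*
 * Models the new kswapd branch in shrink_page_list(): returns true where
 * the patch would call wait_on_page_writeback(), and clears the flag so
 * that kswapd blocks at most once per flag setting.
 */
bool kswapd_should_wait(struct zone_model *zone, bool is_kswapd,
			bool page_under_writeback)
{
	if (page_under_writeback && is_kswapd && zone->writeback_flag) {
		zone->writeback_flag = false;
		return true;
	}
	return false;
}
```

With nr_taken = 32, all 32 pages must be under writeback to trip the check at
DEF_PRIORITY, while 8 pages are enough at priority 10 because 32 >> 2 == 8.
Direct reclaimers (is_kswapd == false) never block in this model, which
matches the intent that only kswapd stalls on the flag.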