From: "NeilBrown"
To: "Mel Gorman"
Cc: "Linux-MM", "Theodore Ts'o", "Andreas Dilger", "Darrick J . 
Wong", "Matthew Wilcox", "Michal Hocko", "Dave Chinner", "Rik van Riel", "Vlastimil Babka", "Johannes Weiner", "Jonathan Corbet", "Linux-fsdevel", "LKML", "Mel Gorman"
Subject: Re: [PATCH 1/5] mm/vmscan: Throttle reclaim until some writeback completes if congested
In-reply-to: <20210920085436.20939-2-mgorman@techsingularity.net>
References: <20210920085436.20939-1-mgorman@techsingularity.net>, <20210920085436.20939-2-mgorman@techsingularity.net>
Date: Tue, 21 Sep 2021 09:19:07 +1000
Message-id: <163217994752.3992.5443677201798473600@noble.neil.brown.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 20 Sep 2021, Mel Gorman wrote:
>
> +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page);
> +static inline void acct_reclaim_writeback(struct page *page)
> +{
> +	pg_data_t *pgdat = page_pgdat(page);
> +
> +	if (atomic_read(&pgdat->nr_reclaim_throttled))
> +		__acct_reclaim_writeback(pgdat, page);

The first thing __acct_reclaim_writeback() does is repeat that
atomic_read().
Should we read it once and pass the value in to
__acct_reclaim_writeback(), or is that an unnecessary
micro-optimisation?

> +/*
> + * Account for pages written if tasks are throttled waiting on dirty
> + * pages to clean. If enough pages have been cleaned since throttling
> + * started then wakeup the throttled tasks.
> + */
> +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page)
> +{
> +	unsigned long nr_written;
> +	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
> +
> +	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
> +	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
> +		READ_ONCE(pgdat->nr_reclaim_start);
> +
> +	if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
> +		wake_up_interruptible_all(&pgdat->reclaim_wait);

A simple wake_up() could be used here.  "interruptible" is only needed
if non-interruptible waiters should be left alone.  "_all" is only needed
if there are some exclusive waiters.  Neither of these applies, so I
think the simpler interface is best.

> +}
> +
>  /* possible outcome of pageout() */
>  typedef enum {
>  	/* failed to write page out, page is locked */
> @@ -1412,9 +1453,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>
>  		/*
>  		 * The number of dirty pages determines if a node is marked
> -		 * reclaim_congested which affects wait_iff_congested. kswapd
> -		 * will stall and start writing pages if the tail of the LRU
> -		 * is all dirty unqueued pages.
> +		 * reclaim_congested. kswapd will stall and start writing
> +		 * pages if the tail of the LRU is all dirty unqueued pages.
>  		 */
>  		page_check_dirty_writeback(page, &dirty, &writeback);
>  		if (dirty || writeback)
> @@ -3180,19 +3220,20 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  		 * If kswapd scans pages marked for immediate
>  		 * reclaim and under writeback (nr_immediate), it
>  		 * implies that pages are cycling through the LRU
> -		 * faster than they are written so also forcibly stall.
> +		 * faster than they are written so forcibly stall
> +		 * until some pages complete writeback.
>  		 */
>  		if (sc->nr.immediate)
> -			congestion_wait(BLK_RW_ASYNC, HZ/10);
> +			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10);
>  	}
>
>  	/*
>  	 * Tag a node/memcg as congested if all the dirty pages
>  	 * scanned were backed by a congested BDI and

"congested BDI" doesn't mean anything any more.  Is this a good time to
correct that comment?
This comment seems to refer to the test

      sc->nr.dirty && sc->nr.dirty == sc->nr.congested)

a few lines down.  But nr.congested is set from nr_congested which
counts when inode_write_congested() is true - almost never - and when
"writeback and PageReclaim()".

Is that last test the sign that we are cycling through the LRU too fast?
So the comment could become:

   Tag a node/memcg as congested if all the dirty pages were
   already marked for writeback and immediate reclaim (counted in
   nr.congested).

??

Patch seems to make sense to me, but I'm not an expert in this area.

Thanks!
NeilBrown