Date: Tue, 19 Sep 2017 16:18:40 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
        linux-mm@kvack.org, clm@fb.com, jack@suse.cz
Subject: Re: [PATCH 6/6] fs-writeback: only allow one inflight and pending
 !nr_pages flush
Message-ID: <20170919201840.GF11873@cmpxchg.org>
References: <1505850787-18311-1-git-send-email-axboe@kernel.dk>
 <1505850787-18311-7-git-send-email-axboe@kernel.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1505850787-18311-7-git-send-email-axboe@kernel.dk>
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2421
Lines: 59

On Tue, Sep 19, 2017 at 01:53:07PM -0600, Jens Axboe wrote:
> A few callers pass in nr_pages == 0 when they wakeup the flusher
> threads, which means that the flusher should just flush everything
> that was currently dirty. If we are tight on memory, we can get
> tons of these queued from kswapd/vmscan. This causes (at least)
> two problems:
> 
> 1) We consume a ton of memory just allocating writeback work items.
> 2) We spend so much time processing these work items, that we
>    introduce a softlockup in writeback processing.
> 
> Fix this by adding a 'zero_pages' bit to the writeback structure,
> and set that when someone queues a nr_pages==0 flusher thread
> wakeup. The bit is cleared when we start writeback on that work
> item. If the bit is already set when we attempt to queue !nr_pages
> writeback, then we simply ignore it.
> 
> This provides us one of full flush in flight, with one pending as
> well, and makes for more efficient handling of this type of
> writeback.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Just a nitpick:

> @@ -948,15 +949,25 @@ static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
>  			       bool range_cyclic, enum wb_reason reason)
>  {
>  	struct wb_writeback_work *work;
> +	bool zero_pages = false;
>  
>  	if (!wb_has_dirty_io(wb))
>  		return;
>  
>  	/*
> -	 * If someone asked for zero pages, we write out the WORLD
> +	 * If someone asked for zero pages, we write out the WORLD.
> +	 * Places like vmscan and laptop mode want to queue a wakeup to
> +	 * the flusher threads to clean out everything. To avoid potentially
> +	 * having tons of these pending, ensure that we only allow one of
> +	 * them pending and inflight at the time
>  	 */
> -	if (!nr_pages)
> +	if (!nr_pages) {
> +		if (test_bit(WB_zero_pages, &wb->state))
> +			return;
> +		set_bit(WB_zero_pages, &wb->state);
>  		nr_pages = get_nr_dirty_pages();

We could rely on the work->older_than_this and pass LONG_MAX here
instead to write out the world as it was at the time wb commences.

get_nr_dirty_pages() is somewhat clearer on intent, but on the other
hand it returns global state and is used here in a split-bdi context,
and we can end up in sum requesting the system-wide dirty pages
several times over. It'll work fine, relying on work->older_than_this
to contain it also, it just seems a little ugly and subtle.