Date: Tue, 19 Sep 2017 16:18:40 -0400
From: Johannes Weiner
To: Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, clm@fb.com, jack@suse.cz
Subject: Re: [PATCH 6/6] fs-writeback: only allow one inflight and pending !nr_pages flush
Message-ID: <20170919201840.GF11873@cmpxchg.org>
References: <1505850787-18311-1-git-send-email-axboe@kernel.dk> <1505850787-18311-7-git-send-email-axboe@kernel.dk>
In-Reply-To: <1505850787-18311-7-git-send-email-axboe@kernel.dk>

On Tue, Sep 19, 2017 at 01:53:07PM -0600, Jens Axboe wrote:
> A few callers pass in nr_pages == 0 when they wakeup the flusher
> threads, which means that the flusher should just flush everything
> that was currently dirty. If we are tight on memory, we can get
> tons of these queued from kswapd/vmscan. This causes (at least)
> two problems:
>
> 1) We consume a ton of memory just allocating writeback work items.
> 2) We spend so much time processing these work items, that we
>    introduce a softlockup in writeback processing.
>
> Fix this by adding a 'zero_pages' bit to the writeback structure,
> and set that when someone queues a nr_pages==0 flusher thread
> wakeup. The bit is cleared when we start writeback on that work
> item. If the bit is already set when we attempt to queue !nr_pages
> writeback, then we simply ignore it.
>
> This provides us one of full flush in flight, with one pending as
> well, and makes for more efficient handling of this type of
> writeback.
>
> Signed-off-by: Jens Axboe

Acked-by: Johannes Weiner

Just a nitpick:

> @@ -948,15 +949,25 @@ static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
>  			       bool range_cyclic, enum wb_reason reason)
>  {
>  	struct wb_writeback_work *work;
> +	bool zero_pages = false;
>
>  	if (!wb_has_dirty_io(wb))
>  		return;
>
>  	/*
> -	 * If someone asked for zero pages, we write out the WORLD
> +	 * If someone asked for zero pages, we write out the WORLD.
> +	 * Places like vmscan and laptop mode want to queue a wakeup to
> +	 * the flusher threads to clean out everything. To avoid potentially
> +	 * having tons of these pending, ensure that we only allow one of
> +	 * them pending and inflight at the time
>  	 */
> -	if (!nr_pages)
> +	if (!nr_pages) {
> +		if (test_bit(WB_zero_pages, &wb->state))
> +			return;
> +		set_bit(WB_zero_pages, &wb->state);
>  		nr_pages = get_nr_dirty_pages();

We could rely on work->older_than_this and pass LONG_MAX here
instead, to write out the world as it was at the time wb commences.

get_nr_dirty_pages() is somewhat clearer on intent, but on the other
hand it returns global state and is used here in a split-bdi context,
so in sum we can end up requesting the system-wide dirty pages several
times over. It'll work fine, since it also relies on
work->older_than_this to contain it; it just seems a little ugly and
subtle.
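
To make the suggestion concrete, here is a rough userspace sketch of
the gating with LONG_MAX substituted for get_nr_dirty_pages(). It is
not kernel code: struct wb_model and both wb_model_*() helpers are
hypothetical names invented for illustration, and a C11 atomic_bool
stands in for the WB_zero_pages bit in wb->state. It also folds the
test_bit()/set_bit() pair from the patch into a single atomic
exchange, which is a different technique from what the patch does.
The work->older_than_this cutoff is not modelled at all; the sketch
only shows that repeated nr_pages == 0 requests collapse into one
queued item until the flusher starts on it.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant parts of struct bdi_writeback. */
struct wb_model {
	atomic_bool zero_pages;	/* models the WB_zero_pages state bit */
	long queued;		/* number of work items queued so far */
	long last_nr_pages;	/* nr_pages of the most recent work item */
};

/*
 * Models the !nr_pages path of wb_start_writeback().
 * Returns true if a work item was queued, false if the request was
 * dropped because a full flush is already pending.
 */
static bool wb_model_start_writeback(struct wb_model *wb, long nr_pages)
{
	assert(wb != NULL);

	if (!nr_pages) {
		/* Assumption: one atomic test-and-set instead of test_bit() + set_bit(). */
		if (atomic_exchange(&wb->zero_pages, true))
			return false;
		/* Write out the world; the real bound would be work->older_than_this. */
		nr_pages = LONG_MAX;
	}

	wb->last_nr_pages = nr_pages;
	wb->queued++;
	return true;
}

/* Models the flusher starting on a full-flush item: one more may now be queued. */
static void wb_model_begin_full_flush(struct wb_model *wb)
{
	assert(wb != NULL);
	atomic_store(&wb->zero_pages, false);
}
```

With a zero-initialised struct wb_model, the first call with
nr_pages == 0 queues one item with LONG_MAX pages, further such calls
return false, and calling wb_model_begin_full_flush() allows exactly
one more to be queued behind the one in flight.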