Date: Thu, 28 Sep 2017 14:41:00 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    hannes@cmpxchg.org, jack@suse.cz, torvalds@linux-foundation.org
Subject: Re: [PATCH 10/12] writeback: only allow one inflight and pending full flush
Message-Id: <20170928144100.e11801ef742521e0e3f4b8df@linux-foundation.org>
In-Reply-To: <1506543239-31470-11-git-send-email-axboe@kernel.dk>
References: <1506543239-31470-1-git-send-email-axboe@kernel.dk>
	<1506543239-31470-11-git-send-email-axboe@kernel.dk>

On Wed, 27 Sep 2017 14:13:57 -0600 Jens Axboe <axboe@kernel.dk> wrote:

> When someone calls wakeup_flusher_threads() or
> wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
> pages in the system (or on that bdi). If we are tight on memory, we
> can get tons of these queued from kswapd/vmscan. This causes (at
> least) two problems:
>
> 1) We consume a ton of memory just allocating writeback work items.
>    We've seen as much as 600 million of these writeback work items
>    pending. That's a lot of memory to pointlessly hold hostage,
>    while the box is under memory pressure.
>
> 2) We spend so much time processing these work items, that we
>    introduce a softlockup in writeback processing.
>    This is because
>    each of the writeback work items doesn't end up doing any work
>    (it's hard when you have millions of identical ones coming in to
>    the flush machinery), so we just sit in a tight loop pulling work
>    items and deleting/freeing them.
>
> Fix this by adding a 'start_all' bit to the writeback structure, and
> set that when someone attempts to flush all dirty pages. The bit is
> cleared when we start writeback on that work item. If the bit is
> already set when we attempt to queue !nr_pages writeback, then we
> simply ignore it.
>
> This provides us one full flush in flight, with one pending as well,
> and makes for more efficient handling of this type of writeback.
>
> ...
>
> @@ -953,12 +954,27 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic,
>  		return;
>  
>  	/*
> +	 * All callers of this function want to start writeback of all
> +	 * dirty pages. Places like vmscan can call this at a very
> +	 * high frequency, causing pointless allocations of tons of
> +	 * work items and keeping the flusher threads busy retrieving
> +	 * that work. Ensure that we only allow one of them pending and
> +	 * inflight at the time. It doesn't matter if we race a little
> +	 * bit on this, so use the faster separate test/set bit variants.
> +	 */
> +	if (test_bit(WB_start_all, &wb->state))
> +		return;
> +
> +	set_bit(WB_start_all, &wb->state);

test_and_set_bit()?