Date: Fri, 25 Sep 2009 11:19:20 +0800
From: Wu Fengguang
To: Dave Chinner
Cc: Chris Mason, Andrew Morton, Peter Zijlstra, "Li, Shaohua",
    linux-kernel@vger.kernel.org, richard@rsk.demon.co.uk,
    jens.axboe@oracle.com
Subject: Re: regression in page writeback
Message-ID: <20090925031920.GB10487@localhost>
In-Reply-To: <20090925001117.GA9464@discord.disaster>

On Fri, Sep 25, 2009 at 08:11:17AM +0800, Dave Chinner wrote:
> On Thu, Sep 24, 2009 at 11:15:08AM +0800, Wu Fengguang wrote:
> > On Wed, Sep 23, 2009 at 10:00:58PM +0800, Chris Mason wrote:
> > > The only place that actually honors the congestion flag is pdflush.
> > > It's trivial to get pdflush backed up and make it sit down without
> > > making any progress, because once the queue congests, pdflush goes away.
> >
> > Right. I guess that's more or less intentional - to give lowest priority
> > to periodic/background writeback.
>
> IMO, this is the wrong design. Background writeback should have higher
> CPU/scheduler priority than normal tasks. If there are sufficient dirty
> pages in the system for background writeback to be active, it should be
> running *now* to start as much IO as it can without being held up by
> other, lower priority tasks.
>
> Cleaning pages is important to keeping the system running smoothly.
> Given that IO takes time to clean pages, it is therefore important to
> issue as much as possible as quickly as possible, without delays, before
> going back to sleep. Delaying issue of the IO or doing sub-optimal issue
> simply reduces performance of the system, because it takes longer to
> clean the same number of dirty pages.
>
> > > Nothing stops other procs from keeping the queue congested forever.
> > > This can only be fixed by making everyone wait for congestion, at
> > > which point we might as well wait for requests.
> >
> > Yes. That gives everyone a somewhat equal opportunity. It is a policy
> > change that may lead to interesting effects, as well as present a
> > challenge to get_request_wait(). That said, I'm not against the change
> > to a wait queue in general.
>
> If you block all threads doing _writebehind caching_ (synchronous IO
> is self-throttling) to the same BDI on the same queue as the bdi
> flusher, then when congestion clears, the higher priority background
> flusher thread should run first and issue more IO. This should happen
> as a natural side-effect of our scheduling algorithms, and it gives
> preference to efficient background writeback over inefficient
> foreground writeback. Indeed, with this approach we can even avoid
> foreground writeback altogether...

I don't see how balance_dirty_pages() writeout is less efficient than
pdflush writeout. They both call the same routines to do the job.

balance_dirty_pages() sets nr_to_write=1536 at least for ext4 and xfs
(unless memory is tight; btrfs uses 1540), which is in fact 50% bigger
than the 1024 pages used by pdflush. And it won't back off on
congestion.

The s_io/b_io queues are shared, so a balance_dirty_pages() call will
just continue from where the last sync thread left off. So it does not
make much difference who initiates the IO.

Did I miss something?

Thanks,
Fengguang
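For reference, a minimal self-contained sketch of the chunk sizes being
compared above (this assumes ratelimit_pages defaults to 1024, and the
helper names are illustrative, not the actual kernel entry points):

/*
 * Simplified model of the two writeback paths discussed in the mail.
 * The constants mirror the numbers quoted there; the real logic lives
 * in mm/page-writeback.c and the per-bdi flusher code.
 */
#include <stdio.h>
#include <stdbool.h>

#define MAX_WRITEBACK_PAGES	1024		/* per-pass limit used by pdflush */

static const long ratelimit_pages = 1024;	/* assumed default */

/* Chunk requested by a throttled dirtier in balance_dirty_pages() */
static long dirtier_write_chunk(void)
{
	return ratelimit_pages + ratelimit_pages / 2;	/* 1536 pages */
}

/* Chunk issued per pass by background (pdflush-style) writeback */
static long background_write_chunk(bool queue_congested)
{
	if (queue_congested)
		return 0;	/* backs off: sleeps in congestion_wait() */
	return MAX_WRITEBACK_PAGES;			/* 1024 pages */
}

int main(void)
{
	printf("balance_dirty_pages chunk:  %ld pages\n",
	       dirtier_write_chunk());
	printf("pdflush chunk (queue idle): %ld pages\n",
	       background_write_chunk(false));
	printf("pdflush chunk (congested):  %ld pages\n",
	       background_write_chunk(true));
	return 0;
}

The point of the comparison: the foreground path asks for the larger
chunk and keeps pushing under congestion, so it is not obviously the
less efficient of the two.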