Date: Wed, 9 Sep 2009 22:37:53 +0800
From: Wu Fengguang
To: Jan Kara
Cc: Peter Zijlstra, Chris Mason, Artem Bityutskiy, Jens Axboe,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	david@fromorbit.com, hch@infradead.org, akpm@linux-foundation.org,
	Theodore Ts'o
Subject: Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
Message-ID: <20090909143753.GA2071@localhost>
In-Reply-To: <20090909142315.GA7949@duck.suse.cz>

On Wed, Sep 09, 2009 at 10:23:15PM +0800, Jan Kara wrote:
> On Tue 08-09-09 20:32:26, Peter Zijlstra wrote:
> > On Tue, 2009-09-08 at 19:55 +0200, Peter Zijlstra wrote:
> > >
> > > I think I'm somewhat confused here though..
> > >
> > > There's kernel threads doing writeout, and there's apps getting stuck in
> > > balance_dirty_pages().
> > >
> > > If we want all writeout to be done by kernel threads (bdi/pd-flush like
> > > things) then we still need to manage the actual apps and delay them.
> > >
> > > As things stand now, we kick pdflush into action when dirty levels are
> > > above the background level, and start writing out from the app task when
> > > we hit the full dirty level.
> > >
> > > Moving all writeout to a kernel thread sounds good from writing linear
> > > stuff pov, but what do we make apps wait on then?
> >
> > OK, so like said in the previous email, we could have these app tasks
> > simply sleep on a waitqueue which gets periodic wakeups from
> > __bdi_writeback_inc() every time the dirty threshold drops.
> >
> > The woken tasks would then check their bdi dirty limit (its task
> > dependent) against the current values and either go back to sleep or
> > back to work.
>
> Well, what I imagined we could do is:
> Have a per-bdi variable 'pages_written' - that would reflect the amount of
> pages written to the bdi since boot (OK, we'd have to handle overflows but
> that's doable).
>
> There will be a per-bdi variable 'pages_waited'. When a thread should sleep
> in balance_dirty_pages() because we are over limits, it kicks the writeback
> thread and does:
>   to_wait = max(pages_waited, pages_written) + sync_dirty_pages()
>             (or whatever number we decide)
>   pages_waited = to_wait
>   sleep until pages_written reaches to_wait, or we drop below dirty limits.
>
> That will make sure each thread will sleep until writeback threads have done
> their duty for the writing thread.
>
> If we make sure sleeping threads are properly ordered on the wait queue,
> we could always wake up just the first one and thus avoid the herding
> effect. When we drop below dirty limits, we would just wake up the whole
> waitqueue.
>
> Does this sound reasonable?

Yup!
I have a similar idea: for each chunk the kernel writeback thread syncs, it
grants that many pages of quota to some waiting/sleeping dirtier task, which
can then consume the quota by dirtying that many more pages. This makes it
possible to control the relative/absolute writeback bandwidth of each
dirtier task, something like an IO controller.

Thanks,
Fengguang

> > The only problem would be the mass wakeups when lots of tasks are
> > blocked on dirty, but I'm guessing there's no way around that anyway,
> > and its better to have a limited number of writers than have everybody
> > write something, which would result in massive write fragmentation.
>
> 								Honza
> --
> Jan Kara
> SUSE Labs, CR