Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1755991AbXKARRD (ORCPT ); Thu, 1 Nov 2007 13:17:03 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S1752524AbXKARQy (ORCPT ); Thu, 1 Nov 2007 13:16:54 -0400
Received: from pentafluge.infradead.org ([213.146.154.40]:40682 "EHLO
    pentafluge.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1752271AbXKARQx (ORCPT ); Thu, 1 Nov 2007 13:16:53 -0400
Subject: Re: per-bdi-throttling: synchronous writepage doesn't work correctly
From: Peter Zijlstra
To: Miklos Szeredi
Cc: jdike@addtoit.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Christoph Hellwig, Andrew Morton, Al Viro
In-Reply-To: <1193936949.27652.321.camel@twins>
References: <1193935886.27652.313.camel@twins>
    <1193936949.27652.321.camel@twins>
Content-Type: text/plain
Date: Thu, 01 Nov 2007 18:16:48 +0100
Message-Id: <1193937408.27652.326.camel@twins>
Mime-Version: 1.0
X-Mailer: Evolution 2.10.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2481
Lines: 59

On Thu, 2007-11-01 at 18:09 +0100, Peter Zijlstra wrote:
> On Thu, 2007-11-01 at 18:00 +0100, Miklos Szeredi wrote:
> > > > Hi,
> > > >
> > > > It looks like bdi_thresh will always be zero if filesystem does
> > > > synchronous writepage, resulting in very poor write performance.
> > > >
> > > > Hostfs (UML) is one such example, but there might be others.
> > > >
> > > > The only solution I can think of is to add a set_page_writeback();
> > > > end_page_writeback() pair (or some reduced variant, that only does
> > > > the proportions magic).  But that means auditing quite a few
> > > > filesystems...
> > >
> > > Ouch...
> > >
> > > I take it there is no other function that is shared between all these
> > > writeout paths which we could stick a bdi_writeout_inc(bdi) in?
> >
> > No, and you can't detect it from the callers either I think.
>
> The page not having PG_writeback set on return is a hint, but not fool
> proof, it could be the device is just blazing fast.
>
> I guess there is nothing to it but for me to grep writepage and manually
> look at all hits...

  writepage: called by the VM to write a dirty page to backing store.
      This may happen for data integrity reasons (i.e. 'sync'), or
      to free up memory (flush).  The difference can be seen in
      wbc->sync_mode.
      The PG_Dirty flag has been cleared and PageLocked is true.
      writepage should start writeout, should set PG_Writeback,
      and should make sure the page is unlocked, either synchronously
      or asynchronously when the write operation completes.

      If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
      try too hard if there are problems, and may choose to write out
      other pages from the mapping if that is easier (e.g. due to
      internal dependencies).  If it chooses not to start writeout, it
      should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep
      calling ->writepage on that page.

      See the file "Locking" for more details.

The "should set PG_Writeback" bit threw me off, I guess.

Anyway, do we want me to just stick in
bdi_writeout_inc(page->mapping->backing_dev_info) everywhere, or do we
want to dress this up in a nice API? If so, any suggestions?