Date: Mon, 22 Feb 2010 22:23:17 -0500
From: tytso@mit.edu
To: Dave Chinner
Cc: Jan Kara, Linus Torvalds, Jens Axboe, Linux Kernel, jengelh@medozas.de, stable@kernel.org, gregkh@suse.de
Subject: Re: [PATCH] writeback: Fix broken sync writeback
Message-ID: <20100223032317.GG23832@thunk.org>

On Tue, Feb 23, 2010 at 01:53:50PM +1100, Dave Chinner wrote:
>
> Ignoring nr_to_write completely can lead to issues like we used to
> have with XFS - it would write an entire extent (8GB) at a time and
> starve all other writeback. Those starvation problems - which were
> very obvious on NFS servers - went away when we trimmed back the
> amount to write in a single pass to saner amounts...

How do you determine what a "sane amount" is?
Is it something that is determined dynamically, or is it a hard-coded
or manually tuned value?

> As to a generic solution, why do you think I've been advocating
> separate per-sb data sync and inode writeback methods that separate
> data writeback from inode writeback for so long? ;)

Heh.

> > This is done to avoid a lock inversion, and so this is an
> > ext4-specific thing (at least I don't think XFS's delayed allocation
> > has this misfeature).
>
> Not that I know of, but then again I don't know what inversion ext4
> is trying to avoid. Can you describe the inversion, Ted?

The locking order is journal_start_handle (starting a micro
transaction in jbd) -> lock_page. A more detailed description of why
this locking order is non-trivial for us to fix in ext4 can be found
in the description of commit f0e6c985.

Regards,

- Ted
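Dave's point about trimming the amount written per pass can be illustrated with a toy simulation. This is a hypothetical sketch, not the kernel's writeback code: `writeback_wait` is a made-up helper, it assumes 4 KiB pages and a plain round-robin over dirty inodes, and the 1024-page cap is an arbitrary illustration value rather than an answer to the "sane amount" question. It measures how many pages are written before each inode sees its first I/O, with and without a per-inode cap in the spirit of nr_to_write.

```python
def writeback_wait(dirty_pages, chunk=None):
    """Toy round-robin writeback model (hypothetical, not kernel code).

    dirty_pages: dirty page count for each inode.
    chunk: maximum pages written from one inode per visit, in the spirit
           of an nr_to_write budget; None writes each inode completely.
    Returns, per inode, how many pages were written before that inode
    received its first page of I/O (0 for inodes that were already clean).
    """
    if chunk is not None and chunk <= 0:
        raise ValueError("chunk must be positive or None")
    dirty = list(dirty_pages)
    waits = [0] * len(dirty)
    started = [False] * len(dirty)
    written = 0
    while any(dirty):
        # One pass: visit every inode that still has dirty pages.
        for i in range(len(dirty)):
            if dirty[i] == 0:
                continue
            if not started[i]:
                waits[i] = written
                started[i] = True
            todo = dirty[i] if chunk is None else min(dirty[i], chunk)
            dirty[i] -= todo
            written += todo
    return waits


# Hypothetical workload: one 8 GiB file (4 KiB pages) and two 16-page files.
workload = [8 * 2**30 // 4096, 16, 16]
print(writeback_wait(workload))              # -> [0, 2097152, 2097168]
print(writeback_wait(workload, chunk=1024))  # -> [0, 1024, 1040]
```

Without a cap, the two small files wait behind all 2,097,152 pages of the large file; with the cap, each waits for at most one chunk from every inode ahead of it in the rotation. The model ignores seek cost and throughput, which is where choosing the chunk size actually gets hard.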
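The ordering Ted describes (start the journal handle first, then lock the page) can be sketched with a toy lock-order checker. This is hypothetical Python, not ext4, jbd, or lockdep code: `LockOrderValidator` is a made-up helper, and modelling "starting a handle" as taking a plain lock is a simplification of how jbd handles behave. The checker records every outer -> inner pair it observes and rejects an acquisition in the reverse order, so a path that locks a page and then tries to start a handle shows up as an inversion even if it happens not to deadlock on this run.

```python
import threading


class LockOrderValidator:
    """Toy lock-order checker (hypothetical, not the kernel's lockdep).

    Records every "outer -> inner" pair of lock names seen while a thread
    holds one lock and takes another, and refuses an acquisition whose
    reverse pair has been seen before.
    """

    def __init__(self):
        self._edges = set()            # (outer, inner) name pairs observed
        self._meta = threading.Lock()  # protects self._edges
        self._local = threading.local()

    def _held(self):
        # Per-thread stack of lock names currently held.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def acquire(self, name, lock):
        held = self._held()
        with self._meta:
            # Check the whole held set before recording any new pair.
            for outer in held:
                if (name, outer) in self._edges:
                    raise RuntimeError(
                        f"inversion: {outer} -> {name}, "
                        f"but {name} -> {outer} was seen before")
            for outer in held:
                self._edges.add((outer, name))
        lock.acquire()
        held.append(name)

    def release(self, name, lock):
        self._held().remove(name)
        lock.release()


v = LockOrderValidator()
journal = threading.Lock()  # stands in for "journal handle started"
page = threading.Lock()     # stands in for lock_page()

# The order described in the mail: start the handle, then lock the page.
v.acquire("journal", journal)
v.acquire("page", page)
v.release("page", page)
v.release("journal", journal)

# Locking the page first and then starting a handle reverses that order.
v.acquire("page", page)
try:
    v.acquire("journal", journal)
except RuntimeError as err:
    print(err)  # -> inversion: page -> journal, but journal -> page was seen before
v.release("page", page)
```

The check runs before blocking on the lock, which is the point of order validation: two threads taking the locks in opposite orders could deadlock, so the reversed order is rejected even when a single thread would get through.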