Date: Mon, 25 Aug 2008 11:59:22 +1000
From: Dave Chinner
To: Nick Piggin
Cc: gus3, Szabolcs Szakacsits, Andrew Morton,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    xfs@oss.sgi.com
Subject: Re: XFS vs Elevators (was Re: [PATCH RFC] nilfs2: continuous snapshotting file system)
Message-ID: <20080825015922.GP5706@disturbed>
In-Reply-To: <200808221229.11069.nickpiggin@yahoo.com.au>

On Fri, Aug 22, 2008 at 12:29:10PM +1000, Nick Piggin wrote:
> On Friday 22 August 2008 03:08, Dave Chinner wrote:
> > On Thu, Aug 21, 2008 at 07:33:34PM +1000, Nick Piggin wrote:
> > > I don't really see it as too complex. If you know how you want the
> > > request to be handled, then it should be possible to implement.
> >
> > That is the problem in a nutshell. Nobody can keep up with all
> > the shiny new stuff that is being implemented, let alone the
> > subtle behavioural differences that accumulate through such
> > change...
>
> I'm not sure exactly what you mean.. I certainly have not been keeping
> up with all the changes here as I'm spending most of my time on other
> things lately...
>
> But from what I see, you've got a fairly good handle on analysing the
> elevator behaviour (if only the end result).

Only from having to do this analysis over and over again, trying to
understand what has changed in the elevator that has negated the
effect of some previous optimisation....

> So if you were to tell
> Jens that "these blocks" need more priority, or not to contribute to
> a process's usage quota, etc. then I'm sure improvements could be
> made.

It's exactly this sort of complexity that is the problem. When the
behaviour of such things changes, filesystems that are optimised for
the previous behaviour are not updated - we're not even aware that
the elevator has been changed in some subtle manner that breaks the
optimisations that have been done. To keep on top of this, we keep
adding new variations and types and expect the filesystems to make
best use of them (without documentation) to optimise for certain
situations.

Example - the new(ish) BIO_META tag that only CFQ understands. I can
change the way XFS issues bios to use this tag to make CFQ behave the
same way it used to w.r.t. metadata I/O from XFS, but then deadline
and AS will probably regress because they don't understand that tag
and still need the old optimisations that just got removed. Ditto for
prioritised bio dispatch - CFQ supports it but none of the others do.
IOWs, I am left with a choice - optimise for a specific elevator
(CFQ) to the detriment of all others (noop, AS, deadline), or make
the filesystem work best with the simple elevator (noop) and consider
the smarter schedulers deficient if they are slower than the noop
elevator....

> Or am I completely misunderstanding you? :)

You're suggesting that I add complexity to solve the
too-much-complexity problem.... ;)

> > > > With the way the elevators have been regressing,
> > > > improving and changing behaviour,
> > >
> > > AFAIK deadline, AS, and noop haven't significantly changed for years.
> >
> > Yet they've regularly shown performance regressions because other
> > stuff has been changing around them, right?
>
> Is this rhetorical? Because I don't see how *they* could be showing
> regular performance regressions.

I get private email fairly often asking questions as to why XFS is
slower going from, say, 2.6.23 to 2.6.24 and then speeds back up in
2.6.25. I've seen a number of cases where the answer was that
elevator 'x' was being used with XFS in 2.6.x and, for some reason,
was much, much slower than the others on that workload on that
hardware. As seen earlier in this thread, this can be caused by a
problem with the hardware, firmware, configuration, driver bugs, etc
- there are so many combinations of variables that can cause
performance issues that often the only 'macro' level change you can
make to avoid them is to switch schedulers. IOWs, while a specific
scheduler has not changed, the code around it has changed
sufficiently for that elevator to show a regression compared to the
other elevators.....

Basically, the complexity of the interactions between the
filesystems, elevators and the storage devices is such that there are
transient second-order effects occurring that are not reported widely
because they are easily worked around by switching elevators.
> Deadline literally had its last
> behaviour change nearly a year ago, and before that was before
> recorded (git) history.
>
> AS hasn't changed much more frequently, although I will grant that it
> and CFQ add a lot more complexity. So I would always compare results
> with deadline or noop.

Which can still change via things like changes to merging behaviour.
Granted, it is less complex, but we can still have subtle changes
having a major impact on less commonly run workloads...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com