Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1758213AbZC3LWi (ORCPT ); Mon, 30 Mar 2009 07:22:38 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1755634AbZC3LWY (ORCPT ); Mon, 30 Mar 2009 07:22:24 -0400
Received: from mx2.redhat.com ([66.187.237.31]:51774 "EHLO mx2.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1754721AbZC3LWW (ORCPT ); Mon, 30 Mar 2009 07:22:22 -0400
Message-ID: <49D0AA4A.6020308@redhat.com>
Date: Mon, 30 Mar 2009 07:17:30 -0400
From: Ric Wheeler
User-Agent: Thunderbird 2.0.0.21 (X11/20090320)
MIME-Version: 1.0
To: "Andreas T.Auer"
CC: Alan Cox, Theodore Tso, Mark Lord, Stefan Richter, Jeff Garzik,
        Linus Torvalds, Matthew Garrett, Andrew Morton, David Rees,
        Jesper Krogh, Linux Kernel Mailing List
Subject: Re: Linux 2.6.29
References: <49CD7B10.7010601@garzik.org> <49CD891A.7030103@rtr.ca>
        <49CD9047.4060500@garzik.org> <49CE2633.2000903@s5r6.in-berlin.de>
        <49CE3186.8090903@garzik.org> <49CE35AE.1080702@s5r6.in-berlin.de>
        <49CE3F74.6090103@rtr.ca> <20090329231451.GR26138@disturbed>
        <20090330003948.GA13356@mit.edu> <49D0710A.1030805@ursus.ath.cx>
        <20090330100546.51907bd2@the-village.bc.nu>
        <49D0A3D6.4000300@ursus.ath.cx>
In-Reply-To: <49D0A3D6.4000300@ursus.ath.cx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2121
Lines: 55

Andreas T.Auer wrote:
> On 30.03.2009 11:05 Alan Cox wrote:
>
>>> It seems you still didn't get the point. ext3 data=ordered is not the
>>> problem. The problem is that the average developer doesn't expect the fs
>>> to _re-order_ stuff. This is how most common fs did work long before
>>>
>>>
>> No it isn't. Standard Unix file systems made no such guarantee and would
>> write out data out of order. The disk scheduler would then further
>> re-order things.
>>
>>
>>
> You surely know that better: Did fs actually write "later" data quite
> long before "earlier" data? During the flush data may be re-ordered, but
> was it also _done_ outside of it?
>

People keep forgetting that storage (even on your commodity s-ata class
of drives) has very large & volatile cache. The disk firmware can hold
writes in that cache as long as it wants, reorder its writes into
anything that makes sense and has no explicit ordering promises.

This is where the write barrier code comes in - for file systems that
care about ordering for data, we use barrier ops to impose the required
ordering.

In a similar way, fsync() gives applications the power to impose their
own ordering.

If we assume that we can "save" an fsync cost with ordering mode, we
have to keep in mind that the file system will need to do the expensive
cache flushes in order to preserve its internal ordering.

>
>> If you think the "guarantees" from before ext3 are normal defaults you've
>> been writing junk code
>>
>>
>>
> I'm still on ReiserFS since it was considered stable in some SuSE 7.x.
> And I expected it to be fairly ordered, but as a network protocol
> programmer I didn't rely on the ordering of fs write-outs yet.
>

With reiserfs, you will have barriers on by default in SLES/opensuse
which will keep (at least fs meta-data) properly ordered....

ric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
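
The remark above that fsync() lets an application impose its own ordering can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names write_all() and replace_file_durably() and the temp-file-then-rename pattern are assumptions chosen for the example. It also assumes the file system and drive actually honor the flush (for instance, with barriers enabled, as discussed above), and it does not fsync the containing directory, which a stricter version would probably also want.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): write all of buf to fd,
 * retrying on short writes and EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace `path` with new contents, asking the kernel
 * to push the data to stable storage (fsync) before the rename makes the
 * new contents visible under the final name. The application, not the file
 * system's ordering mode, imposes "data first, then name" here.
 * Returns 0 on success, -1 on error (the temp file is removed on failure). */
int replace_file_durably(const char *path, const char *tmp_path,
                         const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Step 1: write the data, then request that it be flushed. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Step 2: only after the flush succeeded, switch the name over. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would use it along the lines of replace_file_durably("config", "config.tmp", data, len). The cost being debated in the thread is the fsync() call in step 1: it is the point at which the expensive cache flush happens, whether the application asks for it or the file system does it internally to keep its own ordering.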