Date: Mon, 7 Sep 2009 09:10:26 -0400
From: Theodore Tso
To: Pavel Machek
Cc: Ric Wheeler, Krzysztof Halasa, Christoph Hellwig, Mark Lord,
    Michael Tokarev, david@lang.hm, NeilBrown, Rob Landley,
    Florian Weimer, Goswin von Brederlow, kernel list, Andrew Morton,
    mtk.manpages@gmail.com, rdunlap@xenotime.net,
    linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org, corbet@lwn.net
Subject: Re: wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage
Message-ID: <20090907131026.GC32427@mit.edu>
In-Reply-To: <20090907114534.GP23450@elf.ucw.cz>
References: <4A9BCDFE.50008@rtr.ca> <20090831132139.GA5425@infradead.org>
    <4A9F230F.40707@redhat.com> <4A9FA5F2.9090704@redhat.com>
    <4A9FC9B3.1080809@redhat.com> <4A9FCF6B.1080704@redhat.com>
    <20090907114534.GP23450@elf.ucw.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 07, 2009 at 01:45:34PM +0200, Pavel Machek wrote:
>
> Yes, but ext3 was designed to handle the partial write (according to
> tytso).

I'm not sure what made you think that I said that.  In practice things
usually work out, as a consequence of the fact that ext3 uses physical
block journaling, but it's not perfect, because...

> > Also, when you enable the write cache (MD or not) you are buffering
> > multiple MB's of data that can go away on power loss. Far greater (10x)
> > the exposure that the partial RAID rewrite case worries about.
>
> Yes, that's what barriers are for. Except that they are not there on
> MD0/MD5/MD6. They actually work on local sata drives...

Yes, but ext3 does not enable barriers by default (the patch has been
submitted, but akpm has balked because he doesn't like the performance
degradation and doesn't believe that Chris Mason's "workload of doom"
is a common case).  Note, though, that without a barrier it is possible
for dirty blocks to remain in the track buffer for *minutes* without
being written to the spinning rust platters.

See Chris Mason's report of this phenomenon here:

	http://lkml.org/lkml/2009/3/30/297

Here's Chris Mason's "barrier test", which will corrupt ext3 filesystems
50% of the time after a power drop if the filesystem is mounted with
barriers disabled (which is the default; use the mount option
barrier=1 to enable barriers):

	http://lkml.indiana.edu/hypermail/linux/kernel/0805.2/1518.html

(Yes, ext4 has barriers enabled by default.)

						- Ted
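
As a rough illustration of the defaults described above, here is a minimal Python sketch that works out whether barriers would be on for a given mount option string. The function name barriers_enabled and the table DEFAULT_BARRIER are hypothetical names made up for this example. It assumes the option spellings barrier=1, barrier=0, barrier and nobarrier, and that a later option overrides an earlier one. It only interprets a string as you might pass it to mount -o; it does not inspect a live system.

```python
# Defaults as stated in the message: ext3 mounts with barriers off,
# ext4 mounts with barriers on. (Hypothetical table for illustration.)
DEFAULT_BARRIER = {"ext3": False, "ext4": True}


def barriers_enabled(fstype, options):
    """Return True if the option string leaves write barriers enabled.

    fstype  -- "ext3" or "ext4"
    options -- comma-separated mount options, e.g. "rw,noatime,barrier=1"

    Assumption: options are applied left to right, so the last
    barrier-related option wins.
    """
    enabled = DEFAULT_BARRIER[fstype]
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("barrier", "barrier=1"):
            enabled = True
        elif opt in ("nobarrier", "barrier=0"):
            enabled = False
    return enabled


if __name__ == "__main__":
    for fstype, opts in [("ext3", "rw,noatime"),
                         ("ext3", "rw,barrier=1"),
                         ("ext4", "rw,noatime")]:
        state = "enabled" if barriers_enabled(fstype, opts) else "disabled"
        print(fstype, opts, "->", "barriers", state)
```

The point of the sketch is only that a plain ext3 mount comes out with barriers disabled unless barrier=1 is given explicitly, while a plain ext4 mount comes out with them enabled.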