From: Andrew Morton
Subject: Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes
Date: Sun, 18 May 2008 21:11:40 -0700
Message-ID: <20080518211140.b29bee30.akpm@linux-foundation.org>
References: <482DDA56.6000301@redhat.com>
	<20080516130545.845a3be9.akpm@linux-foundation.org>
	<482DF44B.50204@redhat.com>
	<20080516220315.GB15334@shareable.org>
	<482E08E6.4030507@redhat.com>
	<8763tbcrbo.fsf@basil.nowhere.org>
	<20080519004325.GC8335@mit.edu>
	<4830E60A.2010809@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Theodore Tso , Andi Kleen , linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
To: Eric Sandeen
Return-path:
In-Reply-To: <4830E60A.2010809@redhat.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Sun, 18 May 2008 21:29:30 -0500 Eric Sandeen wrote:

> Theodore Tso wrote:
> ...
>
> > Given how rarely people have reported problems, I think it's a really
> > good idea to understand what exactly our exposure is for
> > $COMMON_HARDWARE.
>
> I'll propose that very close to 0% of users will ever report "having
> barriers off seems to have corrupted my disk on power loss!" even if
> that's exactly what happened. And it'd be very tricky to identify in a
> post-mortem. Instead we'd probably see other weird things caught down
> the road during some later fsck or during filesystem use, and then
> suggest that they go check their cables, run memtest86 or something...
>
> Perhaps it's not the intent of this reply, Ted, but various other bits
> of this thread have struck me as trying to rationalize away the problem.

Not really. It's a matter of understanding how big the problem is. We
know what the cost of the solution is, and it's really large. It's a
tradeoff, and it is unobvious where the ideal answer lies, especially
when not all the information is available.

> If the discussion were about proper locking to avoid corruption, would
> we really be saying well, gosh, it's a *really* small window, and
> *most* people won't hit it very often, and proper locking would slow
> things down....

If it slowed really really important workloads by 30% then we'd be
running around with our hair on fire fixing that up. But fixing this
one is nowhere near as easy as fixing some locking thing.

> So I think that as you suggest, looking for ways to make barriers less
> painful is the far better route, rather than sacrificing correctness for
> speed by turning them off by default when we know there is a chance for
> problems. People running journaling filesystems most likely expect to
> be safe from this sort of thing, not most of the time, but all of the time.

Well. Reducing the cost would of course make the decision easy.