Date: Tue, 12 May 2015 19:55:00 -0400
From: "Theodore Ts'o"
To: David Lang
Cc: Daniel Phillips, Howard Chu, Dave Chinner, linux-kernel@vger.kernel.org, Mike Galbraith, Pavel Machek, tux3@tux3.org, linux-fsdevel@vger.kernel.org, OGAWA Hirofumi
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Message-ID: <20150512235500.GF18150@thunk.org>

On Tue, May 12, 2015 at 03:35:43PM -0700, David Lang wrote:
>
> I happen to think that it's correct. It's not that Ext4 isn't tested, but
> that people's expectations of how much it's been tested, and at what scale,
> don't match the reality.

Ext4 is used at Google, on a very large number of disks.
Exactly how large is not something I'm allowed to say, but there's a very amusing TED Talk by Randall Munroe (of xkcd fame) on that topic:

http://tedsummaries.com/2014/05/14/randall-munroe-comics-that-ask-what-if/

One thing I can say is that shortly after we deployed ext4 at Google, thanks to having a very large number of disks, and because we have very good system monitoring, we detected a file system corruption problem that happened with a very low probability, but we had enough disks that we could detect the pattern. (Fortunately, because Google's cluster file system has replication and/or erasure coding, no user data was lost.) Even though we could notice the problem, it took us several months to track it down.

When we finally did, it turned out to be a race condition which only took place under high memory pressure. What was *very* amusing was that after fixing the problem for ext4, I looked at ext3 and discovered that (a) ext4 had inherited the bug from ext3, and (b) the bug in ext3 had not been noticed in several enterprise distribution testing runs done by Red Hat, SuSE, and IBM --- for well over a **decade**.

What this means is that it's hard for *any* file system to be that well tested; it's hard to substitute for years and years of production use, hopefully in systems that have very rigorous monitoring so you would notice if data or file system metadata is getting corrupted in ways that can't be explained as hardware errors. The fact that we found a bug that was never discovered in ext3 after years and years of use in many enterprises is a testament to that fact.

(This is also why Facebook starting to use btrfs in production is going to be a very good thing for btrfs. I'm sure they will find all sorts of problems once they start running at large scale, which is a _good_ thing; that's how those problems get fixed.)
Of course, using xfstests certainly helps a lot, and so in my opinion all serious file system developers should be regularly using xfstests as a part of the daily development cycle, and should be extremely ruthless about not allowing any test regressions.

Best regards,

					- Ted