Date: Fri, 16 Nov 2001 12:28:55 +0000
From: "Stephen C. Tweedie"
To: Jeff Garzik
Cc: "Stephen C. Tweedie", Andrew Morton, lkml, Neil Brown
Subject: Re: synchronous mounts
Message-ID: <20011116122855.C2389@redhat.com>
In-Reply-To: <3BF376EC.EA9B03C8@zip.com.au>
 <20011115214525.C14221@redhat.com>
 <3BF45B9F.DEE1076B@mandrakesoft.com>

Hi,

On Thu, Nov 15, 2001 at 07:19:43PM -0500, Jeff Garzik wrote:

> When working on something likely to crash, I always remount my
> filesystems 'sync' with the intention to have the kernel immediately
> sync to disk anything and everything it is coded to do.

The kernel has, in my memory, never behaved like that on sync mounts.
mount -o sync was always intended just to give people the BSD-style
sync metadata updates that some users expected.  The "mount" man page
is wrong on this one.

> Since the
> kernel is responsible for flushing data to disk, it makes perfect sense
> to have an option to sync not only metadata but data to disk
> immediately, if the user desires such.

If you want to sync _everything_, it's at least 5 seeks per write
syscall when you're writing a new file: superblock, group descriptor,
block bitmap, inode, data, and potentially inode indirect.
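As a back-of-the-envelope check on that count, here is a minimal sketch
of the arithmetic.  The 4 KiB write size and the one-seek-per-structure
model are assumptions made for illustration, and estimate_seeks() is a
hypothetical helper, not anything in the kernel.

```python
# Rough seek estimate for writing a new file fully synchronously.
# Assumptions (illustrative only): each write() call covers 4 KiB, and
# every on-disk structure touched by a write costs one seek.

STRUCTURES_PER_WRITE = [
    "superblock",
    "group descriptor",
    "block bitmap",
    "inode",
    "data",
]  # an indirect block may add a sixth


def estimate_seeks(file_bytes, write_bytes=4096,
                   seeks_per_write=len(STRUCTURES_PER_WRITE)):
    """Return the estimated number of seeks to write file_bytes."""
    writes = -(-file_bytes // write_bytes)  # ceiling division
    return writes * seeks_per_write


if __name__ == "__main__":
    # 4 MiB in 4 KiB writes: 1024 writes * 5 seeks each
    print(estimate_seeks(4 * 1024 * 1024))  # prints 5120
```

Under those assumptions a 4 MiB file costs on the order of five
thousand seeks, against a handful if the same blocks are flushed
together in one pass.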
There's no point doing all that, especially since some of that data is
redundant and will be rebuilt by e2fsck anyway after a crash.  Is it
really such an important feature that we're willing to suffer a
factor-of-100 or more slowdown for it?

> Further, expecting all apps to fsync(2) files under the right
> circumstances is not reasonable.  There are "normal" circumstances where
> someone expects non-syncing behavior of "cat foo bar > foobar", and then
> there are extenuating circumstances where another expects the shell to
> sync that command immediately.  Should we rewrite cat/bash/apps to all
> fsync, depending on an option?  Should we expect people to modify all
> their shell scripts to include "/bin/sync" for those times when they
> want data-sync?  Such is not scalable at all.

Not-scalable is doing 5000 seeks to write a 4MB file.

The behaviour you are talking about now, "cat foo bar > foobar" and
expecting it to be intact on return, is *not the same thing*.  The
sync mount option is there to order metadata writes for predictable
recovery of the directory structure.  In the "cat" case, nobody cares
what the inode is like during the write.  All that is desired in that
example is fsync-on-close, and it is insane to implement
fsync-on-close by writing every single block of the file
synchronously.

At ALS, an ext3 user asked why ext3 performance was entirely unusable
under mount -o sync (he had a broken config which accidentally set an
ext3 mount synchronous), whereas ext2 was OK.  I only realised
afterwards that this was because of ext3's ordered data writes:
whereas ext2 was just syncing the inodes and indirect blocks on write,
ext3 was syncing the data too as part of the ordered data guarantees,
and performance was totally destroyed by the extra seeks.

"sync to keep the fs structures intact" and "sync to keep this file
intact" are two totally different things.  In the latter case, we only
care about the file contents as a whole, so fsync-on-close is far more
appropriate.
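The difference between the two can be sketched from user space.  This
is only an illustration under stated assumptions: both function names
are hypothetical, and O_SYNC on a single descriptor is used here as a
stand-in for "every write is synchronous"; it is not what mount -o sync
does.

```python
import os
import tempfile


def _write_all(fd, chunk):
    """Write the whole chunk, retrying on short writes."""
    view = memoryview(chunk)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_with_fsync_on_close(path, chunks):
    """Buffered writes, then a single flush just before close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
        os.fsync(fd)  # one flush covers the file contents as a whole
    finally:
        os.close(fd)


def write_synchronously(path, chunks):
    """Every write() waits for the disk (O_SYNC, Unix only)."""
    # getattr() keeps the sketch importable where O_SYNC is missing;
    # on such platforms this degrades to ordinary buffered writes.
    flags = (os.O_WRONLY | os.O_CREAT | os.O_TRUNC
             | getattr(os, "O_SYNC", 0))
    fd = os.open(path, flags, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_with_fsync_on_close(os.path.join(tmp, "foobar"),
                                  [b"foo\n", b"bar\n"])
```

Both functions leave the same bytes on disk by the time they return.
The first pays for one flush per file, the second for one per write().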
If we want that, let's add it as a new option, but I don't see the
benefit in making -o sync do all file data writes 100% synchronously.

Cheers,
 Stephen