Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753802AbZCZCwT (ORCPT ); Wed, 25 Mar 2009 22:52:19 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751436AbZCZCwI (ORCPT ); Wed, 25 Mar 2009 22:52:08 -0400 Received: from srv5.dvmed.net ([207.36.208.214]:48425 "EHLO mail.dvmed.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751135AbZCZCwH (ORCPT ); Wed, 25 Mar 2009 22:52:07 -0400 Message-ID: <49CAEDA7.1080902@garzik.org> Date: Wed, 25 Mar 2009 22:51:19 -0400 From: Jeff Garzik User-Agent: Thunderbird 2.0.0.21 (X11/20090320) MIME-Version: 1.0 To: Kyle Moffett CC: Matthew Garrett , Theodore Tso , Christoph Hellwig , Linus Torvalds , Jan Kara , Andrew Morton , Ingo Molnar , Alan Cox , Arjan van de Ven , Peter Zijlstra , Nick Piggin , Jens Axboe , David Rees , Jesper Krogh , Linux Kernel Mailing List Subject: Re: Linux 2.6.29 References: <20090324093245.GA22483@elte.hu> <20090324041249.1133efb6.akpm@linux-foundation.org> <20090325123744.GK23439@duck.suse.cz> <20090325150041.GM32307@mit.edu> <20090325185824.GO32307@mit.edu> <20090325194851.GA1617@infradead.org> <20090325215016.GP32307@mit.edu> <20090326021034.GA26559@srcf.ucam.org> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.4 (----) X-Spam-Report: SpamAssassin version 3.2.5 on srv5.dvmed.net summary: Content analysis details: (-4.4 points, 5.0 required) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2016 Lines: 22 Kyle Moffett wrote: >> Really, the problem is the filesystem interfaces are incomplete. There are plenty of ways to specify a "FLUSH CACHE"-type command for an individual file or for the whole filesystem, but there aren't really any ways for programs to specify barriers (either whole-blockdev or per-LBA-range). An fsync() implies you want to *wait* for the data... there's no way to ask it all to be queued with some ordering constraints. >> Perhaps we ought to add a couple extra open flags, O_BARRIER_BEFORE and O_BARRIER_AFTER, and rename3(), etc functions that take flags arguments? >> Or maybe a new set of syscalls like barrier(file1, file2) and fbarrier(fd1, fd2), which cause all pending changes (perhaps limit to this process?) to the file at fd1 to occur before any successive changes (again limited to this process?) to the file at fd2. >> It seems that rename(oldfile, newfile) with an already-existing newfile should automatically imply barrier(oldfile, newfile) before it occurs, simply because so many programs rely on that. >> In the cross-filesystem case, the fbarrier() might simply fsync(fd1), since that would provide the equivalent guarantee, albeit with possibly significant performance penalties. I can't think of any easy way to prevent one filesystem from syncing writes to a particular file until another filesystem has finished an asynchronous fsync() call. Perhaps a half-way solution would be to asynchronously fsync(fd1) and simply block the next write()/ioctl()/etc on fd2 until the async fsync returns. Then you have just reinvented the transactional userspace API that people often want to replace POSIX API with. Maybe one day they will succeed. But "POSIX API replacement" is an area never short of proposals... :) Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/