Date: Mon, 30 May 2011 11:02:05 -0700 (PDT)
From: david@lang.hm
To: "D. Jansen"
cc: Theodore Tso, Oliver Neukum, akpm@linux-foundation.org,
    linux-kernel@vger.kernel.org, Dave Chinner, njs@pobox.com,
    bart@samwel.tk
Subject: Re: [rfc] Ignore Fsync Calls in Laptop_Mode

On Mon, 30 May 2011, D. Jansen wrote:

> On Mon, May 30, 2011 at 3:53 AM, wrote:
>> On Sun, 29 May 2011, D. Jansen wrote:
>>> On Fri, May 27, 2011 at 4:17 PM, Theodore Tso wrote:
>>>> On May 27, 2011, at 3:12 AM, D. Jansen wrote:
>>>>> That reordering is exactly what I'm talking about. It wasn't my idea.
>>>>> But if I understood it correctly, it's possible that the kernel
>>>>> commits writes of an application, _to one and the same file_, in a
>>>>> non-FIFO order, if the application does not fsync.
>>>>> And this _afaiu_ could result in the loss not only of new data,
>>>>> but complete corruption of previously existing data in laptop mode
>>>>> without fsync.
>>>>
>>>> No, you're not understanding the problem.  All layers of the storage
>>>> stack -- including the hard drive -- are allowed to reorder writes.  So
>>>> even if the kernel sends data to the disk in the exact same order that
>>>> the application wrote it, it could still get written in a different
>>>> order, because the hard drive itself can reorder writes.  This is
>>>> necessary for performance; if you didn't have this, the storage stack
>>>> would be dog slow, and would consume even more power.
>>>>
>>>> So at least level, the only thing you can count upon is that if you want
>>>> to make sure everything is flushed to stable store, you need to send
>>>> an fsync() command at the application to file system level, or a barrier
>>>> or flush command at the OS to hard drive level.
>>> (...)
>>>> Ordering doesn't matter, because nothing, including the hard drive,
>>>> guarantees ordering.  What does matter is that the fsync() commands
>>>> act like barriers; writes before the fsync() command are guaranteed
>>>> to be written to the disk, and survive a reboot, before any writes after
>>>> the fsync() are processed.  See?
>>>
>>> Ok, thanks a lot! I understand a lot better now!
>>> So we can't live without the fsyncs.
>>>
>>> So what if we would queue the fsyncs along with the writes - we would
>>> just fsync later instead of immediately, in between the writes as they
>>> came in. Then by design previous data could not be corrupted, right?
>>> We would do exactly the same thing, just later.
>>> It'd be kind of a disk write time distortion field.
>>
>> the problem is that the spec for fsync says that your program stops until
>> fsync finishes. If you don't do that then you will corrupt and lose data.
>>
>> so if you delay fsync you will have your application (or desktop manager)
>> freeze until the fsync completes.
>
> So that would not be an option. Freezing until the end of the write
> window is not what we want.
> Neither is ignoring the fsync because that could corrupt data, esp. in
> databases like sqlite.
>>
>> if what you are wanting is the ability to say 'these things must be written
>> before these other things to keep them from being corrupted, but I don't
>> care when they get written (or if they get lost in a crash)' then what you
>> want isn't fsync, it's a barrier.
>
> That sounds great!
> So an fsync call in laptop mode could be interpreted as a barrier
> and we would be reasonably safe from corrupting old existing data?

no, you cannot just change an fsync into a barrier; in some cases the data
absolutely needs to be saved, not just ordered (remember the example of a
mail server telling the other system that the data can be deleted after an
fsync returns)

David Lang
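
The mail-server case above can be sketched in C. This is only a minimal
illustration and not code from this thread: the function name
store_message_durably and its 0 / -1 return convention are assumptions made
up for the example, while open(), write(), fsync() and close() are the
standard POSIX calls. The point it shows is that the acknowledgement to the
sender may only happen after fsync() has returned successfully, which is
why an ordering-only barrier is not a substitute here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (name and return convention are assumptions):
 * store one received message and return 0 only after fsync() succeeded,
 * or -1 on any error. */
int store_message_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than requested, so loop. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() blocks the caller until the kernel reports the file's data
     * as transferred to the storage device. A pure ordering barrier would
     * not make the caller wait and would give no such promise. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    if (close(fd) < 0)
        return -1;

    /* Only at this point may the caller tell the sending system that it
     * can delete its copy of the message. */
    return 0;
}
```

A caller would send its "message accepted" reply only when this function
returns 0. For a newly created file, a careful implementation would
probably also fsync() the containing directory so that the directory entry
itself is durable; that step is left out to keep the sketch short.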