Date: Mon, 9 Oct 2023 14:05:31 -0700
From: "Darrick J. Wong"
To: Dave Chinner
Cc: John Garry, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, martin.petersen@oracle.com, himanshu.madhani@oracle.com
Subject: Re: [PATCH 2/4] readv.2: Document RWF_ATOMIC flag
Message-ID: <20231009210531.GB214073@frogsfrogsfrogs>
References: <20230929093717.2972367-1-john.g.garry@oracle.com> <20230929093717.2972367-3-john.g.garry@oracle.com> <20231009174438.GE21283@frogsfrogsfrogs>

On Tue, Oct 10, 2023 at 07:39:17AM +1100, Dave Chinner wrote:
> On Mon, Oct 09, 2023 at 10:44:38AM -0700, Darrick J. Wong wrote:
> > On Fri, Sep 29, 2023 at 09:37:15AM +0000, John Garry wrote:
> > > From: Himanshu Madhani
> > >
> > > Add RWF_ATOMIC flag description for pwritev2().
> > >
> > > Signed-off-by: Himanshu Madhani
> > > #jpg: complete rewrite
> > > Signed-off-by: John Garry
> > > ---
> > >  man2/readv.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 45 insertions(+)
> ....
> > > +For when regular files are opened with
> > > +.BR open (2)
> > > +but without
> > > +.B O_SYNC
> > > +or
> > > +.B O_DSYNC
> > > +and the
> > > +.BR pwritev2()
> > > +call is made without
> > > +.B RWF_SYNC
> > > +or
> > > +.BR RWF_DSYNC
> > > +set, the range metadata must already be flushed to storage and the data range
> > > +must not be in unwritten state, shared, a preallocation, or a hole.
> >
> > I think that we can drop all of these flags requirements, since the
> > contiguous small space allocation requirement means that the fs can
> > provide all-or-nothing writes even if metadata updates are needed:
> >
> > If the file range is allocated and marked unwritten (i.e. a
> > preallocation), the ioend will clear the unwritten bit from the file
> > mapping atomically.  After a crash, the application sees either zeroes
> > or all the data that was written.
> >
> > If the file range is shared, the ioend will map the COW staging extent
> > into the file atomically.  After a crash, the application sees either
> > the old contents from the old blocks, or the new contents from the new
> > blocks.
> >
> > If the file range is a sparse hole, the directio setup will allocate
> > space and create an unwritten mapping before issuing the write bio.  The
> > rest of the process works the same as preallocations and has the same
> > behaviors.
> >
> > If the file range is allocated and was previously written, the write is
> > issued and that's all that's needed from the fs.  After a crash, reads
> > of the storage device produce the old contents or the new contents.
>
> This is exactly what I explained when reviewing the code that
> rejected RWF_ATOMIC without O_DSYNC on metadata dirty inodes.

I'm glad we agree. :)

John, when you're back from vacation, can we get rid of this language
and all those checks under _is_dsync() in the iomap patch?

(That code is 100% the result of me handwaving and bellyaching 6 months
ago when the team was trying to get all the atomic writes bits working
prior to LSF and I was too burned out to think the xfs part through.
As a result, I decided that we'd only support strict overwrites for the
first iteration.)

> > Summarizing:
> >
> > An (ATOMIC|SYNC) request provides the strongest guarantees (data
> > will not be torn, and all file metadata updates are persisted before
> > the write is returned to userspace.  Programs see either the old data or
> > the new data, even if there's a crash.
> >
> > (ATOMIC|DSYNC) is less strong -- data will not be torn, and any file
> > updates for just that region are persisted before the write is returned.
> >
> > (ATOMIC) is the least strong -- data will not be torn.  Neither the
> > filesystem nor the device make guarantees that anything ended up on
> > stable storage, but if it does, programs see either the old data or the
> > new data.
>
> Yup, that makes sense to me.

Perhaps this ^^ is what we should be documenting here.

> > Maybe we should rename the whole UAPI s/atomic/untorn/...
>
> Perhaps, though "torn writes" is nomenclature that nobody outside
> storage and filesystem developers really knows about. All I ever
> hear from userspace developers is "we want atomic/all-or-nothing
> data writes"...

Fair 'enuf.

--D

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
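
For illustration only, the three guarantee levels summarized above might
map onto pwritev2() flags roughly as in the sketch below.  Everything
named here is hypothetical and not from the patch series: the enum
persistence type and the untorn_write_flags() and untorn_pwrite()
helpers are made up for this example, and the fallback value for
RWF_ATOMIC is an assumption for systems whose headers do not define the
flag.  A kernel without support for the flag is expected to fail the
call rather than silently ignore it.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/* Assumption: fallback value for headers that lack RWF_ATOMIC. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/* Hypothetical names for the three levels in the summary. */
enum persistence {
	PERSIST_NONE,	/* (ATOMIC): untorn, nothing promised about stable storage */
	PERSIST_DATA,	/* (ATOMIC|DSYNC): untorn, updates for that region persisted */
	PERSIST_ALL,	/* (ATOMIC|SYNC): untorn, all file metadata persisted */
};

/* Translate a guarantee level into pwritev2() flags. */
int untorn_write_flags(enum persistence p)
{
	switch (p) {
	case PERSIST_ALL:
		return RWF_ATOMIC | RWF_SYNC;
	case PERSIST_DATA:
		return RWF_ATOMIC | RWF_DSYNC;
	default:
		return RWF_ATOMIC;
	}
}

/* Issue a single-buffer write with the requested guarantee level. */
ssize_t untorn_pwrite(int fd, const void *buf, size_t len, off_t off,
		      enum persistence p)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, off, untorn_write_flags(p));
}
```

A caller that wants the strongest level would use something like
untorn_pwrite(fd, buf, len, off, PERSIST_ALL) and check for -1 as with
any other write.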