Date: Tue, 10 Oct 2023 07:39:17 +1100
From: Dave Chinner
To: "Darrick J. Wong"
Cc: John Garry, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    martin.petersen@oracle.com, himanshu.madhani@oracle.com
Subject: Re: [PATCH 2/4] readv.2: Document RWF_ATOMIC flag
References: <20230929093717.2972367-1-john.g.garry@oracle.com>
 <20230929093717.2972367-3-john.g.garry@oracle.com>
 <20231009174438.GE21283@frogsfrogsfrogs>
In-Reply-To: <20231009174438.GE21283@frogsfrogsfrogs>

On Mon, Oct 09, 2023 at 10:44:38AM -0700, Darrick J. Wong wrote:
> On Fri, Sep 29, 2023 at 09:37:15AM +0000, John Garry wrote:
> > From: Himanshu Madhani
> >
> > Add RWF_ATOMIC flag description for pwritev2().
> >
> > Signed-off-by: Himanshu Madhani
> > #jpg: complete rewrite
> > Signed-off-by: John Garry
> > ---
> >  man2/readv.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
....
> > +For when regular files are opened with
> > +.BR open (2)
> > +but without
> > +.B O_SYNC
> > +or
> > +.B O_DSYNC
> > +and the
> > +.BR pwritev2()
> > +call is made without
> > +.B RWF_SYNC
> > +or
> > +.BR RWF_DSYNC
> > +set, the range metadata must already be flushed to storage and the data range
> > +must not be in unwritten state, shared, a preallocation, or a hole.
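The quoted hunk names several flags. From userspace, they combine roughly as shown below. This is a minimal sketch and is not code from the patch series.

- atomic_pwrite() is a hypothetical helper.
- The RWF_ATOMIC fallback value (0x40) is an assumption. It is taken from the flag as it was later merged into the Linux UAPI headers; at the time of this thread the flag was still only a proposal.
- The durability tier is chosen per write with RWF_DSYNC or RWF_SYNC. These are the per-write equivalents of opening the file with O_DSYNC or O_SYNC.
- A kernel or filesystem that cannot honour the request is expected to fail the call rather than quietly perform an ordinary write. Typical errors are EOPNOTSUPP or EINVAL. As we understand the merged implementation, this includes files not opened with O_DIRECT.

```c
#define _GNU_SOURCE            /* pwritev2() and RWF_* in glibc's <sys/uio.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DSYNC
#define RWF_DSYNC 0x00000002   /* per-write O_DSYNC */
#endif
#ifndef RWF_SYNC
#define RWF_SYNC 0x00000004    /* per-write O_SYNC */
#endif
#ifndef RWF_ATOMIC
/* Assumption: the value later merged into the Linux UAPI headers.
 * When this thread was written the flag was still a proposal. */
#define RWF_ATOMIC 0x00000040
#endif

/*
 * Hypothetical helper: issue one untorn write of len bytes at off.
 * extra_flags is 0, RWF_DSYNC or RWF_SYNC and selects the durability
 * tier; RWF_ATOMIC itself only asks for the data not to be torn.
 * Returns len on success, or -1 with errno set on failure.
 */
ssize_t atomic_pwrite(int fd, const void *buf, size_t len, off_t off,
                      int extra_flags)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t ret = pwritev2(fd, &iov, 1, off, RWF_ATOMIC | extra_flags);

    /* This sketch treats a short count as a failure, because the
     * caller asked for all-or-nothing semantics. */
    if (ret >= 0 && (size_t)ret != len) {
        errno = EIO;
        return -1;
    }
    return ret;
}
```

On a kernel without RWF_ATOMIC support, the call simply fails. A caller would then fall back to its existing crash-safety scheme, for example journalling or double writes.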
>
> I think that we can drop all of these flags requirements, since the
> contiguous small space allocation requirement means that the fs can
> provide all-or-nothing writes even if metadata updates are needed:
>
> If the file range is allocated and marked unwritten (i.e. a
> preallocation), the ioend will clear the unwritten bit from the file
> mapping atomically. After a crash, the application sees either zeroes
> or all the data that was written.
>
> If the file range is shared, the ioend will map the COW staging extent
> into the file atomically. After a crash, the application sees either
> the old contents from the old blocks, or the new contents from the new
> blocks.
>
> If the file range is a sparse hole, the directio setup will allocate
> space and create an unwritten mapping before issuing the write bio. The
> rest of the process works the same as preallocations and has the same
> behaviors.
>
> If the file range is allocated and was previously written, the write is
> issued and that's all that's needed from the fs. After a crash, reads
> of the storage device produce the old contents or the new contents.

This is exactly what I explained when reviewing the code that
rejected RWF_ATOMIC without O_DSYNC on metadata dirty inodes.

> Summarizing:
>
> An (ATOMIC|SYNC) request provides the strongest guarantees (data
> will not be torn, and all file metadata updates are persisted before
> the write is returned to userspace. Programs see either the old data or
> the new data, even if there's a crash.
>
> (ATOMIC|DSYNC) is less strong -- data will not be torn, and any file
> updates for just that region are persisted before the write is returned.
>
> (ATOMIC) is the least strong -- data will not be torn. Neither the
> filesystem nor the device make guarantees that anything ended up on
> stable storage, but if it does, programs see either the old data or the
> new data.

Yup, that makes sense to me.

> Maybe we should rename the whole UAPI s/atomic/untorn/...

Perhaps, though "torn writes" is nomenclature that nobody outside
storage and filesystem developers really knows about. All I ever
hear from userspace developers is "we want atomic/all-or-nothing
data writes"...

-Dave.
--
Dave Chinner
david@fromorbit.com
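Darrick's per-extent crash analysis and his three-tier summary, which Dave agrees with in the thread, can be written down as a small decision model. The sketch below is purely illustrative and does not reflect any kernel code.

- classify_tier(), crash_outcomes() and all of the enum names are hypothetical.
- The RWF_ATOMIC fallback value is an assumption, taken from the value later merged into the Linux UAPI headers.

The property the model encodes is that a read after a crash never sees a mix of old and new data in the range. It sees one whole version, or zeroes where nothing had been written before.

```c
#define _GNU_SOURCE            /* RWF_* constants in glibc's <sys/uio.h> */
#include <fcntl.h>
#include <sys/uio.h>

#ifndef RWF_DSYNC
#define RWF_DSYNC 0x00000002   /* per-write O_DSYNC */
#endif
#ifndef RWF_SYNC
#define RWF_SYNC 0x00000004    /* per-write O_SYNC */
#endif
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040  /* assumption: later mainline UAPI value */
#endif

/* Hypothetical names for the three tiers in the summary. */
enum untorn_tier {
    TIER_NONE,          /* RWF_ATOMIC not requested: no untorn promise */
    TIER_ATOMIC,        /* untorn, but nothing promised about persistence */
    TIER_ATOMIC_DSYNC,  /* untorn, and that region's updates are persisted */
    TIER_ATOMIC_SYNC,   /* untorn, and all file metadata is persisted */
};

/* Map open(2) flags plus per-write RWF_* flags to a tier. */
enum untorn_tier classify_tier(int open_flags, int rwf_flags)
{
    if (!(rwf_flags & RWF_ATOMIC))
        return TIER_NONE;
    /* On Linux O_SYNC contains the O_DSYNC bit, so test O_SYNC first
     * and require all of its bits to be present. */
    if ((rwf_flags & RWF_SYNC) || (open_flags & O_SYNC) == O_SYNC)
        return TIER_ATOMIC_SYNC;
    if ((rwf_flags & RWF_DSYNC) || (open_flags & O_DSYNC))
        return TIER_ATOMIC_DSYNC;
    return TIER_ATOMIC;
}

/* Hypothetical labels for the four extent cases in the analysis. */
enum extent_state { EXT_WRITTEN, EXT_UNWRITTEN, EXT_SHARED, EXT_HOLE };

/* Bitmask of what a read of the range may return after a crash. */
enum { SEE_OLD = 1, SEE_NEW = 2, SEE_ZEROES = 4 };

unsigned crash_outcomes(enum extent_state st)
{
    switch (st) {
    case EXT_UNWRITTEN: /* ioend clears the unwritten bit atomically */
    case EXT_HOLE:      /* direct I/O setup maps it unwritten first */
        return SEE_ZEROES | SEE_NEW;
    case EXT_SHARED:    /* ioend remaps the COW staging extent atomically */
    case EXT_WRITTEN:   /* plain overwrite; the device must not tear it */
        return SEE_OLD | SEE_NEW;
    }
    return 0;           /* not reached for valid states */
}
```

No case ever returns a "partly old, partly new" outcome. That is why, in this model, the extent state does not need to be restricted up front in the way the quoted man page text proposed.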