Date: Fri, 6 Oct 2023 09:36:23 +1100
From: Dave Chinner
To: Bart Van Assche
Cc: "Martin K. Petersen", John Garry, axboe@kernel.dk, kbusch@kernel.org,
	hch@lst.de, sagi@grimberg.me, jejb@linux.ibm.com, djwong@kernel.org,
	viro@zeniv.linux.org.uk, brauner@kernel.org, chandan.babu@oracle.com,
	dchinner@redhat.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu,
	jbongio@google.com, linux-api@vger.kernel.org
Subject: Re: [PATCH 10/21] block: Add fops atomic write support
References: <20230929102726.2985188-11-john.g.garry@oracle.com>
	<17ee1669-5830-4ead-888d-a6a4624b638a@acm.org>
	<5d26fa3b-ec34-bc39-ecfe-4616a04977ca@oracle.com>
	<34c08488-a288-45f9-a28f-a514a408541d@acm.org>

On Thu, Oct 05, 2023 at 10:10:45AM -0700, Bart Van Assche wrote:
> On 10/4/23 11:17, Martin K. Petersen wrote:
> >
> > Hi Bart!
> >
> > > In other words, also for the above example it is guaranteed that
> > > writes of a single logical block (512 bytes) are atomic, no matter
> > > what value is reported as the ATOMIC TRANSFER LENGTH GRANULARITY.
> >
> > There is no formal guarantee that a disk drive sector read-modify-write
> > operation results in a readable sector after a power failure. We have
> > definitely seen blocks being mangled in the field.
>
> Aren't block devices expected to use a capacitor that provides enough
> power to handle power failures cleanly?

Nope.
Any block device that says it operates in writeback cache mode (i.e.
almost every single consumer SATA and NVMe drive ever made) has a
volatile writeback cache and so does not provide any power fail data
integrity guarantees. Simple to check, my less-than-1-yr-old
workstation tells me:

$ lspci |grep -i nvme
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
$ cat /sys/block/nvme*n1/queue/write_cache
write back
write back
$

That they have volatile writeback caches....

> How about blacklisting block devices that mangle blocks if a power
> failure occurs? I think such block devices are not compatible with
> journaling filesystems nor with log-structured filesystems.

Statements like this from people working on storage hardware really
worry me. They demonstrate a lack of understanding of how filesystems
actually work, not to mention the fact that this architectural
problem (i.e. handling volatile device write caches correctly) was
solved in the Linux IO stack a couple of decades ago. This isn't even
'state of the art' knowledge - this is foundational knowledge that
everyone working on storage should have.

The tl;dr summary is that filesystems will issue a cache flush
request (REQ_PREFLUSH) and/or write-through to stable storage
semantics (REQ_FUA) for any data, metadata or journal IO that has
data integrity and/or ordering requirements associated with it. The
block layer will then do the optimal correct thing with that request
(e.g. ignore those flags for IO being directed at WC disabled
devices), but it guarantees the flush/fua semantics for those IOs
will be provided by all layers in the stack right down to the
persistent storage media itself. Hence all the filesystem has to do
is get its IO and cache flush ordering correct, and everything just
works regardless of the underlying storage capabilities.
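To make the flush/FUA handling above a bit more concrete, here is a
rough Python sketch. It is a simplified model under stated
assumptions, not the kernel's code: the names has_volatile_cache,
scan_write_caches, Device and flush_steps are all made up for
illustration, and the native_fua distinction (emulating REQ_FUA with
a second cache flush when the device cannot do FUA writes itself) is
my summary of block layer behaviour that the text above does not
spell out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


def has_volatile_cache(write_cache_text: str) -> bool:
    """Interpret the contents of /sys/block/<dev>/queue/write_cache."""
    return write_cache_text.strip() == "write back"


def scan_write_caches(sys_block: str = "/sys/block") -> Dict[str, bool]:
    """Map each block device name to True if it reports a writeback cache."""
    result = {}
    for attr in sorted(Path(sys_block).glob("*/queue/write_cache")):
        result[attr.parent.parent.name] = has_volatile_cache(attr.read_text())
    return result


@dataclass(frozen=True)
class Device:
    # Hypothetical model of a device's capabilities (illustration only).
    volatile_cache: bool  # device reports "write back"
    native_fua: bool      # device can perform FUA writes itself


def flush_steps(dev: Device, preflush: bool, fua: bool,
                has_data: bool) -> List[str]:
    """Ordered steps a simplified block layer would issue for one request."""
    steps = []
    # A pre-flush only matters if there is a volatile cache to flush.
    if dev.volatile_cache and preflush:
        steps.append("flush cache")
    if has_data:
        if dev.volatile_cache and fua and dev.native_fua:
            steps.append("write data (FUA)")
        else:
            # No volatile cache, no FUA asked for, or FUA is emulated below.
            steps.append("write data")
    # Assumed emulation: without native FUA, follow the write with a flush.
    if dev.volatile_cache and fua and not dev.native_fua:
        steps.append("flush cache")
    return steps
```

For a write through device both flags are simply dropped, e.g.
flush_steps(Device(False, False), True, True, True) gives just
["write data"], which is the "ignore them for WC disabled devices"
case.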

And, yes, any storage device with volatile caches that doesn't
implement cache flushes correctly is considered broken and will get
blacklisted....

-Dave.
-- 
Dave Chinner
david@fromorbit.com