Date: Wed, 13 Dec 2023 16:44:09 +0100
From: Christoph Hellwig
To: John Garry
Cc: Christoph Hellwig, axboe@kernel.dk, kbusch@kernel.org, sagi@grimberg.me,
    jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org,
    viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
    jack@suse.cz, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
    linux-scsi@vger.kernel.org,
    ming.lei@redhat.com, jaswin@linux.ibm.com, bvanassche@acm.org
Subject: Re: [PATCH v2 00/16] block atomic writes
Message-ID: <20231213154409.GA7724@lst.de>
References: <20231212110844.19698-1-john.g.garry@oracle.com> <20231212163246.GA24594@lst.de>

On Wed, Dec 13, 2023 at 09:32:06AM +0000, John Garry wrote:
>>> - How to make API extensible for when we have no HW support? In that case,
>>>   we would prob not have to follow rule of power-of-2 length et al.
>>>   As a possible solution, maybe we can say that atomic writes are
>>>   supported for the file via statx, but not set unit_min and max values,
>>>   and this means that writes need to be just FS block aligned there.
>> I don't think the power of two length is much of a problem to be
>> honest, and if we ever want to lift it we can still do that easily
>> by adding a new flag or limit.
>
> ok, but it would be nice to have some idea on what that flag or limit
> change would be.

That would require a concrete use case.  The simplest thing for a file
system that can or does log I/O would be a flag waiving all the
alignment and size requirements.

>> I suspect we need an on-disk flag that forces allocations to be
>> aligned to the atomic write limit, in some ways similar to how the
>> XFS rt flag works.
>> You'd need to set it on an empty file, and all
>> allocations after that are guaranteed to be properly aligned.
>
> Hmmm... so how is this different to the XFS forcealign feature?

Maybe not much.  But that's not what this is about: we need a common
API for this and not some XFS-internal flag.  So if this is something
we could support in ext4 as well, that would be a good step.  And for
btrfs you'd probably want to support something like it in nocow mode
if people care enough, or always support atomics and write out of
place.

> For XFS, I thought that your idea was to always CoW new extents for
> misaligned extents or writes which spanned multiple extents.

Well, that is useful for three things:

 - atomic writes on hardware that does not support them
 - atomic writes for buffered I/O
 - supporting other sizes / alignments than the strict power of two
   above.

> Right, so we should limit atomic write queue limits to max_hw_sectors. But
> people can still tweak max_sectors, and I am inclined to say that
> atomic_write_unit_max et al should be (dynamically) limited to max_sectors
> also.

Allowing people to tweak it seems to be asking for trouble.

>> have that silly limit.  For NVMe that would require SGL support
>> (and some driver changes I've been wanting to make for a long time where
>> we always use SGLs for transfers larger than a single PRP if supported)
>
> If we could avoid dealing with a virt boundary, then that would be nice.
>
> Are there any patches yet for the change to always use SGLs for transfers
> larger than a single PRP?

No.

> On a related topic, I am not sure about how - or if we even should -
> enforce iovec PAGE-alignment or length; rather, the user could just be
> advised that iovecs must be PAGE-aligned and min PAGE length to achieve
> atomic_write_unit_max.

Anything that merely advises the user, instead of being clear-cut and
failing with an error, is data loss waiting to happen.  Even more so if
it differs from device to device.