Date: Thu, 5 Oct 2023 08:59:56 +1100
From: Dave Chinner
To: Bart Van Assche
Cc: John Garry, axboe@kernel.dk, kbusch@kernel.org, hch@lst.de,
    sagi@grimberg.me, jejb@linux.ibm.com, martin.petersen@oracle.com,
    djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
    chandan.babu@oracle.com, dchinner@redhat.com,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
    linux-api@vger.kernel.org
Subject: Re: [PATCH 10/21] block: Add fops atomic write support

On Wed, Oct 04, 2023 at 10:34:13AM -0700, Bart Van Assche wrote:
> On 10/4/23 02:14, John Garry wrote:
> > On 03/10/2023 17:45, Bart Van Assche wrote:
> > > On 10/3/23 01:37, John Garry wrote:
> > > > I don't think that is_power_of_2(write length) is specific to XFS.
> > >
> > > I think this is specific to XFS. Can you show me the F2FS code that
> > > restricts the length of an atomic write to a power of two? I haven't
> > > found it. The only power-of-two check that I found in F2FS is the
> > > following (maybe I overlooked something):
> > >
> > > $ git grep -nH is_power fs/f2fs
> > > fs/f2fs/super.c:3914:    if (!is_power_of_2(zone_sectors)) {
> >
> > Any usecases which we know of requires a power-of-2 block size.
> >
> > Do you know of a requirement for other sizes? Or are you concerned that
> > it is unnecessarily restrictive?
> >
> > We have to deal with HW features like atomic write boundary and FS
> > restrictions like extent and stripe alignment transparent, which are
> > almost always powers-of-2, so naturally we would want to work with
> > powers-of-2 for atomic write sizes.
> >
> > The power-of-2 stuff could be dropped if that is what people want.
> > However we still want to provide a set of rules to the user to make
> > those HW and FS features mentioned transparent to the user.
>
> Hi John,
>
> My concern is that the power-of-2 requirements are only needed for
> traditional filesystems and not for log-structured filesystems (BTRFS,
> F2FS, BCACHEFS).

Filesystems that support copy-on-write data (needed for arbitrary
filesystem block aligned RWF_ATOMIC support) are not necessarily log
structured. For example: XFS.

All three of the filesystems you list above still use power-of-2
block sizes for most of their metadata structures and for large data
extents. Hence once you go above a certain file size they are going
to be doing full power-of-2 block size aligned IO anyway. Hence the
constraint of atomic writes needing to be power-of-2 block size
aligned to avoid RMW cycles doesn't really change for these
filesystems.

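As a rough illustration of the power-of-2 alignment constraint discussed
above, here is a minimal userspace sketch. It is not the kernel's
blkdev_atomic_write_valid(); the helper names and the min_len/max_len
parameters are hypothetical, and it assumes "aligned" means the file offset
is a multiple of the (power-of-2) write length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n is a nonzero power of two. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical validity check for a single atomic write.
 * min_len/max_len stand in for whatever limits a filesystem or
 * device advertises (e.g. min_len == filesystem block size).
 */
static bool atomic_write_ok(uint64_t pos, uint64_t len,
			    uint64_t min_len, uint64_t max_len)
{
	if (len < min_len || len > max_len)
		return false;
	if (!is_pow2(len))
		return false;
	/* Naturally aligned: the offset is a multiple of the length. */
	return (pos & (len - 1)) == 0;
}
```

With limits of 4kB to 64kB, an 8kB write at offset 8kB passes this check,
while an 8kB write at offset 4kB or a 12kB write at offset 0 does not.
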
In which case, they can just set their minimum atomic IO size to be
the same as their block size (e.g. 4kB) and set the maximum to
something they can guarantee gets COW'd in a single atomic
transaction. What the hardware can do with REQ_ATOMIC IO is
completely irrelevant at this point....

> What I'd like to see is that each filesystem declares its atomic write
> requirements (in struct address_space_operations?) and that
> blkdev_atomic_write_valid() checks the filesystem-specific atomic write
> requirements.

That seems unworkable to me - IO constraints propagate from the
bottom up, not from the top down.

Consider multi-device filesystems (btrfs and XFS), where different
devices might have different atomic write parameters. Which
set of bdev parameters does the filesystem report to the querying
bdev? (And doesn't that question just sound completely wrong?)

It also doesn't work for filesystems that can configure extent
allocation alignment at an individual inode level (like XFS) - what
does the filesystem report to the device when it doesn't know what
alignment constraints individual on-disk inodes might be using?

That's why statx() vectors through filesystems to allow them to set
their own parameters based on the inode statx() is being called on.
If the filesystem has a native RWF_ATOMIC implementation, it can put
its own parameters in the statx min/max atomic write size fields.
If the fs doesn't have its own native support, but can do physical
file offset/LBA alignment, then it publishes the block device atomic
support parameters or overrides them with its internal allocation
alignment constraints. If the bdev doesn't support REQ_ATOMIC, the
filesystem says "atomic writes are not supported".

-Dave.
--
Dave Chinner
david@fromorbit.com
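
The per-inode statx() parameter selection described in the last paragraph
of the mail can be sketched roughly as below. This is an illustration only:
every type, field and function name is hypothetical, not kernel API. It
also assumes that "overrides them with its internal allocation alignment
constraints" means clamping the reported maximum to the inode's allocation
alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits in bytes; min == max == 0 means "not supported". */
struct atomic_limits {
	uint32_t min;
	uint32_t max;
};

/* Hypothetical per-inode view of what the fs and bdev can do. */
struct inode_info {
	bool fs_native_atomic;		/* fs implements RWF_ATOMIC itself */
	struct atomic_limits fs_limits;	/* limits of that implementation */
	bool can_align_lba;		/* fs can align file offset to LBA */
	uint32_t alloc_align;		/* inode allocation alignment, 0 if none */
	bool bdev_atomic;		/* bdev supports REQ_ATOMIC */
	struct atomic_limits bdev_limits;
};

/* Decide what the filesystem would report via statx() for this inode. */
static struct atomic_limits report_atomic_limits(const struct inode_info *ii)
{
	const struct atomic_limits unsupported = { 0, 0 };
	struct atomic_limits lim;

	/* Native implementation: hardware limits are irrelevant. */
	if (ii->fs_native_atomic)
		return ii->fs_limits;

	/* No native support: need both bdev support and LBA alignment. */
	if (!ii->bdev_atomic || !ii->can_align_lba)
		return unsupported;

	/* Start from the bdev limits, then apply the fs constraint. */
	lim = ii->bdev_limits;
	if (ii->alloc_align != 0 && ii->alloc_align < lim.max)
		lim.max = ii->alloc_align;
	if (lim.max < lim.min)
		return unsupported;
	return lim;
}
```

The decision is made per inode and flows from the bdev up through the
filesystem to the caller of statx(), matching the bottom-up direction
argued for above.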