Received: by 2002:a05:7412:bbc7:b0:fc:a2b0:25d7 with SMTP id kh7csp975090rdb; Fri, 2 Feb 2024 09:26:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IGWlQr5oYQVkfjS1V/rdc3hEVjN0uqrWjvCFf0qxuDlg7Mx1WFTVRzz7F5ej91rhRQBfX5A X-Received: by 2002:a05:620a:261d:b0:783:42dc:f7f2 with SMTP id z29-20020a05620a261d00b0078342dcf7f2mr3546341qko.57.1706894794227; Fri, 02 Feb 2024 09:26:34 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706894794; cv=pass; d=google.com; s=arc-20160816; b=MWPuKZNIvwi6uvcnWEEqJ05H5KO75/J0Wskr1XPAjmH/UGvOa7GripR/QSClAtb9DU xFSyzvzLaGukPvBNed/CzSy2MgbKsM/oH7ksw18T3nSA2AT1vQaJIBlpm3XojaKXnJCz hZa+/HExc+DjZjZLK9OXGPWlglFFjrG+t4vvkfwOMeAwItIEHNRG6CjZByGJVieQi+qa HJbRB/IcQITBWwOOIx0rHB6O/xX1gqK6jsgaOYcnmbnNpyuK96wWl+/UvKsDorrqNRKq L79jF1YTexPh/zAg/YTmX5rztsOUAA/WzhhWWongJEmLUOEGX3+WG+l3OtHT0BC765Nc 8ksg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=hV1VmMjzXSqfbHSHeybjthsqjK4yMs4iYy/vd4MvGts=; fh=s3zaSiJsTYJ33IxkMJKdSegcEIavs+WNLGhYlr7ctKo=; b=Av5yRuD3AO5QS//+jbiu0BdhK34+7KGeKHnAnXufjZ2jrxnEt0HOvBVdBKfNOs8rB4 tGPhBibnE8vebd3NHHX/VCWIBtOgWuCwHGCsDvvKUZ4xjbsVEXVhd4kc8n3KxgrC8zz3 0XbtQAauB4yvHg7IAfqm7DGbtxsN06fSIwZRkz0HNu52lshsHuoHLtY+2dRVuhXuFMdz VJ3JXkZbv3eC6wzdzkIZGy+qKKlQFj2GtvaZIiHGldjroKimsORlbJxwVKTjywrNa/5E XX8nE+VyFZn+D3OKrG6lcrQbqZ3WXVnaTb25ueVH1gPD5fXvvccabD1HatiXzzBOhzKt IgEQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=aqgWYQ71; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-50231-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50231-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=1; AJvYcCVxB8ye1WXE7t4R05AwNFRTzxvLgq0NwiQTsHiBs8crTzk1pPN8WPoY2MzkxWL1heXyAoaz77Q5WLWRe903ZXip+tss1Ojn83LrpdPdVQ== Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id bk24-20020a05620a1a1800b007853f81f244si2700610qkb.22.2024.02.02.09.26.34 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Feb 2024 09:26:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-50231-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=aqgWYQ71; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-50231-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50231-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 8FF801C278BB for ; Fri, 2 Feb 2024 17:25:33 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 58C6714A4FD; Fri, 2 Feb 2024 17:25:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aqgWYQ71" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59F651474A1; Fri, 2 Feb 2024 17:25:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706894714; cv=none; b=PrBQuwymyOAUrWxb1lPGD0Cz2sQVhWMY8yF5x7mnucexBo36a3ABiyiAzyqknKzq6Y81fR1fqc3Cim5TRpCNwMbx8AsJp7CfCcsBLpqJDRUWqpy5IXXKFCyH+buc11J6pKx8lkWCdzB+d03WMBNrWCSnidg2CQEaULifGHY0kdE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706894714; c=relaxed/simple; bh=bL8+5y1uIX4CrzzBVdY7DuATZixLF3DqnJULrlkz1wM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=p/jAMqZ2gC1NYGSJsKEtePGlUR7lzRLVETBTtFHWRugrB6bitnawktBh8YP/SB+nRS8+cBwVi4mKchygPczHsxHIxrjpsOXDVXp/Xv4mIH2/2lI7PywzR1S3ckZIXc243rtjr4+0m6izInU2X8ys2ujH9eDRg4XTVuyedSAbRBc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=aqgWYQ71; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id AB6D9C433C7; Fri, 2 Feb 2024 17:25:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706894713; bh=bL8+5y1uIX4CrzzBVdY7DuATZixLF3DqnJULrlkz1wM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=aqgWYQ71rrPA7I2TRmZsG4KbtYfiJdPSxEhJnkIU2aj2dCrnU/dp5Bcrd19lg1k1R 33jQmpAZRi/AiqU+mTu4vtiT5lIVa2prncpU5vZTGur8bbuy0EVxuuZER9Mqni8/8o kEHJ3A5LVvP0J8Hc2P1X/iNq6jwGaqW8HLQWkTK0K23lOm4bG/st4iRPSynEHr4zYt CQeQb/y65oxX8HXqIBkT0PjQSx4raNGWO80h2KtcaFc2nS5egVSLoZ5/ndr/U3A8NB CxKdXl7MZj61N7F5qg+9PelJnjoD/aCnJdLa7Q5tOXCs74fTw7YxZnmg+K2PTGaQDv lTXkgpLo6JfMA== Date: Fri, 2 Feb 2024 09:25:13 -0800 From: "Darrick J. Wong" To: John Garry Cc: hch@lst.de, viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com, jack@suse.cz, chandan.babu@oracle.com, martin.petersen@oracle.com, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com, ojaswin@linux.ibm.com Subject: Re: [PATCH 1/6] fs: iomap: Atomic write support Message-ID: <20240202172513.GZ6226@frogsfrogsfrogs> References: <20240124142645.9334-1-john.g.garry@oracle.com> <20240124142645.9334-2-john.g.garry@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240124142645.9334-2-john.g.garry@oracle.com> On Wed, Jan 24, 2024 at 02:26:40PM +0000, John Garry wrote: > Add flag IOMAP_ATOMIC_WRITE to indicate to the FS that an atomic write > bio is being created and all the rules there need to be followed. > > It is the task of the FS iomap iter callbacks to ensure that the mapping > created adheres to those rules, like size is power-of-2, is at a > naturally-aligned offset, etc. However, checking for a single iovec, i.e. > iter type is ubuf, is done in __iomap_dio_rw(). > > A write should only produce a single bio, so error when it doesn't. > > Signed-off-by: John Garry > --- > fs/iomap/direct-io.c | 21 ++++++++++++++++++++- > fs/iomap/trace.h | 3 ++- > include/linux/iomap.h | 1 + > 3 files changed, 23 insertions(+), 2 deletions(-) > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index bcd3f8cf5ea4..25736d01b857 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -275,10 +275,12 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio, > static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > struct iomap_dio *dio) > { > + bool atomic_write = iter->flags & IOMAP_ATOMIC; > const struct iomap *iomap = &iter->iomap; > struct inode *inode = iter->inode; > unsigned int fs_block_size = i_blocksize(inode), pad; > loff_t length = iomap_length(iter); > + const size_t iter_len = iter->len; > loff_t pos = iter->pos; > blk_opf_t bio_opf; > struct bio *bio; > @@ -381,6 +383,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > GFP_KERNEL); > bio->bi_iter.bi_sector = iomap_sector(iomap, pos); > bio->bi_ioprio = dio->iocb->ki_ioprio; > + if (atomic_write) > + bio->bi_opf |= REQ_ATOMIC; This really ought to be in iomap_dio_bio_opflags. Unless you can't pass REQ_ATOMIC to bio_alloc*, in which case there ought to be a comment about why. Also, what's the meaning of REQ_OP_READ | REQ_ATOMIC? Does that actually work? I don't know what that means, and "block: Add REQ_ATOMIC flag" says that's not a valid combination. I'll complain about this more below. > + > bio->bi_private = dio; > bio->bi_end_io = iomap_dio_bio_end_io; > > @@ -397,6 +402,12 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > } > > n = bio->bi_iter.bi_size; > + if (atomic_write && n != iter_len) { s/iter_len/orig_len/ ? > + /* This bio should have covered the complete length */ > + ret = -EINVAL; > + bio_put(bio); > + goto out; > + } > if (dio->flags & IOMAP_DIO_WRITE) { > task_io_account_write(n); > } else { > @@ -554,12 +565,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > struct blk_plug plug; > struct iomap_dio *dio; > loff_t ret = 0; > + bool is_read = iov_iter_rw(iter) == READ; > + bool atomic_write = (iocb->ki_flags & IOCB_ATOMIC) && !is_read; Hrmm. So if the caller passes in an IOCB_ATOMIC iocb with a READ iter, we'll silently drop IOCB_ATOMIC and do the read anyway? That seems like a nonsense combination, but is that ok for some reason? > trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before); > > if (!iomi.len) > return NULL; > > + if (atomic_write && !iter_is_ubuf(iter)) > + return ERR_PTR(-EINVAL); Does !iter_is_ubuf actually happen? Why don't we support any of the other ITER_ types? Is it because hardware doesn't want vectored buffers? I really wish there was more commenting on /why/ we do things here: if (iocb->ki_flags & IOCB_ATOMIC) { /* atomic reads do not make sense */ if (iov_iter_rw(iter) == READ) return ERR_PTR(-EINVAL); /* * block layer doesn't want to handle handle vectors of * buffers when performing an atomic write i guess? */ if (!iter_is_ubuf(iter)) return ERR_PTR(-EINVAL); iomi.flags |= IOMAP_ATOMIC; } > + > dio = kmalloc(sizeof(*dio), GFP_KERNEL); > if (!dio) > return ERR_PTR(-ENOMEM); > @@ -579,7 +595,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (iocb->ki_flags & IOCB_NOWAIT) > iomi.flags |= IOMAP_NOWAIT; > > - if (iov_iter_rw(iter) == READ) { > + if (is_read) { > /* reads can always complete inline */ > dio->flags |= IOMAP_DIO_INLINE_COMP; > > @@ -605,6 +621,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (iocb->ki_flags & IOCB_DIO_CALLER_COMP) > dio->flags |= IOMAP_DIO_CALLER_COMP; > > + if (atomic_write) > + iomi.flags |= IOMAP_ATOMIC; > + > if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) { > ret = -EAGAIN; > if (iomi.pos >= dio->i_size || > diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h > index c16fd55f5595..c95576420bca 100644 > --- a/fs/iomap/trace.h > +++ b/fs/iomap/trace.h > @@ -98,7 +98,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued); > { IOMAP_REPORT, "REPORT" }, \ > { IOMAP_FAULT, "FAULT" }, \ > { IOMAP_DIRECT, "DIRECT" }, \ > - { IOMAP_NOWAIT, "NOWAIT" } > + { IOMAP_NOWAIT, "NOWAIT" }, \ > + { IOMAP_ATOMIC, "ATOMIC" } > > #define IOMAP_F_FLAGS_STRINGS \ > { IOMAP_F_NEW, "NEW" }, \ > diff --git a/include/linux/iomap.h b/include/linux/iomap.h > index 96dd0acbba44..9eac704a0d6f 100644 > --- a/include/linux/iomap.h > +++ b/include/linux/iomap.h > @@ -178,6 +178,7 @@ struct iomap_folio_ops { > #else > #define IOMAP_DAX 0 > #endif /* CONFIG_FS_DAX */ > +#define IOMAP_ATOMIC (1 << 9) > > struct iomap_ops { > /* > -- > 2.31.1 > >