Received: by 2002:a05:6a10:6d10:0:0:0:0 with SMTP id gq16csp883476pxb; Fri, 22 Apr 2022 13:22:46 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxWwrvDr/2JJMjxtMTmWyVmeSTf6Fio+b2b5udv0yQ4N0scXuJAHPsO2DAdD6yBwz4R5Wly X-Received: by 2002:a63:151e:0:b0:39d:7613:9183 with SMTP id v30-20020a63151e000000b0039d76139183mr5339119pgl.52.1650658966314; Fri, 22 Apr 2022 13:22:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1650658966; cv=none; d=google.com; s=arc-20160816; b=bbYIfyjSLeiXyJnSBZDgRe79I8d0W5KWELf+NV9YlL6h5LxF7Mt0QfSyhWE7FeYJqU pS3agi/p9KsGpRissTnI+8L1pO9wUFsD5OmjZ9KNW9b6KC3NHjCwXaUst13MVro3EVUL DzVPuFCBq5zutXAtLmVdMpygsuqouni+qTyovnNwuNJDThsWBumzs8eJWHCzPRR0R0jZ 87UxU/5aTABOGPFZZUnCI77xMSfhY04D+QHk8CPsr8fK+7sR/jQUfTDwQqTQZaI7oWZ/ YYUDhY8ksoiPiI2gMoSl3B9p81x8Uv2JumAeGS1ipLNrU9+8Xzz9qLczjbO2gKtRjORh bjjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=W7sdUe6UGI1MUgXeeKMqz/sxi74dvJZ2t/vNjnFVumI=; b=Ta/pEALxKrd98Z6/OqhN6qFFJRhzPHGGHjsBYBgFLLDaibfTOGee00/UYQEsfMowh+ CqZVMn4LgKhbZdFOlZ8DNc1uUXd9SXj70NRtrauZBp4bi+g++uJGT524UCEFTejS5sz+ DHYqVqfbBShS+Df2lF6IJH6Gh3SidbmBP06tmfwocBkFDmjZSMQKofPI9bHhu8abbPSV SWawASaAk6EJLqJVbVmGn5NIOpST6m/94eWkYytE4i0q95M7YfxhfwrzzdPqkfAP8Jp1 SsSxHBtk3xtXsl6ybSrhmlMs93/m9sbYwepZ0B0gM2k0w/UZJu1BkcAggloe7dynb01y 3l8Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=temperror (no key for signature) header.i=@szeredi.hu header.s=google header.b=HeBf4TLF; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id c10-20020a170903234a00b00158a978a3b5si9788411plh.14.2022.04.22.13.22.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Apr 2022 13:22:46 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=temperror (no key for signature) header.i=@szeredi.hu header.s=google header.b=HeBf4TLF; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7227825FFD6; Fri, 22 Apr 2022 12:09:16 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1390001AbiDUPZq (ORCPT + 99 others); Thu, 21 Apr 2022 11:25:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232403AbiDUPZn (ORCPT ); Thu, 21 Apr 2022 11:25:43 -0400 Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA9013F31A for ; Thu, 21 Apr 2022 08:22:52 -0700 (PDT) Received: by mail-ej1-x62f.google.com with SMTP id t11so10700674eju.13 for ; Thu, 21 Apr 2022 08:22:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=W7sdUe6UGI1MUgXeeKMqz/sxi74dvJZ2t/vNjnFVumI=; b=HeBf4TLFuc8OnH6fg36XShaqo8AR2TJDvXu98gOWPK/clgJkG8F9Y8vNtRM+DryVGD KgraMVuUBfRYFuczhKtkSF8Bmi7AZgYZZkitnA5Xbkf1zwUq1majlTqyM9XGWu1N2X8s kuv9T+D4alRQobJ6iWCoe+mwwm2BFWRm8Rbt8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=W7sdUe6UGI1MUgXeeKMqz/sxi74dvJZ2t/vNjnFVumI=; b=ayiIOgui1SsTbLPssgLOabywkUVzW+q5efRiOcgLiNn3aEuuKnlODZ1ZeSX1nSIDai l/wpk8u3fvJsIYvlQ6ldFzfGC8pV7hrZT1vvslLksvP2DokSgxlvtG3QvJiNXEAlW1EK qierNRM49VEOZeEuNT6RBka/fcfuUJQ/bg5DPZn8hZg9FDiOA3sjY5IGnjSceSrpdv8r nD5uQ4qWcIzpcr7eWhXO0ps8mZkN8auTFHQs/6HfGF6S+prOuVjyg8DYguwPI5SyX4d2 L+Ps3DWf+ULuNUCQdJwtkqeMPj+8ILYS+QD0ZmiwZdVQPYVIcmR2SV1mh1i3KkK4SHRk QgBw== X-Gm-Message-State: AOAM532KswPMv10fonwFL7NCJA0iTr9MjK9eUSysmyeyQmx4NeqAtIK4 pPEeqmaR0vIdMG75QYYwJMP99QURJiJc3bYdHqRrqg== X-Received: by 2002:a17:906:280b:b0:6ce:f3c7:688f with SMTP id r11-20020a170906280b00b006cef3c7688fmr68184ejc.468.1650554571384; Thu, 21 Apr 2022 08:22:51 -0700 (PDT) MIME-Version: 1.0 References: <20220408061809.12324-1-dharamhans87@gmail.com> <20220408061809.12324-2-dharamhans87@gmail.com> In-Reply-To: <20220408061809.12324-2-dharamhans87@gmail.com> From: Miklos Szeredi Date: Thu, 21 Apr 2022 17:22:40 +0200 Message-ID: Subject: Re: [PATCH 1/1] FUSE: Allow parallel direct writes on the same file To: Dharmendra Singh Cc: Bernd Schubert , linux-fsdevel@vger.kernel.org, fuse-devel , linux-kernel@vger.kernel.org, Dharmendra Singh Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh wrote: > > As of now, in Fuse, direct writes on the same file are serialized > over inode lock i.e we hold inode lock for the whole duration of > the write request. This serialization works pretty well for the FUSE > user space implementations which rely on this inode lock for their > cache/data integrity etc. But it hurts badly such FUSE implementations > which has their own ways of mainting data/cache integrity and does not > use this serialization at all. > > This patch allows parallel direct writes on the same file with the > help of a flag called FOPEN_PARALLEL_WRITES. If this flag is set on > the file (flag is passed from libfuse to fuse kernel as part of file > open/create), we do not hold inode lock for the whole duration of the > request, instead acquire it only to protect updates on certain fields > of the inode. FUSE implementations which rely on this inode lock can > continue to do so and this is default behaviour. > > Signed-off-by: Dharmendra Singh > --- > fs/fuse/file.c | 38 ++++++++++++++++++++++++++++++++++---- > include/uapi/linux/fuse.h | 2 ++ > 2 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c > index 37eebfb90500..d3e8f44c1228 100644 > --- a/fs/fuse/file.c > +++ b/fs/fuse/file.c > @@ -1465,6 +1465,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, > int err = 0; > struct fuse_io_args *ia; > unsigned int max_pages; > + bool p_write = write && > + (ff->open_flags & FOPEN_PARALLEL_WRITES) ? true : false; > > max_pages = iov_iter_npages(iter, fc->max_pages); > ia = fuse_io_alloc(io, max_pages); > @@ -1472,10 +1474,11 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, > return -ENOMEM; > > if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) { > - if (!write) > + /* Parallel write does not come with inode lock held */ > + if (!write || p_write) Probably would be good to add an inode_is_locked() assert in fuse_sync_writes() to make sure we don't miss cases silently. > inode_lock(inode); > fuse_sync_writes(inode); > - if (!write) > + if (!write || p_write) > inode_unlock(inode); > } > > @@ -1568,22 +1571,36 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to) > static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) > { > struct inode *inode = file_inode(iocb->ki_filp); > + struct file *file = iocb->ki_filp; > + struct fuse_file *ff = file->private_data; > struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb); > ssize_t res; > + bool p_write = ff->open_flags & FOPEN_PARALLEL_WRITES ? true : false; > + bool unlock_inode = true; > > /* Don't allow parallel writes to the same file */ > inode_lock(inode); > res = generic_write_checks(iocb, from); I don't think this needs inode lock. At least nfs_file_direct_write() doesn't have it. What it does have, however is taking the inode lock for shared for the actual write operation, which is probably something that fuse needs as well. Also I worry about size extending writes not holding the inode lock exclusive. Would that be a problem in your use case? > if (res > 0) { > + /* Allow parallel writes on the inode by unlocking it */ > + if (p_write) { > + inode_unlock(inode); > + unlock_inode = false; > + } > if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) { > res = fuse_direct_IO(iocb, from); > } else { > res = fuse_direct_io(&io, from, &iocb->ki_pos, > FUSE_DIO_WRITE); > + if (p_write) { > + inode_lock(inode); > + unlock_inode = true; > + } > fuse_write_update_attr(inode, iocb->ki_pos, res); This doesn't need the inode lock either if the actual write wasn't locked. > } > } > - inode_unlock(inode); > + if (unlock_inode) > + inode_unlock(inode); > > return res; > } > @@ -2850,10 +2867,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > size_t count = iov_iter_count(iter), shortened = 0; > loff_t offset = iocb->ki_pos; > struct fuse_io_priv *io; > - > + bool p_write = (iov_iter_rw(iter) == WRITE && > + ff->open_flags & FOPEN_PARALLEL_WRITES); > pos = offset; > inode = file->f_mapping->host; > + > + if (p_write) > + inode_lock(inode); > i_size = i_size_read(inode); Neither this. > + if (p_write) > + inode_unlock(inode); > > if ((iov_iter_rw(iter) == READ) && (offset >= i_size)) > return 0; > @@ -2924,9 +2947,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > kref_put(&io->refcnt, fuse_io_release); > > if (iov_iter_rw(iter) == WRITE) { > + > + if (p_write) > + inode_lock(inode); > + > fuse_write_update_attr(inode, pos, ret); > if (ret < 0 && offset + count > i_size) > fuse_do_truncate(file); > + > + if (p_write) > + inode_unlock(inode); Truncation needs the inode lock, though I'm not completely understanding why this truncation is needed. But for example here it is assumed that file size won't change, which means that non-extending writes should hold inode lock shared and extending writes should hold inode lock exculsive to meet this assumption. Thanks, Miklos