Date: Tue, 21 Aug 2018 12:12:33 +0200
From: Niels de Vos
To: Miklos Szeredi
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Marcin Sulikowski
Subject: Re: [PATCH v3] fuse: add support for copy_file_range()
Message-ID: <20180821101233.GD2650@ndevos-x270>
References: <20180629121630.GS2345@ndevos-x270> <20180629125341.30466-1-ndevos@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 07, 2018 at 02:02:35PM +0200, Miklos Szeredi wrote:
> On Fri, Jun 29, 2018 at 2:53 PM, Niels de Vos wrote:
> > There are several FUSE filesystems that can implement server-side copy
> > or other efficient copy/duplication/clone methods. The copy_file_range()
> > syscall is the standard interface that users have access to while not
> > depending on external libraries that bypass FUSE.
> >
> > Signed-off-by: Niels de Vos
> >
> > ---
> > v2: return ssize_t instead of long
> > v3: add nodeid_out to fuse_copy_file_range_in for libfuse expectations
> > ---
> >  fs/fuse/file.c            |  66 +++++++++++++++++++++++
> >  fs/fuse/fuse_i.h          |   3 ++
> >  include/uapi/linux/fuse.h | 107 ++++++++++++++++++++++----------------
> >  3 files changed, 132 insertions(+), 44 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 67648ccbdd43..864939a1215d 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -3009,6 +3009,71 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
> >  	return err;
> >  }
> >
> > +static ssize_t fuse_copy_file_range(struct file *file_in, loff_t pos_in,
> > +				    struct file *file_out, loff_t pos_out,
> > +				    size_t len, unsigned int flags)
> > +{
> > +	struct fuse_file *ff_in = file_in->private_data;
> > +	struct fuse_file *ff_out = file_out->private_data;
> > +	struct inode *inode_out = file_inode(file_out);
> > +	struct fuse_inode *fi_out = get_fuse_inode(inode_out);
> > +	struct fuse_conn *fc = ff_in->fc;
> > +	FUSE_ARGS(args);
> > +	struct fuse_copy_file_range_in inarg = {
> > +		.fh_in = ff_in->fh,
> > +		.off_in = pos_in,
> > +		.nodeid_out = ff_out->nodeid,
> > +		.fh_out = ff_out->fh,
> > +		.off_out = pos_out,
> > +		.len = len,
> > +		.flags = flags
> > +	};
> > +	struct fuse_copy_file_range_out outarg;
> > +	ssize_t err;
> > +
> > +	if (fc->no_copy_file_range)
> > +		return -EOPNOTSUPP;
> > +
> > +	inode_lock(inode_out);
> > +	set_bit(FUSE_I_SIZE_UNSTABLE, &fi_out->state);
>
> This one is only needed in the non-writeback-cache case and only if
> the operations is size extending.
>
> Here's how the writeback-cache is supposed to work: the kernel buffers
> writes, just like a normal filesystem, as well as buffering related
> metadata updates (size & [cm]time), again, just like a normal
> filesystem. This means we just don't care about i_size being updated
> in userspace, any such change will be overwritten when the metadata is
> flushed out.
>
> In writeback-cache mode, when we do any other data modification, we
> need to first flush out the cache so that the order of writes is not
> mixed up. See fallocate() for example. We could be selective and
> only flush the range covered by [pos, pos+len], but just flushing
> everything is okay.

Thanks! I think I understood what you mean and I'll be sending an
updated version soon.

> I could add these, but you already have a test for this set up, so, I
> wouldn't mind if you post a new version.

No problem. I got something ready and tested on my side.

...
> > +	FUSE_POLL = 40,
> > +	FUSE_NOTIFY_REPLY = 41,
> > +	FUSE_BATCH_FORGET = 42,
> > +	FUSE_FALLOCATE = 43,
> > +	FUSE_READDIRPLUS = 44,
> > +	FUSE_RENAME2 = 45,
> > +	FUSE_LSEEK = 46,
> > +	FUSE_COPY_FILE_RANGE = 47,
>
> Nit: please do tabulation with tabs instead of spaces.

Will do.

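To illustrate the ordering described above for writeback-cache mode (flush
buffered writes first, then do the other data modification), here is a
small userspace toy model. It is not kernel code and not part of the patch;
every toy_* name and the FILE_SIZE constant are made up for this sketch,
and it takes the simple "flush everything" option rather than flushing only
[pos, pos+len].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FILE_SIZE 16	/* hypothetical size of the toy file */

/*
 * Toy model (assumption, not the kernel data structures):
 * 'backing' is what the FUSE server sees, 'cache' holds writes that
 * are buffered but not yet sent, as in writeback-cache mode.
 */
struct toy_file {
	char backing[FILE_SIZE];
	char cache[FILE_SIZE];
	bool dirty[FILE_SIZE];
};

/* Buffered write: the data lands in the cache only. */
static void toy_write(struct toy_file *f, size_t pos,
		      const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		f->cache[pos + i] = buf[i];
		f->dirty[pos + i] = true;
	}
}

/* Write back every dirty byte ("just flushing everything"). */
static void toy_flush(struct toy_file *f)
{
	for (size_t i = 0; i < FILE_SIZE; i++) {
		if (f->dirty[i]) {
			f->backing[i] = f->cache[i];
			f->dirty[i] = false;
		}
	}
}

/* Server-side copy: it only ever sees the backing stores. */
static void toy_server_copy(struct toy_file *in, size_t pos_in,
			    struct toy_file *out, size_t pos_out, size_t len)
{
	memmove(&out->backing[pos_out], &in->backing[pos_in], len);
}

/* Flush both files first, then let the server copy. */
static void toy_copy_file_range(struct toy_file *in, size_t pos_in,
				struct toy_file *out, size_t pos_out,
				size_t len)
{
	toy_flush(in);
	toy_flush(out);
	toy_server_copy(in, pos_in, out, pos_out, len);
}
```

Without the two toy_flush() calls, the copy would read stale source data,
and a later writeback of the destination could overwrite the copied range
with older buffered data.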
> >
> >  	/* CUSE specific operations */
> >  	CUSE_INIT		= 4096,
> > @@ -792,4 +796,19 @@ struct fuse_lseek_out {
> >  	uint64_t	offset;
> >  };
> >
> > +struct fuse_copy_file_range_in {
> > +	uint64_t	fh_in;
> > +	uint64_t	off_in;
> > +	uint64_t	nodeid_out;
> > +	uint64_t	fh_out;
> > +	uint64_t	off_out;
> > +	uint64_t	len;
> > +	uint32_t	flags;
>
> Why not uint64_t for flags?

Everything else uses uint32_t for flags in this file. I'll make it
uint64_t in the next update.

> > +};
> > +
> > +struct fuse_copy_file_range_out {
> > +	uint32_t	size;
> > +	uint32_t	padding;
> > +};
>
> Could reuse "struct fuse_write_out" for this. Helps with the
> userspace interface as well, since the same fuse_reply_write()
> function can be used.

I considered that before as well. In case the interface changes an
updated struct fuse_copy_file_range_out can always be added later. And
hopefully there is no reason to change it at all.

At the moment I am running a few more tests to verify an updated patch,
and will send it out later today.

Niels
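
For reference, a small userspace sketch of the two layout points discussed
above. The struct names below are hypothetical stand-ins, not the uapi
definitions. One possible reason to prefer a 64-bit flags field (an
assumption, not something stated in this thread) is that a trailing
uint32_t can leave four bytes of implicit tail padding on ABIs where
uint64_t is 8-byte aligned, so the struct size may differ between ABIs.
The sketch also shows that the proposed out struct has the same shape as
struct fuse_write_out (a 32-bit size followed by 32-bit padding).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the v3 layout: 32-bit flags at the end. */
struct copy_in_v3 {
	uint64_t fh_in;
	uint64_t off_in;
	uint64_t nodeid_out;
	uint64_t fh_out;
	uint64_t off_out;
	uint64_t len;
	uint32_t flags;
};

/* Stand-in for the planned layout: 64-bit flags, no implicit padding. */
struct copy_in_u64 {
	uint64_t fh_in;
	uint64_t off_in;
	uint64_t nodeid_out;
	uint64_t fh_out;
	uint64_t off_out;
	uint64_t len;
	uint64_t flags;
};

/* Mirrors the shape of struct fuse_write_out. */
struct write_out_like {
	uint32_t size;
	uint32_t padding;
};

/* Mirrors the out struct proposed in the patch. */
struct copy_out_v3 {
	uint32_t size;
	uint32_t padding;
};

/* Seven 64-bit fields: 56 bytes with no padding on common ABIs. */
_Static_assert(sizeof(struct copy_in_u64) == 56,
	       "64-bit flags layout has no implicit padding");
_Static_assert(sizeof(struct copy_out_v3) == sizeof(struct write_out_like),
	       "out struct matches the write reply shape");

/* Hypothetical helper: implicit tail padding after the 32-bit flags. */
static size_t copy_in_v3_tail_padding(void)
{
	return sizeof(struct copy_in_v3) -
	       (offsetof(struct copy_in_v3, flags) + sizeof(uint32_t));
}
```

On x86-64, copy_in_v3_tail_padding() would be expected to return 4, while
on ABIs that align uint64_t to 4 bytes it would return 0.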