Return-Path: Received: from mail-qk0-f181.google.com ([209.85.220.181]:33893 "EHLO mail-qk0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754947AbbDKNEJ (ORCPT ); Sat, 11 Apr 2015 09:04:09 -0400 Received: by qkgx75 with SMTP id x75so78515147qkg.1 for ; Sat, 11 Apr 2015 06:04:08 -0700 (PDT) Date: Sat, 11 Apr 2015 09:04:02 -0400 From: Jeff Layton To: Trond Myklebust Cc: Zach Brown , Linux Kernel Mailing List , Linux FS-devel Mailing List , linux-btrfs@vger.kernel.org, Linux NFS Mailing List , linux-scsi@vger.kernel.org Subject: Re: [PATCH RFC 1/3] vfs: add copy_file_range syscall and vfs helper Message-ID: <20150411090402.67d22d02@tlielax.poochiereds.net> In-Reply-To: References: <1428703236-24735-1-git-send-email-zab@redhat.com> <1428703236-24735-2-git-send-email-zab@redhat.com> <20150411000208.GA20949@lenny.home.zabbo.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-nfs-owner@vger.kernel.org List-ID: On Fri, 10 Apr 2015 20:24:06 -0400 Trond Myklebust wrote: > On Fri, Apr 10, 2015 at 8:02 PM, Zach Brown wrote: > > On Fri, Apr 10, 2015 at 06:36:41PM -0400, Trond Myklebust wrote: > >> On Fri, Apr 10, 2015 at 6:00 PM, Zach Brown wrote: > > > >> > + > >> > +/* > >> > + * copy_file_range() differs from regular file read and write in that it > >> > + * specifically allows return partial success. When it does so is up to > >> > + * the copy_file_range method. > >> > + */ > >> > +ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > >> > + struct file *file_out, loff_t pos_out, > >> > + size_t len, int flags) > >> > >> I'm going to repeat a gripe with this interface. I really don't think > >> we should treat copy_file_range() as taking a size_t length, since > >> that is not sufficient to do a full file copy on 32-bit systems w/ LFS > >> support. > > > > *nod*. The length type is limited by the syscall return type and the > > arbitrary desire to mimic read/write. > > > > I sympathize with wanting to copy giant files with operations that don't > > scale with file size because files can be enormous but sparse. > > The other argument against using a size_t is that there is no memory > buffer involved here. size_t is, after all, a type describing > in-memory objects, not files. > > >> Could we perhaps instead of a length, define a 'pos_in_start' and a > >> 'pos_in_end' offset (with the latter being -1 for a full-file copy) > >> and then return an 'loff_t' value stating where the copy ended? > > > > Well, the resulting offset will be set if the caller provided it. So > > they could already be getting the copied length from that. But they > > might not specify the offsets. Maybe they're just using the results to > > total up a completion indicator. > > > > Maybe we could make the length a pointer like the offsets that's set to > > the copied length on return. > > That works, but why do we care so much about the difference between a > length and an offset as a return value? > I think it just comes down to potential confusion for users. What's more useful, the number of bytes actually copied, or the offset into the file where the copy ended? I tend to the think an offset is more useful for someone trying to copy a file in chunks, particularly if the file is sparse. That gives them a clear place to continue the copy. So, I think I agree with Trond that phrasing this interface in terms of file offsets seems like it might be more useful. That also neatly sidesteps the size_t limitations on 32-bit platforms. > To be fair, the NFS copy offload also allows the copy to proceed out > of order, in which case the range of copied data could be > non-contiguous in the case of a failure. However neither the length > nor the offset case will give you the full story in that case. Any > return value can at best be considered to define an offset range whose > contents need to be checked for success/failure. > Yuck! How the heck do you clean up the mess if that happens? I guess you're just stuck redoing the copy with normal READ/WRITE? Maybe we need to have the interface return a hard error in that case and not try to give back any sort of offset? > > This all seems pretty gross. Does anyone else have a vote? > > > > (And I'll argue strongly against creating magical offset values that > > change behaviour. If we want to ignore arguments and get the length > > from the source file we'd add a flag to do so.) > > The '-1' was not intended to be a special/magical value: as far as I'm > concerned any end offset that covers the full range of supported file > lengths would be OK. > Agreed. A "whole file" flag might also be useful too, but I'd leave that for after the initial implementation is merged, just in the interest of having _something_ that works in the near term. > >> Note that both btrfs and NFSv4.2 allow for 64-bit lengths, so this > >> interface would be closer to what is already in use anyway. > > > > Yeah, btrfs doesn't allow partial progress. It returns 0 on success. > > We could also do that but people have expressed an interest in returning > > partial progress. > > Returning an end offset would satisfy the partial progress requirement > (with the caveat mentioned above). > -- Jeff Layton