Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp4765669yba; Mon, 20 May 2019 03:27:52 -0700 (PDT) X-Google-Smtp-Source: APXvYqx7PBQ6Un3doekq5t56pEbxyjwu/lubO8/TQcmOUkH9IEDeWqoQacGWDE3tEiZhE9eU4Ih7 X-Received: by 2002:a65:4243:: with SMTP id d3mr64084261pgq.57.1558348072757; Mon, 20 May 2019 03:27:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558348072; cv=none; d=google.com; s=arc-20160816; b=yB8vdgS0agBd5H8dLWIyTE02vR+ikqOqlJaCpbcW8iVQQflrWnPOYFOvO3PdDnc/lg 9UQF0UxcalhCs0j1+DIZgHUGSkuo41HKe+GRHfs2AsJOO8CBAAPeODLNjyrImMBMYiwe G+EzEeQsPzmLUSDEIp0fXMNU+fYKQnGEEvssBNEp+t1NraEySUJEFutKgKQ/rn9muPJp weq+GC+u5JkXUNyGkjYP1tv6Y2w6iEYE7/PZB+ScXx+HnVJg/+fJ4K1BuPQLDU8tH14K lmT5AeI0QrmsZVXkI2DMKPhxMaTlfxsjEsZieNq/Qz0xYdSo/rM+VLCxyafuI4+MnoP/ 83SA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=WO0nb0i3TlfUEi3Ta1dEHoPzpd1bYnb4mrBSpQaLWfM=; b=JRShINzk7WR1J6HK/ESY7j1ZRayr3ITNuegLcOtae5Og5fHpPcBh9Uy/z19q8aD9G/ qHX+pRJdj1gR2JQf9OoauTprDYCCuM2b4xN80F4Dsrj2x/mEMdD8NEQ1FGwiOhVd6131 B7nZgDoq9V44uj6eQoIEyHwneKZ+suKdxGgaC+g/hBJL0xD6w8sXBVBKSqxgpegM0yVI qmTHOOJhPSbm50n6RXwn0M6JNvtk0Z7I05hn7UnS1dm3sqGlDzdCDN7Z3yR8snYjVf96 ln8pLOWqnXXw3B9mapBE7zfXgVHfVIDDgs3U3wyZzmL6DJP+EK4eGERA7le19YnRPxKt uTpA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Xv0hYodw; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j19si19168265pfd.189.2019.05.20.03.27.28; Mon, 20 May 2019 03:27:52 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Xv0hYodw; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731193AbfETJKx (ORCPT + 99 others); Mon, 20 May 2019 05:10:53 -0400 Received: from mail-yw1-f66.google.com ([209.85.161.66]:37235 "EHLO mail-yw1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726169AbfETJKx (ORCPT ); Mon, 20 May 2019 05:10:53 -0400 Received: by mail-yw1-f66.google.com with SMTP id 186so5573443ywo.4; Mon, 20 May 2019 02:10:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=WO0nb0i3TlfUEi3Ta1dEHoPzpd1bYnb4mrBSpQaLWfM=; b=Xv0hYodwAou2zdmZxFJFkZxUeVf2epBSGZOWi2TyQkrLXK8DUAYF+ifb6tHe6uP1uJ WpIFKZqzTWDvwDlxDgVUR8RUElqV6b5mzhbhGUonKzOHzxIGbAfME2G07FUKU2zb8XCl kT1nIN4zh1LvcCTOmQeYmbJG6jMj8IX+OyaJ8uYq9msIUvr1n6BVU/Sg7iEt7A4rJZP6 l26TIYtOAS+ByK2yoLoRUFW6ywzCkUOPdHMh3A2VoVhAUIp59Ikd0eZefAk8OcUUmC+J 3YaaybhHqRgI1WA7XetpaCtWYUokDERoaOGtAXYMBoxHzFRbsI9JbgMJv2bUzKnpSM0z P/NA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=WO0nb0i3TlfUEi3Ta1dEHoPzpd1bYnb4mrBSpQaLWfM=; b=eYOrWPR030iZWhZsRO0nHPUDOy0pzDriqNlX+sKWBZby3Ji9YGGc87StT9RnAQ/TXO AagjbZVMpq26aOytdZYrIc7WuZ0WdocDRLEFkCqyC7kXZfR+Oh0jZhOlMfknZEmujUO0 kJA63vcExrdeQqXHTykQK6sis2Oh3sgIKvm8u8mOPelb8/vI0xfbhu2Z5+0uW7mKmAxI Hm/vmTIHzDXVibEGk1n6ufvEMwuwr70B9q5kK8YsKNy1KutuQM8TaCTpgOs++Ohur4v2 R8AZNt7cXviajV4v3ilLFMJZYUtSn5cM2TYfEbq8RRk/JXFpuw51pmYPAQpF+XGQ+QlA QsKA== X-Gm-Message-State: APjAAAWDxyTf561eOQec2t30li/AH1aVq3RHTkjaqjfTQV8KpfSwcfcG G3MmSfn7vxLwSDhD2eCoQFldiaHjGjR9oxu2QalVkUpS X-Received: by 2002:a81:3344:: with SMTP id z65mr2829786ywz.294.1558343451573; Mon, 20 May 2019 02:10:51 -0700 (PDT) MIME-Version: 1.0 References: <20181203083416.28978-1-david@fromorbit.com> <20181203083416.28978-2-david@fromorbit.com> <20181204151332.GA32245@infradead.org> <20181204212948.GO6311@dastard> <20181204223102.GR6311@dastard> In-Reply-To: <20181204223102.GR6311@dastard> From: Amir Goldstein Date: Mon, 20 May 2019 12:10:39 +0300 Message-ID: Subject: Re: [PATCH 01/11] vfs: copy_file_range source range over EOF should fail To: Dave Chinner Cc: Olga Kornievskaia , Christoph Hellwig , linux-fsdevel , linux-xfs , linux-nfs , overlayfs , ceph-devel@vger.kernel.org, CIFS Content-Type: text/plain; charset="UTF-8" Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Wed, Dec 5, 2018 at 12:31 AM Dave Chinner wrote: > > On Tue, Dec 04, 2018 at 04:47:18PM -0500, Olga Kornievskaia wrote: > > On Tue, Dec 4, 2018 at 4:35 PM Dave Chinner wrote: > > > > > > On Tue, Dec 04, 2018 at 07:13:32AM -0800, Christoph Hellwig wrote: > > > > On Mon, Dec 03, 2018 at 02:46:20PM +0200, Amir Goldstein wrote: > > > > > > From: Dave Chinner > > > > > > > > > > > > The man page says: > > > > > > > > > > > > EINVAL Requested range extends beyond the end of the source file > > > > > > > > > > > > But the current behaviour is that copy_file_range does a short > > > > > > copy up to the source file EOF. Fix the kernel behaviour to match > > > > > > the behaviour described in the man page. > > > > > > > > I think the behavior implemented is a lot more useful than the one > > > > documented.. > > > > > > The current behaviour is really nasty. Because copy_file_range() can > > > return short copies, the caller has to implement a loop to ensure > > > the range hey want get copied. When the source range you are > > > trying to copy overlaps source EOF, this loop: > > > > > > while (len > 0) { > > > ret = copy_file_range(... len ...) > > > ... > > > off_in += ret; > > > off_out += ret; > > > len -= ret; > > > } > > > > > > Currently the fallback code copies up to the end of the source file > > > on the first copy and then fails the second copy with EINVAL because > > > the source range is now completely beyond EOF. > > > > > > So, from an application perspective, did the copy succeed or did it > > > fail? > > > > > > Existing tools that exercise copy_file_range (like xfs_io) consider > > > this a failure, because the second copy_file_range() call returns > > > EINVAL and not some "there is no more to copy" marker like read() > > > returning 0 bytes when attempting to read beyond EOF. > > > > > > IOWs, we cannot tell the difference between a real error and a short > > > copy because the input range spans EOF and it was silently > > > shortened. That's the API problem we need to fix here - the existing > > > behaviour is really crappy for applications. Erroring out > > > immmediately is one solution, and it's what the man page says should > > > happen so that is what I implemented. > > > > > > Realistically, though, I think an attempt to read beyond EOF for the > > > copy should result in behaviour like read() (i.e. return 0 bytes), > > > not EINVAL. The existing behaviour needs to change, though. > > > > There are two checks to consider > > 1. pos_in >= EOF should return EINVAL > > 2. however what's perhaps should be relaxed is pos_in+len >= EOF > > should return a short copy. > > > > Having check#1 enforced allows to us to differentiate between a real > > error and a short copy. > > That's what the code does right now and *exactly what I'm trying to > fix* because it EINVAL is ambiguous and not an indicator that we've > reached the end of the source file. EINVAL can indicate several > different errors, so it really has to be treated as a "copy failed" > error by applications. > > Have a look at read/pread() - they return 0 in this case to indicate > a short read, and the value of zero is explicitly defined as meaning > "read position is beyond EOF". Applications know straight away that > there is no more data to be read and there was no error, so can > terminate on a successful short read. > > We need to allow applications to terminate copy loops on a > successful short copy. IOWs, applications need to either: > > - get an immediate error saying the range is invalid rather > than doing a short copy (as per the man page); or > - have an explicit marker to say "no more data to be copied" > > Applications need the "no more data to copy" case to be explicit and > unambiguous so they can make sane decisions about whether a short > copy was successful because the file was shorter than expected or > whether a short copy was a result of a real error being encountered. > The current behaviour is largely unusable for applications because > they have to guess at the reason for EINVAL part way through a > copy.... > Dave, I went a head and implemented the desired behavior. However, while testing I observed that the desired behavior is already the existing behavior. For example, trying to copy 10 bytes from a 2 bytes file, xfs_io copy loop ends as expected: copy_file_range(4, [0], 3, [0], 10, 0) = 2 copy_file_range(4, [2], 3, [2], 8, 0) = 0 This was tested on ext4 and xfs with reflink on recent kernel as well as on v4.20-rc1 (era of original patch set). Where and how did you observe the EINVAL behavior described above? (besides man page that is). There are even xfstests (which you modified) that verify the return 0 for past EOF behavior. For now, I am just dropping this patch from the patch series. Let me know if I am missing something. Thanks, Amir.