Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp4800121ybv; Tue, 11 Feb 2020 03:45:27 -0800 (PST) X-Google-Smtp-Source: APXvYqw4/4ylcH16mryFWFcBqPYg3gIPxiqx4aJHzMKG5egTdgV/wnuEk+U8lmBMpFHvJPZxunhb X-Received: by 2002:a9d:5a09:: with SMTP id v9mr4670126oth.214.1581421527327; Tue, 11 Feb 2020 03:45:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581421527; cv=none; d=google.com; s=arc-20160816; b=nF++r1PshHe1bPv5TnLhVAM5/9RcI1ptDq9m5SvCOvhnRYuoiVIvsg9Mo/mtC+FLJt RB2XvyL2adn4Cx+f8GnIog7E0id7pDQteFykZL51+s2dhJlqvaNwKHV5MhyzuMDUWFf+ 748Gf/BqRdjBiTFVRnUHo26EHfDTZTL3eyPDKvaa66ambKqVpQsVMrq887Gx1UkCGJ2M mGqmPQ+ONogfTyLzIIvHFMzPnxt7N36TnLphWmhUY0zN1FXfih27dQuZoQNZGPyZ1FZN A/bkyQEQIiTsYGfOleKcocAZSP5sBZlgLtSFUbq91n1TCZwRhXs7r8hjam0PWMcBk0KD TDXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=R3y8ZmiEYtDezKD63+l4DczridZuvwHW36ri3A/Fxdk=; b=FjYUu4wqHnmKcB6B7Thb9VegCf8uJvXwqd8LTfAtMuFwy2Kd3eWXZmGmAVZOkNllqE PPVPvXxfX7eBqIvfWzrQznLEk1iWretEONz1oc3gYRF2dx8HksYSkC9ulcNQWgz5xZJ8 TspD6pGz/QApR0CqV0bbqezZqd+O682+4DKYGCxc1PQ7H5wnQ8VYIHXVNlJp7UhEXQ5N 3beI6v2raN/IXGDzI6oljLyoNP27Y+A4A4bSogKhAVvDBn6D1+x4MFlsFIv1DImxVZoa p1AqHgy1gLuBjXfJNxvhLr1T4P/H8wR0KRWM4MvZajTWhL2YUtelKjf3RZbJJmUyzfqv swXg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=njInSJ1L; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g22si1700230otp.55.2020.02.11.03.45.15; Tue, 11 Feb 2020 03:45:27 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=njInSJ1L; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728121AbgBKLEA (ORCPT + 99 others); Tue, 11 Feb 2020 06:04:00 -0500 Received: from mail-il1-f193.google.com ([209.85.166.193]:46006 "EHLO mail-il1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727561AbgBKLEA (ORCPT ); Tue, 11 Feb 2020 06:04:00 -0500 Received: by mail-il1-f193.google.com with SMTP id p8so3042747iln.12; Tue, 11 Feb 2020 03:03:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=R3y8ZmiEYtDezKD63+l4DczridZuvwHW36ri3A/Fxdk=; b=njInSJ1LF/+jt83rq+wKFldX5QtA6salRCBIGztkvhRx7XVRmAWcdivsjlA+w/FjYr qtbyGVSaXaY0DtWR5UEbpWVxsdSHJlr6agtKRlkUkXDxWOTU4M1aqAF1NsbpFu2iiC3p WGGVPDnqWrKf6ausGnQmp+UoQzcx7RLB4hjkcQIOSDC03QN/kpVgi44CQO9voAG38qP4 mN4GKzGAu/GYpjk3PxCr1JFB2OPYz4N0rU2+SSEGZee0sQ1pGPbdgY75SRln8W6v7jh8 UTppcNXm8KA/isIxFwMK+qWwmAjRMXoLw6N9Dn6anTr6RbgV/Yn/gYETUDoN7q6fxr8O azhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=R3y8ZmiEYtDezKD63+l4DczridZuvwHW36ri3A/Fxdk=; b=q7vSiYwgyYfJQhc53mJNLNB6IT4Zua9PzwiUjcHnQaabbFv7BS/M/trGzF0miSqfV3 Q10APOGP8Hu0Fa1ItlieR3O1jiIl1AiHxNd5a74/Bt+jaxdQbiZCuV5nyLBV3CLQePLS E0XFwCC6qW0dyEtH36CGYeDaKJaqc6yWPt2lWgyIN0PxOaTt3rNUsvjpPt36Jc3w7WCH prFL1vbZjxc3bUW7aCVhN5LgMdmziSA6CUC9GKxKOpVN+2+YacApEMyAIdh+wKjeT4jI CIIXivMWZ2pXU5xqKNuNCrrEujAGtmERG/2E6DB1IiIGsE7DZfwv8tqM+hsrL+Gm4yxp tvbA== X-Gm-Message-State: APjAAAUfMwNnKyAoO6ZS2zBZpYvYzbw50F+y+ngl3vcQR4aqhQGNhnaK UvE881hSa7XWHvbw+azxk4tAaV+ocmL5Bk6uAes= X-Received: by 2002:a92:3a8d:: with SMTP id i13mr6123726ilf.112.1581419039079; Tue, 11 Feb 2020 03:03:59 -0800 (PST) MIME-Version: 1.0 References: <20200205192414.GA27345@suse.com> <20200206103842.14936-1-lhenriques@suse.com> <20200211103239.GA21723@suse.com> In-Reply-To: <20200211103239.GA21723@suse.com> From: Ilya Dryomov Date: Tue, 11 Feb 2020 12:04:18 +0100 Message-ID: Subject: Re: [PATCH v2] ceph: fix copy_file_range error path in short copies To: Luis Henriques Cc: Jeff Layton , Sage Weil , "Yan, Zheng" , Gregory Farnum , Ceph Development , LKML , stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 11, 2020 at 11:32 AM Luis Henriques wrote: > > On Mon, Feb 10, 2020 at 07:38:10PM +0100, Ilya Dryomov wrote: > > On Thu, Feb 6, 2020 at 11:38 AM Luis Henriques wrote: > > > > > > When there's an error in the copying loop but some bytes have already been > > > copied into the destination file, it is necessary to dirty the caps and > > > eventually update the MDS with the file metadata (timestamps, size). This > > > patch fixes this error path. > > > > > > Another issue this patch fixes is the destination file size being reported > > > to the MDS. If we're on the error path but the amount of bytes written > > > has already changed the destination file size, the offset to use is > > > dst_off and not endoff. > > > > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Luis Henriques > > > --- > > > fs/ceph/file.c | 18 +++++++++++++----- > > > 1 file changed, 13 insertions(+), 5 deletions(-) > > > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > > index 11929d2bb594..f7f8cb6c243f 100644 > > > --- a/fs/ceph/file.c > > > +++ b/fs/ceph/file.c > > > @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > > > CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0); > > > if (err) { > > > dout("ceph_osdc_copy_from returned %d\n", err); > > > - if (!ret) > > > + /* > > > + * If we haven't done any copy yet, just exit with the > > > + * error code; otherwise, return the number of bytes > > > + * already copied, update metadata and dirty caps. > > > + */ > > > + if (!ret) { > > > ret = err; > > > - goto out_caps; > > > + goto out_caps; > > > + } > > > + goto update_dst_inode; > > > } > > > len -= object_size; > > > src_off += object_size; > > > @@ -2118,16 +2125,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > > > /* We still need one final local copy */ > > > do_final_copy = true; > > > > > > +update_dst_inode: > > > file_update_time(dst_file); > > > inode_inc_iversion_raw(dst_inode); > > > > > > - if (endoff > size) { > > > + if (dst_off > size) { > > > int caps_flags = 0; > > > > > > /* Let the MDS know about dst file size change */ > > > - if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff)) > > > + if (ceph_quota_is_max_bytes_approaching(dst_inode, dst_off)) > > > caps_flags |= CHECK_CAPS_NODELAY; > > > - if (ceph_inode_set_size(dst_inode, endoff)) > > > + if (ceph_inode_set_size(dst_inode, dst_off)) > > > caps_flags |= CHECK_CAPS_AUTHONLY; > > > if (caps_flags) > > > ceph_check_caps(dst_ci, caps_flags, NULL); > > > > Hi Luis, > > > > I think this function still has short copy and file size issues: > > > > - do_splice_direct() may write fewer bytes than requested, including > > nothing at all (i.e. return 0). While we don't care about the second > > call much, handling the first call is crucial because proceeding to > > the copy-from loop with src/dst_off not at the object boundary will > > corrupt the destination file. > > > > - size is set after caps are acquired for the first time and never > > updated. But caps are dropped before do_splice_direct(), so by the > > time we get to dst_off > size check, it may be stale. Again, data > > loss if e.g. old-size < dst_off < new-size because the destination > > file will get truncated... > > > > Also, src/dst_oloc need to be freed with ceph_oloc_destroy() to avoid > > leaking memory on namespace layouts. > > > > It seems clear that this function needs to be split, with the new > > loop around do_splice_direct() and the copy-from loop each going into > > a separate functions with clear pre- and post-conditions. > > Right, it makes sense to refactor this function and fix all these issues > you're pointing. It'll be a pain because a lot of parameters will need to > be handed over into these new functions (maybe a new 'struct copy_desc' > can help making it a bit less messy). Anyway, I'll try to spend some time > working on that and see what I can come up with. Yeah, this code really needs more work and extensive verification. I'm dropping this patch because it's only a partial fix. Backporting it alone, with known data corruption issues remaining (and without the copy-from2 patch that fixes another data corruption issue that is much easier to hit), doesn't make sense. Thanks, Ilya