Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp5942319pxb; Tue, 16 Feb 2021 11:22:06 -0800 (PST) X-Google-Smtp-Source: ABdhPJxSyNK71k7hvFVLP7GWa6U9N/3ewj32EvRZ69vzOCmAr4W0uMdYztU7AMIJRGDB8fk+sdwm X-Received: by 2002:aa7:c259:: with SMTP id y25mr16695494edo.306.1613503326671; Tue, 16 Feb 2021 11:22:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613503326; cv=none; d=google.com; s=arc-20160816; b=AdgsenUgIrrfiBxPHFJfnzrfxV8bdstqbDM0XgKVfmwjNFN8cMKGm4tagfBIe3HZfR T8ijgPRlgzGuvq7TdM3aTIyRLJdG4GO1PNX/2TbPeMy0tYtoQkaKOEPK+zLMSzCkMCQ8 uLwlCDBN0PZH/Oy3ba/RMmFSYBp6HWrpA9HnFTF476Mv52mRTFYxZ6UZ8N1vEdHue+3d xOaplphdpWI1DykhqO/35OcFHXzp1Hhqjwj2F3YIl6w8xVVNmVKbsLqrwZuXnhlH+tR2 C5idwhPf4J5X8nhmntLn6Zc7Kgvzz4IbKQeNU4+aqx+mMh09vL9TKy9KI+lhVJZomrjc JuHQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=F/UAe+Q5KrxzGPRrDpk+yw/nVFEpqh6cyKfpX+3V5cA=; b=UxySRx6E7a57EwITz2SMcdocMJsqe42nsVOQDkXWlZngXtqfOXj3B6rQ3WX969rqNm 2jRjOTtzyG7DefCZai1I7t0PfzUe1/2eFKbcUi5YZQ/Kj5Sbpey0/Uv9lRylJhN5RDll E8ZB1dofA07IKhywpwN5raLHelZbnJ7leRhZInu1rJglV7boiSSm8hEa/ddvLeUQuCQB m5sWwId/5zij/WyGKbaqU56h5jRjRDYS18X9JQp6OFltxX/ZmFV8IJxxTJOpp/drAT+S 9odoKEkHLctaF0fwOW9UuDTbTeUk36B67V8AnRRItpaiUXCgfFBEnQaoh+BbdSZrWVWK 4KMA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=qgO4DGEK; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p5si15029087ejo.398.2021.02.16.11.21.43; Tue, 16 Feb 2021 11:22:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=qgO4DGEK; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229989AbhBPTV3 (ORCPT + 99 others); Tue, 16 Feb 2021 14:21:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229916AbhBPTV2 (ORCPT ); Tue, 16 Feb 2021 14:21:28 -0500 Received: from mail-io1-xd35.google.com (mail-io1-xd35.google.com [IPv6:2607:f8b0:4864:20::d35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32E0AC06174A; Tue, 16 Feb 2021 11:20:48 -0800 (PST) Received: by mail-io1-xd35.google.com with SMTP id e133so11310921iof.8; Tue, 16 Feb 2021 11:20:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=F/UAe+Q5KrxzGPRrDpk+yw/nVFEpqh6cyKfpX+3V5cA=; b=qgO4DGEKgZcSWPj1Fh7eGlI4HjLMaqKqxTNGmtldCwGKBeQ2fZV174A3XcB4FIX8Uy /cHVelRqlb7p9w+r0IXnqDbDKworj4TpzTCbwPImx0nQZ8n8yiWox7tlO8pxnt5bIO96 tGm1wHzSpI3J2q5kU4lUVPIAu/OB374dtzomNA1aIjJvezs+G64YECTq/8RSiDBpU8CR /qqhpQqE+sXuRnYgShPDwlohoGAErXQe32s6mS/hUmnIT+35erQdHiZmzBEqKV8bQEXk 7ay6yJgmtm4S57cURF1KYVSYy8Nl1b6M3bVBU3KvC0I1Op/QqW9CB6Bcsxl+rNbYoGpu zdOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=F/UAe+Q5KrxzGPRrDpk+yw/nVFEpqh6cyKfpX+3V5cA=; b=gJpmNX8/y2ZeZJ3f9CNA1CKJONFEUnJaZwsHYcdEjBJr7jjfIGcR87R88cs2Ke5Xa0 QZT0nUP1XZuSIjjiJ8FOtXFaQpmdRDxrOyYKFnczsH692saNACH9nmlo+114A/rp/hdD Wd1laoGVEGkMemOrM+PhxEC54XPAfIDFQVYQzuBYMV1OCo+PSzOHJon/t+EE9j8NHma5 Ug7jWkcUnwO9wxMzsveETnyXHvls4JngzTEXGG6mNaBscjm+44FJs2p5LBtv8Zb15PPn QUnCsqYKRrjQo1hAsvAajaNqq1Gb3OTiU//DyYc0ZeI/lraCJhSGhNPQpMAbWos9HQwA lXcw== X-Gm-Message-State: AOAM532uXtKOMM+FBRLBPDdctfaiRd8PCHNHeutwT2TgWuUahkjyw0y/ UoXgwvyTNZMk8Uecvb9ksCdIPwvOS2C/Ho8bq7Q= X-Received: by 2002:a05:6638:3491:: with SMTP id t17mr21518938jal.81.1613503247661; Tue, 16 Feb 2021 11:20:47 -0800 (PST) MIME-Version: 1.0 References: <20210215154317.8590-1-lhenriques@suse.de> <73ab4951f48d69f0183548c7a82f7ae37e286d1c.camel@hammerspace.com> <92d27397479984b95883197d90318ee76995b42e.camel@hammerspace.com> <87r1lgjm7l.fsf@suse.de> <87blckj75z.fsf@suse.de> <874kibkflh.fsf@suse.de> In-Reply-To: <874kibkflh.fsf@suse.de> From: Amir Goldstein Date: Tue, 16 Feb 2021 21:20:36 +0200 Message-ID: Subject: Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices To: Luis Henriques Cc: Trond Myklebust , "samba-technical@lists.samba.org" , "drinkcat@chromium.org" , "iant@google.com" , "linux-cifs@vger.kernel.org" , "darrick.wong@oracle.com" , "linux-kernel@vger.kernel.org" , "jlayton@kernel.org" , "anna.schumaker@netapp.com" , "llozano@chromium.org" , "linux-nfs@vger.kernel.org" , "miklos@szeredi.hu" , "viro@zeniv.linux.org.uk" , "dchinner@redhat.com" , "linux-fsdevel@vger.kernel.org" , "gregkh@linuxfoundation.org" , "sfrench@samba.org" , "ceph-devel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Tue, Feb 16, 2021 at 8:54 PM Luis Henriques wrote: > > Amir Goldstein writes: > > > On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques wrote: > >> > >> Amir Goldstein writes: > >> > >> >> Ugh. And I guess overlayfs may have a similar problem. > >> > > >> > Not exactly. > >> > Generally speaking, overlayfs should call vfs_copy_file_range() > >> > with the flags it got from layer above, so if called from nfsd it > >> > will allow cross fs copy and when called from syscall it won't. > >> > > >> > There are some corner cases where overlayfs could benefit from > >> > COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but > >> > let's leave those for now. Just leave overlayfs code as is. > >> > >> Got it, thanks for clarifying. > >> > >> >> > This is easy to solve with a flag COPY_FILE_SPLICE (or something) that > >> >> > is internal to kernel users. > >> >> > > >> >> > FWIW, you may want to look at the loop in ovl_copy_up_data() > >> >> > for improvements to nfsd_copy_file_range(). > >> >> > > >> >> > We can move the check out to copy_file_range syscall: > >> >> > > >> >> > if (flags != 0) > >> >> > return -EINVAL; > >> >> > > >> >> > Leave the fallback from all filesystems and check for the > >> >> > COPY_FILE_SPLICE flag inside generic_copy_file_range(). > >> >> > >> >> Ok, the diff bellow is just to make sure I understood your suggestion. > >> >> > >> >> The patch will also need to: > >> >> > >> >> - change nfs and overlayfs calls to vfs_copy_file_range() so that they > >> >> use the new flag. > >> >> > >> >> - check flags in generic_copy_file_checks() to make sure only valid flags > >> >> are used (COPY_FILE_SPLICE at the moment). > >> >> > >> >> Also, where should this flag be defined? include/uapi/linux/fs.h? > >> > > >> > Grep for REMAP_FILE_ > >> > Same header file, same Documentation rst file. > >> > > >> >> > >> >> Cheers, > >> >> -- > >> >> Luis > >> >> > >> >> diff --git a/fs/read_write.c b/fs/read_write.c > >> >> index 75f764b43418..341d315d2a96 100644 > >> >> --- a/fs/read_write.c > >> >> +++ b/fs/read_write.c > >> >> @@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, > >> >> struct file *file_out, loff_t pos_out, > >> >> size_t len, unsigned int flags) > >> >> { > >> >> + if (!(flags & COPY_FILE_SPLICE)) { > >> >> + if (!file_out->f_op->copy_file_range) > >> >> + return -EOPNOTSUPP; > >> >> + else if (file_out->f_op->copy_file_range != > >> >> + file_in->f_op->copy_file_range) > >> >> + return -EXDEV; > >> >> + } > >> > > >> > That looks strange, because you are duplicating the logic in > >> > do_copy_file_range(). Maybe better: > >> > > >> > if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE)) > >> > return -EINVAL; > >> > if (flags & COPY_FILE_SPLICE) > >> > return do_splice_direct(file_in, &pos_in, file_out, &pos_out, > >> > len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0); > >> > >> My initial reasoning for duplicating the logic in do_copy_file_range() was > >> to allow the generic_copy_file_range() callers to be left unmodified and > >> allow the filesystems to default to this implementation. > >> > >> With this change, I guess that the calls to generic_copy_file_range() from > >> the different filesystems can be dropped, as in my initial patch, as they > >> will always get -EINVAL. The other option would be to set the > >> COPY_FILE_SPLICE flag in those calls, but that would get us back to the > >> problem we're trying to solve. > > > > I don't understand the problem. > > > > What exactly is wrong with the code I suggested? > > Why should any filesystem be changed? > > > > Maybe I am missing something. > > Ok, I have to do a full brain reboot and start all over. > > Before that, I picked the code you suggested and tested it. I've mounted > a cephfs filesystem and used xfs_io to execute a 'copy_range' command > using /sys/kernel/debug/sched_features as source. The result was a > 0-sized file in cephfs. And the reason is thevfs_copy_file_range() > early exit in: > > if (len == 0) > return 0; > > 'len' is set in generic_copy_file_checks(). Good point.. I guess we will need to do all the checks earlier in generic_copy_file_checks() including the logic of: if (file_in->f_op->remap_file_range && file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) > > This means that we're not solving the original problem anymore (probably > since v1 of this patch, haven't checked). > > Also, re-reading Trond's emails, I read: "... also disallowing the copy > from, say, an XFS formatted partition to an ext4 partition". Isn't that > *exactly* what we're trying to do here? I.e. _prevent_ these copies from > happening so that tracefs files can't be CFR'ed? > We want to address the report which means calls coming from copy_file_range() syscall. Trond's use case is vfs_copy_file_range() coming from nfsd. When he writes about copy from XFS to ext4, he means an NFS client is issuing server side copy (on same or different NFS mounts) and the NFS server is executing nfsd_copy_file_range() on a source file that happens to be on XFS and destination happens to be on ext4. We can undo the copy_file_range() syscall change of behavior from v5.3 without regressing the NFS use case. We just need to be careful and look at all the affected code paths. Thanks, Amir.