Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp5925064pxb; Tue, 16 Feb 2021 10:55:31 -0800 (PST) X-Google-Smtp-Source: ABdhPJzQNJ+xPac4pu0lnP4KmVVcxH8lzm7BA1S+SMMXA+Bsw6P9Dzo7Gj8LVAzdhHr5oPoaQCEc X-Received: by 2002:a50:bf47:: with SMTP id g7mr22012478edk.323.1613501730829; Tue, 16 Feb 2021 10:55:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613501730; cv=none; d=google.com; s=arc-20160816; b=KgMnfrDsOTAMTspTHOTl2udEgLCasupMsoHyJWgfwur+b/bktTnrv+5Btz9t+AzOWv +X3yUDpmy4/zwtz6yaqpY74akN0Gzu0XH5HlcVGvVT/Fit3P9ZJS+UXg4RweSKM7gmp4 K1UFbOgNO7+0FzrDCjf1/JRdNZA/duPF4Z9gUGxKk8bYBVFyBXzyIGJW5LvApedHZUZs Yxb/BqTtbx0YFDRB0p6Q9MyDhx/7rhnUeu/orWAPJ2ZibyKErGiB7SSq8o9J4yVi2EJJ HEuHmu0uKsdisesAgtzYDzL6+JMp3bn0FYFQ/w24X9+v96TypemPdClQg1kg+h5ZJ1H1 j1Mw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:in-reply-to:date :references:subject:cc:to:from; bh=WRFm2erbHInwRMONr2e9DDaqAK/OjewpifYaa6cJ7wY=; b=rTDmVudSeStKxokuYQbsHPeNWO6AzaNoOgv+8lp9RqB2Pzxl/rBi0bw7kz8NWNOrMW qVpFoAlspa2lTYqXS8DNoCksyo2UaUmHIjv41NeHjUYWLztXFnlPIeQ61yvRSTvO1qkB jHmcY/taGNYDwYmLX4R2nENO+agEe9mqpu1wsnne6nL/XAoDyBHAxNKFWOeeiPzfWAWT C8Fvv6qn0/oMJvWf9CqJlzajarlDLFo8pGHW6VK/zKVzuQzooSsjL1VO/GvkOMvTWWCT o/W+ljMHDrYSj4RQ5rbeaAZXPeWzNS6C2K2iw/aDV3HjzC7U5IrZNGU0GzHz7WcotEaC KQ0w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j23si16951078eje.690.2021.02.16.10.54.57; Tue, 16 Feb 2021 10:55:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230465AbhBPSyt (ORCPT + 99 others); Tue, 16 Feb 2021 13:54:49 -0500 Received: from mx2.suse.de ([195.135.220.15]:44618 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230317AbhBPSys (ORCPT ); Tue, 16 Feb 2021 13:54:48 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 8BF0EAE53; Tue, 16 Feb 2021 18:54:05 +0000 (UTC) Received: from localhost (brahms [local]) by brahms (OpenSMTPD) with ESMTPA id b64fe64e; Tue, 16 Feb 2021 18:55:07 +0000 (UTC) From: Luis Henriques To: Amir Goldstein Cc: Trond Myklebust , "samba-technical@lists.samba.org" , "drinkcat@chromium.org" , "iant@google.com" , "linux-cifs@vger.kernel.org" , "darrick.wong@oracle.com" , "linux-kernel@vger.kernel.org" , "jlayton@kernel.org" , "anna.schumaker@netapp.com" , "llozano@chromium.org" , "linux-nfs@vger.kernel.org" , "miklos@szeredi.hu" , "viro@zeniv.linux.org.uk" , "dchinner@redhat.com" , "linux-fsdevel@vger.kernel.org" , "gregkh@linuxfoundation.org" , "sfrench@samba.org" , "ceph-devel@vger.kernel.org" Subject: Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices References: <20210215154317.8590-1-lhenriques@suse.de> <73ab4951f48d69f0183548c7a82f7ae37e286d1c.camel@hammerspace.com> <92d27397479984b95883197d90318ee76995b42e.camel@hammerspace.com> <87r1lgjm7l.fsf@suse.de> <87blckj75z.fsf@suse.de> Date: Tue, 16 Feb 2021 18:55:06 +0000 In-Reply-To: (Amir Goldstein's message of "Tue, 16 Feb 2021 19:44:48 +0200") Message-ID: <874kibkflh.fsf@suse.de> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Amir Goldstein writes: > On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques wrote: >> >> Amir Goldstein writes: >> >> >> Ugh. And I guess overlayfs may have a similar problem. >> > >> > Not exactly. >> > Generally speaking, overlayfs should call vfs_copy_file_range() >> > with the flags it got from layer above, so if called from nfsd it >> > will allow cross fs copy and when called from syscall it won't. >> > >> > There are some corner cases where overlayfs could benefit from >> > COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but >> > let's leave those for now. Just leave overlayfs code as is. >> >> Got it, thanks for clarifying. >> >> >> > This is easy to solve with a flag COPY_FILE_SPLICE (or something) that >> >> > is internal to kernel users. >> >> > >> >> > FWIW, you may want to look at the loop in ovl_copy_up_data() >> >> > for improvements to nfsd_copy_file_range(). >> >> > >> >> > We can move the check out to copy_file_range syscall: >> >> > >> >> > if (flags != 0) >> >> > return -EINVAL; >> >> > >> >> > Leave the fallback from all filesystems and check for the >> >> > COPY_FILE_SPLICE flag inside generic_copy_file_range(). >> >> >> >> Ok, the diff bellow is just to make sure I understood your suggestion. >> >> >> >> The patch will also need to: >> >> >> >> - change nfs and overlayfs calls to vfs_copy_file_range() so that they >> >> use the new flag. >> >> >> >> - check flags in generic_copy_file_checks() to make sure only valid flags >> >> are used (COPY_FILE_SPLICE at the moment). >> >> >> >> Also, where should this flag be defined? include/uapi/linux/fs.h? >> > >> > Grep for REMAP_FILE_ >> > Same header file, same Documentation rst file. >> > >> >> >> >> Cheers, >> >> -- >> >> Luis >> >> >> >> diff --git a/fs/read_write.c b/fs/read_write.c >> >> index 75f764b43418..341d315d2a96 100644 >> >> --- a/fs/read_write.c >> >> +++ b/fs/read_write.c >> >> @@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, >> >> struct file *file_out, loff_t pos_out, >> >> size_t len, unsigned int flags) >> >> { >> >> + if (!(flags & COPY_FILE_SPLICE)) { >> >> + if (!file_out->f_op->copy_file_range) >> >> + return -EOPNOTSUPP; >> >> + else if (file_out->f_op->copy_file_range != >> >> + file_in->f_op->copy_file_range) >> >> + return -EXDEV; >> >> + } >> > >> > That looks strange, because you are duplicating the logic in >> > do_copy_file_range(). Maybe better: >> > >> > if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE)) >> > return -EINVAL; >> > if (flags & COPY_FILE_SPLICE) >> > return do_splice_direct(file_in, &pos_in, file_out, &pos_out, >> > len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0); >> >> My initial reasoning for duplicating the logic in do_copy_file_range() was >> to allow the generic_copy_file_range() callers to be left unmodified and >> allow the filesystems to default to this implementation. >> >> With this change, I guess that the calls to generic_copy_file_range() from >> the different filesystems can be dropped, as in my initial patch, as they >> will always get -EINVAL. The other option would be to set the >> COPY_FILE_SPLICE flag in those calls, but that would get us back to the >> problem we're trying to solve. > > I don't understand the problem. > > What exactly is wrong with the code I suggested? > Why should any filesystem be changed? > > Maybe I am missing something. Ok, I have to do a full brain reboot and start all over. Before that, I picked the code you suggested and tested it. I've mounted a cephfs filesystem and used xfs_io to execute a 'copy_range' command using /sys/kernel/debug/sched_features as source. The result was a 0-sized file in cephfs. And the reason is thevfs_copy_file_range() early exit in: if (len == 0) return 0; 'len' is set in generic_copy_file_checks(). This means that we're not solving the original problem anymore (probably since v1 of this patch, haven't checked). Also, re-reading Trond's emails, I read: "... also disallowing the copy from, say, an XFS formatted partition to an ext4 partition". Isn't that *exactly* what we're trying to do here? I.e. _prevent_ these copies from happening so that tracefs files can't be CFR'ed? /me stops now and waits to see if the morning brings some sun :-) Cheers, -- Luis