From: Steve French
Date: Tue, 16 Feb 2021 13:31:21 -0600
Subject: Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices
To: Anna Schumaker
Cc: Amir Goldstein, Luis Henriques, Trond Myklebust,
    "samba-technical@lists.samba.org", "drinkcat@chromium.org",
    "iant@google.com", "linux-cifs@vger.kernel.org",
    "darrick.wong@oracle.com", "linux-kernel@vger.kernel.org",
    "jlayton@kernel.org", "llozano@chromium.org",
    "linux-nfs@vger.kernel.org", "miklos@szeredi.hu",
    "viro@zeniv.linux.org.uk", "dchinner@redhat.com",
    "linux-fsdevel@vger.kernel.org", "gregkh@linuxfoundation.org",
    "sfrench@samba.org", "ceph-devel@vger.kernel.org"
References: <20210215154317.8590-1-lhenriques@suse.de>
    <73ab4951f48d69f0183548c7a82f7ae37e286d1c.camel@hammerspace.com>
    <92d27397479984b95883197d90318ee76995b42e.camel@hammerspace.com>
    <87r1lgjm7l.fsf@suse.de> <87blckj75z.fsf@suse.de> <874kibkflh.fsf@suse.de>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, Feb 16, 2021 at 1:29 PM Anna Schumaker wrote:
>
> On Tue, Feb 16, 2021 at 2:22 PM Amir Goldstein wrote:
> >
> > On Tue, Feb 16, 2021 at 8:54 PM Luis Henriques wrote:
> > >
> > > Amir Goldstein writes:
> > >
> > > > On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques wrote:
> > > >>
> > > >> Amir Goldstein writes:
> > > >>
> > > >> >> Ugh. And I guess overlayfs may have a similar problem.
> > > >> >
> > > >> > Not exactly.
> > > >> > Generally speaking, overlayfs should call vfs_copy_file_range()
> > > >> > with the flags it got from layer above, so if called from nfsd it
> > > >> > will allow cross fs copy and when called from syscall it won't.
> > > >> >
> > > >> > There are some corner cases where overlayfs could benefit from
> > > >> > COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but
> > > >> > let's leave those for now. Just leave overlayfs code as is.
> > > >>
> > > >> Got it, thanks for clarifying.
> > > >>
> > > >> >> > This is easy to solve with a flag COPY_FILE_SPLICE (or something) that
> > > >> >> > is internal to kernel users.
> > > >> >> >
> > > >> >> > FWIW, you may want to look at the loop in ovl_copy_up_data()
> > > >> >> > for improvements to nfsd_copy_file_range().
> > > >> >> >
> > > >> >> > We can move the check out to copy_file_range syscall:
> > > >> >> >
> > > >> >> >         if (flags != 0)
> > > >> >> >                 return -EINVAL;
> > > >> >> >
> > > >> >> > Leave the fallback from all filesystems and check for the
> > > >> >> > COPY_FILE_SPLICE flag inside generic_copy_file_range().
> > > >> >>
> > > >> >> Ok, the diff below is just to make sure I understood your suggestion.
> > > >> >>
> > > >> >> The patch will also need to:
> > > >> >>
> > > >> >>  - change nfs and overlayfs calls to vfs_copy_file_range() so that they
> > > >> >>    use the new flag.
> > > >> >>
> > > >> >>  - check flags in generic_copy_file_checks() to make sure only valid flags
> > > >> >>    are used (COPY_FILE_SPLICE at the moment).
> > > >> >>
> > > >> >> Also, where should this flag be defined? include/uapi/linux/fs.h?
> > > >> >
> > > >> > Grep for REMAP_FILE_
> > > >> > Same header file, same Documentation rst file.
> > > >> >
> > > >> >>
> > > >> >> Cheers,
> > > >> >> --
> > > >> >> Luis
> > > >> >>
> > > >> >> diff --git a/fs/read_write.c b/fs/read_write.c
> > > >> >> index 75f764b43418..341d315d2a96 100644
> > > >> >> --- a/fs/read_write.c
> > > >> >> +++ b/fs/read_write.c
> > > >> >> @@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> > > >> >>                                 struct file *file_out, loff_t pos_out,
> > > >> >>                                 size_t len, unsigned int flags)
> > > >> >>  {
> > > >> >> +       if (!(flags & COPY_FILE_SPLICE)) {
> > > >> >> +               if (!file_out->f_op->copy_file_range)
> > > >> >> +                       return -EOPNOTSUPP;
> > > >> >> +               else if (file_out->f_op->copy_file_range !=
> > > >> >> +                        file_in->f_op->copy_file_range)
> > > >> >> +                       return -EXDEV;
> > > >> >> +       }
> > > >> >
> > > >> > That looks strange, because you are duplicating the logic in
> > > >> > do_copy_file_range(). Maybe better:
> > > >> >
> > > >> >         if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE))
> > > >> >                 return -EINVAL;
> > > >> >         if (flags & COPY_FILE_SPLICE)
> > > >> >                 return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > > >> >                                         len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
> > > >>
> > > >> My initial reasoning for duplicating the logic in do_copy_file_range() was
> > > >> to allow the generic_copy_file_range() callers to be left unmodified and
> > > >> allow the filesystems to default to this implementation.
> > > >>
> > > >> With this change, I guess that the calls to generic_copy_file_range() from
> > > >> the different filesystems can be dropped, as in my initial patch, as they
> > > >> will always get -EINVAL. The other option would be to set the
> > > >> COPY_FILE_SPLICE flag in those calls, but that would get us back to the
> > > >> problem we're trying to solve.
> > > >
> > > > I don't understand the problem.
> > > >
> > > > What exactly is wrong with the code I suggested?
> > > > Why should any filesystem be changed?
> > > >
> > > > Maybe I am missing something.
> > >
> > > Ok, I have to do a full brain reboot and start all over.
> > >
> > > Before that, I picked the code you suggested and tested it. I've mounted
> > > a cephfs filesystem and used xfs_io to execute a 'copy_range' command
> > > using /sys/kernel/debug/sched_features as source. The result was a
> > > 0-sized file in cephfs. And the reason is the vfs_copy_file_range()
> > > early exit in:
> > >
> > >         if (len == 0)
> > >                 return 0;
> > >
> > > 'len' is set in generic_copy_file_checks().
> >
> > Good point. I guess we will need to do all the checks earlier in
> > generic_copy_file_checks() including the logic of:
> >
> >         if (file_in->f_op->remap_file_range &&
> >             file_inode(file_in)->i_sb == file_inode(file_out)->i_sb)
> >
> >
> > >
> > >
> > > This means that we're not solving the original problem anymore (probably
> > > since v1 of this patch, haven't checked).
> > >
> > > Also, re-reading Trond's emails, I read: "... also disallowing the copy
> > > from, say, an XFS formatted partition to an ext4 partition". Isn't that
> > > *exactly* what we're trying to do here? I.e. _prevent_ these copies from
> > > happening so that tracefs files can't be CFR'ed?
> > >
> >
> > We want to address the report which means calls coming from
> > copy_file_range() syscall.
> >
> > Trond's use case is vfs_copy_file_range() coming from nfsd.
> > When he writes about copy from XFS to ext4, he means an
> > NFS client is issuing server side copy (on same or different NFS mounts)
> > and the NFS server is executing nfsd_copy_file_range() on a source
> > file that happens to be on XFS and destination happens to be on ext4.
>
> NFS also supports a server-to-server copy where the destination server
> mounts the source server and reads the data to be copied. Please don't
> break that either :)

This is a case we will eventually need to support for cifs (SMB3) as well.

--
Thanks,

Steve
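
For reference, the flag handling discussed in this thread can be modelled
in user space. This is only a sketch under stated assumptions, not kernel
code: struct model_file, model_vfs_copy(), model_syscall_copy(),
model_selfcheck() and the numeric value of COPY_FILE_SPLICE are all made up
for illustration. It does the cross-filesystem check before the length is
clamped, so a source that reports size 0 gets -EXDEV instead of hitting the
len == 0 early exit.

```c
#include <assert.h>
#include <errno.h>

/* Assumed value; the thread names the flag but never defines it. */
#define COPY_FILE_SPLICE 0x1u

/* Hypothetical stand-in for struct file; not a kernel type. */
struct model_file {
	int copy_op;	/* 0: no ->copy_file_range; else identifies the fs implementation */
	long size;	/* size reported by the filesystem (0 for many pseudo files) */
};

/* Models the kernel-internal entry point (what nfsd would call). */
long model_vfs_copy(const struct model_file *in, const struct model_file *out,
		    long len, unsigned int flags)
{
	/* Only the internal flag is known; anything else is rejected. */
	if (flags & ~COPY_FILE_SPLICE)
		return -EINVAL;

	/* Cross-filesystem check, done before any length clamping. */
	if (!(flags & COPY_FILE_SPLICE)) {
		if (!out->copy_op)
			return -EOPNOTSUPP;
		if (out->copy_op != in->copy_op)
			return -EXDEV;
	}

	/* Clamp to the reported source size, then the early exit. */
	if (len > in->size)
		len = in->size;
	if (len == 0)
		return 0;

	/* Pretend the copy (filesystem op or splice fallback) moved everything. */
	return len;
}

/* Models the copy_file_range() syscall: userspace may not pass any flag. */
long model_syscall_copy(const struct model_file *in, const struct model_file *out,
			long len, unsigned int flags)
{
	if (flags != 0)
		return -EINVAL;
	return model_vfs_copy(in, out, len, 0);
}

/* Usage: a size-0 pseudo file, two regular filesystems. */
void model_selfcheck(void)
{
	struct model_file pseudo = { 0, 0 };	/* e.g. a debugfs/tracefs file */
	struct model_file fs_a = { 1, 4096 };
	struct model_file fs_b = { 2, 4096 };

	/* Syscall path: cross-fs copy is refused even though size is 0. */
	assert(model_syscall_copy(&pseudo, &fs_a, 4096, 0) == -EXDEV);
	/* Syscall path: the internal flag is not accepted from userspace. */
	assert(model_syscall_copy(&fs_a, &fs_b, 100, COPY_FILE_SPLICE) == -EINVAL);
	/* Internal path (nfsd-like): cross-fs copy is allowed with the flag. */
	assert(model_vfs_copy(&fs_a, &fs_b, 100, COPY_FILE_SPLICE) == 100);
}
```

The model deliberately ignores offsets, remap_file_range and the
MAX_RW_COUNT cap; it only shows where the flag and cross-filesystem checks
sit relative to the len == 0 exit.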