2022-03-02 22:08:53

by J. Bruce Fields

[permalink] [raw]
Subject: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

I started seeing generic/373 fail on recent linux-next in NFS testing.

Bisect lands it on aaf40970b1d0 "fs: allow cross-vfsmount
reflink/dedupe".

The test fails because a clone between two mounts is expected to fail,
and no longer does.

In my setup both mounts are nfs mounts. They are mounts of different
exports, and the exports are exports of different filesystems. So it
does make sense that the clone should fail.

I see the NFS client send a CLONE rpc to the server, and the server
return success. That seems wrong.

Both exported filesystems are xfs, and from the code it looks like the
server calls vfs_clone_file_range(), which ends up calling
xfs_file_remap_range().

Are we missing a check now in that xfs case?

I haven't looked any more closely at what's going on, so I could be
missing something.

--b.

> >
> > Sent from my iPhone
> >
> > > On Mar 1, 2022, at 6:53 PM, J. Bruce Fields <[email protected]> wrote:
> > >
> > > This is probably moving the check for whether a cross-vfsmount reflink
> > > is allowed to someplace that makes less sense for NFS?
> > >
> > > I did find the thread I was thinking of:
> > >
> > > https://lore.kernel.org/linux-fsdevel/67ae4c62a4749ae6870c452d1b458cc5f48b8263.1645042835.git.josef@toxicpanda.com/#r
> > >
> > > though maybe they're discussing a different problem.
> > >
> > > --b.
> > >
> > >> On Tue, Mar 01, 2022 at 06:04:33PM -0500, J. Bruce Fields wrote:
> > >> aaf40970b1d0f4ac41dad7963f35c9e353b4a41d is the first bad commit
> > >> commit aaf40970b1d0f4ac41dad7963f35c9e353b4a41d
> > >> Author: Josef Bacik <[email protected]>
> > >> Date: Fri Feb 18 09:38:14 2022 -0500
> > >>
> > >> fs: allow cross-vfsmount reflink/dedupe
> > >>
> > >> Currently we disallow reflink and dedupe if the two files aren't on the
> > >> same vfsmount. However we really only need to disallow it if they're
> > >> not on the same super block. It is very common for btrfs to have a main
> > >> subvolume that is mounted and then different subvolumes mounted at
> > >> different locations. It's allowed to reflink between these volumes, but
> > >> the vfsmount check disallows this. Instead fix dedupe to check for the
> > >> same superblock, and simply remove the vfsmount check for reflink as it
> > >> already does the superblock check.
> > >>
> > >> Reviewed-by: Darrick J. Wong <[email protected]>
> > >> Reviewed-by: Nikolay Borisov <[email protected]>
> > >> Reviewed-by: David Sterba <[email protected]>
> > >> Signed-off-by: Josef Bacik <[email protected]>
> > >> Signed-off-by: David Sterba <[email protected]>
> > >>
> > >> fs/ioctl.c | 4 ----
> > >> fs/remap_range.c | 7 +------
> > >> 2 files changed, 1 insertion(+), 10 deletions(-)
> > >> bisect found first bad commitgit bisect start
> > >> # bad: [6705cd745adbbeac6b13002c7a30060f7b2568a5] Add linux-next specific files for 20220228
> > >> git bisect bad 6705cd745adbbeac6b13002c7a30060f7b2568a5
> > >> # good: [7e57714cd0ad2d5bb90e50b5096a0e671dec1ef3] Linux 5.17-rc6
> > >> git bisect good 7e57714cd0ad2d5bb90e50b5096a0e671dec1ef3
> > >> # bad: [b1c65e65460a75a16cb8b658a28dd10fe465c59c] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > >> git bisect bad b1c65e65460a75a16cb8b658a28dd10fe465c59c
> > >> # bad: [03de4e1d1f2cb2993df929a481661cdebc6c2c3d] Merge branch 'hwmon-next' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
> > >> git bisect bad 03de4e1d1f2cb2993df929a481661cdebc6c2c3d
> > >> # good: [6b9c42995b55b2d01731105ef85170944d8da96f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
> > >> git bisect good 6b9c42995b55b2d01731105ef85170944d8da96f
> > >> # good: [af69a7a5a64464a756328de668041c1075f86454] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux.git
> > >> git bisect good af69a7a5a64464a756328de668041c1075f86454
> > >> # bad: [1f6aae68ded84c44b31249d2aae82be5ceacc758] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
> > >> git bisect bad 1f6aae68ded84c44b31249d2aae82be5ceacc758
> > >> # bad: [25ebc69693daccd38953d627151dbebc369ea3ff] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
> > >> git bisect bad 25ebc69693daccd38953d627151dbebc369ea3ff
> > >> # good: [0cbe7d755415ae2f40a8741eedb7c1717a21de53] btrfs: do not clean up repair bio if submit fails
> > >> git bisect good 0cbe7d755415ae2f40a8741eedb7c1717a21de53
> > >> # good: [5d820d692b5e550cdb590c362fdd35c000ebf420] Merge branch 'master' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git
> > >> git bisect good 5d820d692b5e550cdb590c362fdd35c000ebf420
> > >> # good: [e9a9bca06a61e4700a8f81a1527c82966ed40922] btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
> > >> git bisect good e9a9bca06a61e4700a8f81a1527c82966ed40922
> > >> # bad: [d3cb500f9ca9e8a33a1e728cbd4c9c6e272f3fd8] Merge branch 'ext/qu/subpage-more-sizes' into for-next-next-v5.17-20220224
> > >> git bisect bad d3cb500f9ca9e8a33a1e728cbd4c9c6e272f3fd8
> > >> # bad: [fa3b92e259c8f71731e9bfb9cc6978d6184d8d8d] Merge branch 'ext/josef/cross-mount' into for-next-next-v5.17-20220224
> > >> git bisect bad fa3b92e259c8f71731e9bfb9cc6978d6184d8d8d
> > >> # bad: [aaf40970b1d0f4ac41dad7963f35c9e353b4a41d] fs: allow cross-vfsmount reflink/dedupe
> > >> git bisect bad aaf40970b1d0f4ac41dad7963f35c9e353b4a41d
> > >> # good: [244a73f987b06fd33041dc9cb0f99527a1c0815c] btrfs: remove the cross file system checks from remap
> > >> git bisect good 244a73f987b06fd33041dc9cb0f99527a1c0815c
> > >> # first bad commit: [aaf40970b1d0f4ac41dad7963f35c9e353b4a41d] fs: allow cross-vfsmount reflink/dedupe


2022-03-03 00:32:24

by Josef Bacik

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 05:04:50PM -0500, J. Bruce Fields wrote:
> I started seeing generic/373 fail on recent linux-next in NFS testing.
>
> Bisect lands it on aaf40970b1d0 "fs: allow cross-vfsmount
> reflink/dedupe".
>
> The test fails because a clone between two mounts is expected to fail,
> and no longer does.
>
> In my setup both mounts are nfs mounts. They are mounts of different
> exports, and the exports are exports of different filesystems. So it
> does make sense that the clone should fail.
>
> I see the NFS client send a CLONE rpc to the server, and the server
> return success. That seems wrong.
>
> Both exported filesystems are xfs, and from the code it looks like the
> server calls vfs_clone_file_range(), which ends up calling
> xfs_file_remap_range().
>
> Are we missing a check now in that xfs case?
>
> I haven't looked any more closely at what's going on, so I could be
> missing something.
>

Yeah there's a few fstests that test this functionality that need to be removed,
I have patches pending for this in our fstests staging tree (since we run
fstests nightly on our tree)

https://github.com/btrfs/fstests/tree/staging

Right now the patches just remove the tests from auto since that's what we run,
I'll remove them properly once the patch lands in linus. Thanks,

Josef

2022-03-03 00:32:30

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 05:26:08PM -0500, Josef Bacik wrote:
> On Wed, Mar 02, 2022 at 05:04:50PM -0500, J. Bruce Fields wrote:
> > I started seeing generic/373 fail on recent linux-next in NFS testing.
> >
> > Bisect lands it on aaf40970b1d0 "fs: allow cross-vfsmount
> > reflink/dedupe".
> >
> > The test fails because a clone between two mounts is expected to fail,
> > and no longer does.
> >
> > In my setup both mounts are nfs mounts. They are mounts of different
> > exports, and the exports are exports of different filesystems. So it
> > does make sense that the clone should fail.
> >
> > I see the NFS client send a CLONE rpc to the server, and the server
> > return success. That seems wrong.
> >
> > Both exported filesystems are xfs, and from the code it looks like the
> > server calls vfs_clone_file_range(), which ends up calling
> > xfs_file_remap_range().
> >
> > Are we missing a check now in that xfs case?
> >
> > I haven't looked any more closely at what's going on, so I could be
> > missing something.
> >
>
> Yeah there's a few fstests that test this functionality that need to be removed,
> I have patches pending for this in our fstests staging tree (since we run
> fstests nightly on our tree)
>
> https://github.com/btrfs/fstests/tree/staging
>
> Right now the patches just remove the tests from auto since that's what we run,
> I'll remove them properly once the patch lands in linus. Thanks,

So, out of curiosity, what is xfs doing in this case? These are two
filesystems on separate partitions, is it falling back on a read/write
loop or something?

--b.

2022-03-03 00:35:30

by Josef Bacik

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 05:42:50PM -0500, J. Bruce Fields wrote:
> On Wed, Mar 02, 2022 at 05:26:08PM -0500, Josef Bacik wrote:
> > On Wed, Mar 02, 2022 at 05:04:50PM -0500, J. Bruce Fields wrote:
> > > I started seeing generic/373 fail on recent linux-next in NFS testing.
> > >
> > > Bisect lands it on aaf40970b1d0 "fs: allow cross-vfsmount
> > > reflink/dedupe".
> > >
> > > The test fails because a clone between two mounts is expected to fail,
> > > and no longer does.
> > >
> > > In my setup both mounts are nfs mounts. They are mounts of different
> > > exports, and the exports are exports of different filesystems. So it
> > > does make sense that the clone should fail.
> > >
> > > I see the NFS client send a CLONE rpc to the server, and the server
> > > return success. That seems wrong.
> > >
> > > Both exported filesystems are xfs, and from the code it looks like the
> > > server calls vfs_clone_file_range(), which ends up calling
> > > xfs_file_remap_range().
> > >
> > > Are we missing a check now in that xfs case?
> > >
> > > I haven't looked any more closely at what's going on, so I could be
> > > missing something.
> > >
> >
> > Yeah there's a few fstests that test this functionality that need to be removed,
> > I have patches pending for this in our fstests staging tree (since we run
> > fstests nightly on our tree)
> >
> > https://github.com/btrfs/fstests/tree/staging
> >
> > Right now the patches just remove the tests from auto since that's what we run,
> > I'll remove them properly once the patch lands in linus. Thanks,
>
> So, out of curiosity, what is xfs doing in this case? These are two
> filesystems on separate partitions, is it falling back on a read/write
> loop or something?

I don't think so? I'm actually kind of confused, because nfsd does
vfs_clone_file_range, and the only place I messed with for CLONE was
ioctl_clone_file, so the patch changed literally nothing, unless you aren't
using nfsd for the server?

And if they are in fact two different file systems the i_sb != i_sb of the
files, so there's something pretty strange going on here, my patch shouldn't
affect your setup. Thanks,

Josef

2022-03-03 00:47:15

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 06:45:12PM -0500, Josef Bacik wrote:
> On Wed, Mar 02, 2022 at 05:42:50PM -0500, J. Bruce Fields wrote:
> > On Wed, Mar 02, 2022 at 05:26:08PM -0500, Josef Bacik wrote:
> > > On Wed, Mar 02, 2022 at 05:04:50PM -0500, J. Bruce Fields wrote:
> > > > I started seeing generic/373 fail on recent linux-next in NFS testing.
> > > >
> > > > Bisect lands it on aaf40970b1d0 "fs: allow cross-vfsmount
> > > > reflink/dedupe".
> > > >
> > > > The test fails because a clone between two mounts is expected to fail,
> > > > and no longer does.
> > > >
> > > > In my setup both mounts are nfs mounts. They are mounts of different
> > > > exports, and the exports are exports of different filesystems. So it
> > > > does make sense that the clone should fail.
> > > >
> > > > I see the NFS client send a CLONE rpc to the server, and the server
> > > > return success. That seems wrong.
> > > >
> > > > Both exported filesystems are xfs, and from the code it looks like the
> > > > server calls vfs_clone_file_range(), which ends up calling
> > > > xfs_file_remap_range().
> > > >
> > > > Are we missing a check now in that xfs case?
> > > >
> > > > I haven't looked any more closely at what's going on, so I could be
> > > > missing something.
> > > >
> > >
> > > Yeah there's a few fstests that test this functionality that need to be removed,
> > > I have patches pending for this in our fstests staging tree (since we run
> > > fstests nightly on our tree)
> > >
> > > https://github.com/btrfs/fstests/tree/staging
> > >
> > > Right now the patches just remove the tests from auto since that's what we run,
> > > I'll remove them properly once the patch lands in linus. Thanks,
> >
> > So, out of curiosity, what is xfs doing in this case? These are two
> > filesystems on separate partitions, is it falling back on a read/write
> > loop or something?
>
> I don't think so? I'm actually kind of confused, because nfsd does
> vfs_clone_file_range, and the only place I messed with for CLONE was
> ioctl_clone_file, so the patch changed literally nothing, unless you aren't
> using nfsd for the server?
>
> And if they are in fact two different file systems the i_sb != i_sb of the
> files, so there's something pretty strange going on here, my patch shouldn't
> affect your setup. Thanks,

Sorry, took me a minute to understand, myself:

It's actually only the client behavior that changed. Previously the
client would reject an attempt to clone across filesystems, so the
server never saw such a request. After this patch, the client will go
ahead and send the CLONE. (Which, come to think of it, is probably the
right thing for the client to do.)

So the server's probably always had a bug, and this just uncovered it.

I'd be curious what the consequences are. And where the check should be
(above or below vfs_clone_file_range()?).

--b.

2022-03-03 01:00:34

by Josef Bacik

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 07:07:35PM -0500, J. Bruce Fields wrote:
> On Wed, Mar 02, 2022 at 06:45:12PM -0500, Josef Bacik wrote:
> > On Wed, Mar 02, 2022 at 05:42:50PM -0500, J. Bruce Fields wrote:
> > > On Wed, Mar 02, 2022 at 05:26:08PM -0500, Josef Bacik wrote:
> > > > On Wed, Mar 02, 2022 at 05:04:50PM -0500, J. Bruce Fields wrote:
> > > > > I started seeing generic/373 fail on recent linux-next in NFS testing.
> > > > >
> > > > > Bisect lands it on aaf40970b1d0 "fs: allow cross-vfsmount
> > > > > reflink/dedupe".
> > > > >
> > > > > The test fails because a clone between two mounts is expected to fail,
> > > > > and no longer does.
> > > > >
> > > > > In my setup both mounts are nfs mounts. They are mounts of different
> > > > > exports, and the exports are exports of different filesystems. So it
> > > > > does make sense that the clone should fail.
> > > > >
> > > > > I see the NFS client send a CLONE rpc to the server, and the server
> > > > > return success. That seems wrong.
> > > > >
> > > > > Both exported filesystems are xfs, and from the code it looks like the
> > > > > server calls vfs_clone_file_range(), which ends up calling
> > > > > xfs_file_remap_range().
> > > > >
> > > > > Are we missing a check now in that xfs case?
> > > > >
> > > > > I haven't looked any more closely at what's going on, so I could be
> > > > > missing something.
> > > > >
> > > >
> > > > Yeah there's a few fstests that test this functionality that need to be removed,
> > > > I have patches pending for this in our fstests staging tree (since we run
> > > > fstests nightly on our tree)
> > > >
> > > > https://github.com/btrfs/fstests/tree/staging
> > > >
> > > > Right now the patches just remove the tests from auto since that's what we run,
> > > > I'll remove them properly once the patch lands in linus. Thanks,
> > >
> > > So, out of curiosity, what is xfs doing in this case? These are two
> > > filesystems on separate partitions, is it falling back on a read/write
> > > loop or something?
> >
> > I don't think so? I'm actually kind of confused, because nfsd does
> > vfs_clone_file_range, and the only place I messed with for CLONE was
> > ioctl_clone_file, so the patch changed literally nothing, unless you aren't
> > using nfsd for the server?
> >
> > And if they are in fact two different file systems the i_sb != i_sb of the
> > files, so there's something pretty strange going on here, my patch shouldn't
> > affect your setup. Thanks,
>
> Sorry, took me a minute to understand, myself:
>
> It's actually only the client behavior that changed. Previously the
> client would reject an attempt to clone across filesystems, so the
> server never saw such a request. After this patch, the client will go
> ahead and send the CLONE. (Which, come to think of it, is probably the
> right thing for the client to do.)
>
> So the server's probably always had a bug, and this just uncovered it.
>
> I'd be curious what the consequences are. And where the check should be
> (above or below vfs_clone_file_range()?).
>

This is where I'm confused, this really shouldn't succeed

loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t len, unsigned int remap_flags)
{
loff_t ret;

WARN_ON_ONCE(remap_flags & REMAP_FILE_DEDUP);

if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
return -EXDEV;


loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t len, unsigned int remap_flags)
{
loff_t ret;

file_start_write(file_out);
ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
remap_flags);

And even if we get past here, I imagine XFS would freak out because it can't
find the extents (unless you're getting lucky and everything is lining up?).
I'm super confused...

Josef

2022-03-03 01:16:39

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 07:29:34PM -0500, Josef Bacik wrote:
> On Wed, Mar 02, 2022 at 07:07:35PM -0500, J. Bruce Fields wrote:
> > Sorry, took me a minute to understand, myself:
> >
> > It's actually only the client behavior that changed. Previously the
> > client would reject an attempt to clone across filesystems, so the
> > server never saw such a request. After this patch, the client will go
> > ahead and send the CLONE. (Which, come to think of it, is probably the
> > right thing for the client to do.)
> >
> > So the server's probably always had a bug, and this just uncovered it.
> >
> > I'd be curious what the consequences are. And where the check should be
> > (above or below vfs_clone_file_range()?).
> >
>
> This is where I'm confused, this really shouldn't succeed
>
> loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> struct file *file_out, loff_t pos_out,
> loff_t len, unsigned int remap_flags)
> {
> loff_t ret;
>
> WARN_ON_ONCE(remap_flags & REMAP_FILE_DEDUP);
>
> if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> return -EXDEV;
>
>
> loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> struct file *file_out, loff_t pos_out,
> loff_t len, unsigned int remap_flags)
> {
> loff_t ret;
>
> file_start_write(file_out);
> ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
> remap_flags);
>
> And even if we get past here, I imagine XFS would freak out because it can't
> find the extents (unless you're getting lucky and everything is lining up?).
> I'm super confused...

Bah, I see what you mean. Maybe there's something wrong with my setup.
I'll try some more stuff and report back....

--b.

2022-03-04 20:10:20

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs generic/373 failure after "fs: allow cross-vfsmount reflink/dedupe"

On Wed, Mar 02, 2022 at 07:50:01PM -0500, J. Bruce Fields wrote:
> On Wed, Mar 02, 2022 at 07:29:34PM -0500, Josef Bacik wrote:
> > On Wed, Mar 02, 2022 at 07:07:35PM -0500, J. Bruce Fields wrote:
> > > Sorry, took me a minute to understand, myself:
> > >
> > > It's actually only the client behavior that changed. Previously the
> > > client would reject an attempt to clone across filesystems, so the
> > > server never saw such a request. After this patch, the client will go
> > > ahead and send the CLONE. (Which, come to think of it, is probably the
> > > right thing for the client to do.)
> > >
> > > So the server's probably always had a bug, and this just uncovered it.
> > >
> > > I'd be curious what the consequences are. And where the check should be
> > > (above or below vfs_clone_file_range()?).
> > >
> >
> > This is where I'm confused, this really shouldn't succeed
> >
> > loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> > struct file *file_out, loff_t pos_out,
> > loff_t len, unsigned int remap_flags)
> > {
> > loff_t ret;
> >
> > WARN_ON_ONCE(remap_flags & REMAP_FILE_DEDUP);
> >
> > if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> > return -EXDEV;
> >
> >
> > loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > struct file *file_out, loff_t pos_out,
> > loff_t len, unsigned int remap_flags)
> > {
> > loff_t ret;
> >
> > file_start_write(file_out);
> > ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
> > remap_flags);
> >
> > And even if we get past here, I imagine XFS would freak out because it can't
> > find the extents (unless you're getting lucky and everything is lining up?).
> > I'm super confused...
>
> Bah, I see what you mean. Maybe there's something wrong with my setup.
> I'll try some more stuff and report back....

Sorry for the noise, you're right, generic/373 is just being dumb.

I assumed it was mounting different exports for some reason. But in
fact it's just doing a bind mount and then "cp --reflink=always" between
two mounts of the same filesystem. Previously that got rejected out of
hand, now the client sends a CLONE and the server handles it.

Which is an improvement. So it's only generic/373 that needs fixing.

--b.