2019-05-29 15:09:44

by Benjamin Coddington

[permalink] [raw]
Subject: client lookup_revalidate bug

Hey, here's an interesting one, this seems wrong:

[root@fedora27_c2_v5 ~]# mkdir /mnt/one
[root@fedora27_c2_v5 ~]# mkdir /mnt/two
[root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
192.168.122.50:/exports /mnt/one
[root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
192.168.122.50:/exports /mnt/two
[root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
[root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
[root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
[root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
[root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
[root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
[root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
[root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
File: /mnt/two/B/foo
Size: 0 Blocks: 0 IO Block: 262144 regular empty
file
Device: 2fh/47d Inode: 439603 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: system_u:object_r:nfs_t:s0
Access: 2019-05-28 14:00:18.929663705 -0400
Modify: 2019-05-28 14:00:18.929663705 -0400
Change: 2019-05-28 14:00:58.990124573 -0400
Birth: -


^^ that lstat should return -ENOENT.

I think we detect a stale directory by comparing the directory's
change_attr with the dentry's d_time. But, here's a case where they are
the same!

Am I wrong about this, or any clever ideas to catch this case?

Ben


2019-05-29 16:22:52

by Trond Myklebust

[permalink] [raw]
Subject: Re: client lookup_revalidate bug

On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
> Hey, here's an interesting one, this seems wrong:
>
> [root@fedora27_c2_v5 ~]# mkdir /mnt/one
> [root@fedora27_c2_v5 ~]# mkdir /mnt/two
> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
> 192.168.122.50:/exports /mnt/one
> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
> 192.168.122.50:/exports /mnt/two
> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
> [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
> [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
> File: /mnt/two/B/foo
> Size: 0 Blocks: 0 IO Block: 262144 regular
> empty
> file
> Device: 2fh/47d Inode: 439603 Links: 1
> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid:
> ( 0/ root)
> Context: system_u:object_r:nfs_t:s0
> Access: 2019-05-28 14:00:18.929663705 -0400
> Modify: 2019-05-28 14:00:18.929663705 -0400
> Change: 2019-05-28 14:00:58.990124573 -0400
> Birth: -
>
>
> ^^ that lstat should return -ENOENT.
>
> I think we detect a stale directory by comparing the directory's
> change_attr with the dentry's d_time. But, here's a case where they
> are
> the same!
>
> Am I wrong about this, or any clever ideas to catch this case?
>

When you are mounting using 'nosharecache' then you are making /mnt/one
and /mnt/two act as if they are different filesystems. The fact that
they are the same on the server, means you are setting up a testcase
where the files+directories are acting like the "changing on the
server" case as far as the client is concerned.

If you want the above to work in a sane fashion, then just don't use
'nosharecache'.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-05-29 16:41:10

by Benjamin Coddington

[permalink] [raw]
Subject: Re: client lookup_revalidate bug

On 29 May 2019, at 12:21, Trond Myklebust wrote:

> On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
>> Hey, here's an interesting one, this seems wrong:
>>
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/one
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/two
>> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
>> 192.168.122.50:/exports /mnt/one
>> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
>> 192.168.122.50:/exports /mnt/two
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
>> [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
>> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
>> [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
>> [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
>> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
>> [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
>> File: /mnt/two/B/foo
>> Size: 0 Blocks: 0 IO Block: 262144 regular
>> empty
>> file
>> Device: 2fh/47d Inode: 439603 Links: 1
>> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid:
>> ( 0/ root)
>> Context: system_u:object_r:nfs_t:s0
>> Access: 2019-05-28 14:00:18.929663705 -0400
>> Modify: 2019-05-28 14:00:18.929663705 -0400
>> Change: 2019-05-28 14:00:58.990124573 -0400
>> Birth: -
>>
>>
>> ^^ that lstat should return -ENOENT.
>>
>> I think we detect a stale directory by comparing the directory's
>> change_attr with the dentry's d_time. But, here's a case where they
>> are
>> the same!
>>
>> Am I wrong about this, or any clever ideas to catch this case?
>>
>
> When you are mounting using 'nosharecache' then you are making /mnt/one
> and /mnt/two act as if they are different filesystems. The fact that
> they are the same on the server, means you are setting up a testcase
> where the files+directories are acting like the "changing on the
> server" case as far as the client is concerned.
>
> If you want the above to work in a sane fashion, then just don't use
> 'nosharecache'.

That was deliberate to avoid having to show two clients in the example..
sorry, I should have been more explicit.

I think the client should be able to detect this case, since it can see an
updated change_attr for that particular directory -- that is
"/mnt/two/B/foo", but maybe it needs to compare the change_attr to its
previous value instead of comparing it to the child's d_time?

The person who reported it has some workload that flips files between
directories on separate clients, and doesn't like it when `mv` reports
"source and destination are the same file".

Ben

2019-05-29 17:04:52

by Achilles Gaikwad

[permalink] [raw]
Subject: Re: client lookup_revalidate bug

The file doesn't exist, but when specifying the complete pathname,
nfsv4 thinks that the file already exists.

[...]
# mv /mnt/nfs/B/f /mnt/nfs/A/f
mv: ‘/mnt/nfs/B/f’ and ‘/mnt/nfs/A/f’ are the same file

# ls -l /mnt/nfs/A/f
-rw-r--r--. 1 nfsnobody nfsnobody 0 May 29 01:59 /mnt/nfs/A/f

# ls -l /mnt/nfs/A/
total 0
[...]

Things I've tried:
1. dropping cache
2. using noac

What mv tries to do is mv ‘/mnt/nfs/B/f’ ‘/mnt/nfs/A/f’, it finds
that the file already exists. This is a bug.
Enabling rpcdbug logs for
# rpcdebug -m nfs -s pagecache lookupcache dircache fscache

[47821.556171] NFS: nfs_lookup_revalidate(A/f) is valid

Best,
- Achilles
---
Reproducer by my colleague @Takayuki Nagata

Steps to Reproduce:
1. Mount a NFSv4 volume with noac on node1 and node2.

[root@node1 ~]# mount -o noac server:/srv/export /mnt/nfs
[root@node2 ~]# mount -o noac server:/srv/export /mnt/nfs

2. Create directories and a file.

[root@node1 ~]# mkdir /mnt/nfs/{A,B}; touch /mnt/nfs/A/f

3. Move the file from A to B on the node1.

[root@node1 ~]# mv /mnt/nfs/A/f /mnt/nfs/B/

4. cat the file on the node2, and move it from B to A.

[root@node2 ~]# cat /mnt/nfs/B/f; mv /mnt/nfs/B/f /mnt/nfs/A/

5. Move it again from A to B on the node1.

[root@node1 ~]# mv /mnt/nfs/A/f /mnt/nfs/B/

6. cat the file again on the node2, and move it from B to A again.

[root@node2 ~]# cat /mnt/nfs/B/f; mv /mnt/nfs/B/f /mnt/nfs/A/
mv: `/mnt/nfs/B/f' and `/mnt/nfs/A/f' are the same file

Best,
- Achilles
---


On Wed, May 29, 2019 at 10:09 PM Benjamin Coddington
<[email protected]> wrote:
>
> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>
> > On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
> >> Hey, here's an interesting one, this seems wrong:
> >>
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/one
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/two
> >> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
> >> 192.168.122.50:/exports /mnt/one
> >> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
> >> 192.168.122.50:/exports /mnt/two
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
> >> [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
> >> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> >> [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
> >> [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
> >> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> >> [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
> >> File: /mnt/two/B/foo
> >> Size: 0 Blocks: 0 IO Block: 262144 regular
> >> empty
> >> file
> >> Device: 2fh/47d Inode: 439603 Links: 1
> >> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid:
> >> ( 0/ root)
> >> Context: system_u:object_r:nfs_t:s0
> >> Access: 2019-05-28 14:00:18.929663705 -0400
> >> Modify: 2019-05-28 14:00:18.929663705 -0400
> >> Change: 2019-05-28 14:00:58.990124573 -0400
> >> Birth: -
> >>
> >>
> >> ^^ that lstat should return -ENOENT.
> >>
> >> I think we detect a stale directory by comparing the directory's
> >> change_attr with the dentry's d_time. But, here's a case where they
> >> are
> >> the same!
> >>
> >> Am I wrong about this, or any clever ideas to catch this case?
> >>
> >
> > When you are mounting using 'nosharecache' then you are making /mnt/one
> > and /mnt/two act as if they are different filesystems. The fact that
> > they are the same on the server, means you are setting up a testcase
> > where the files+directories are acting like the "changing on the
> > server" case as far as the client is concerned.
> >
> > If you want the above to work in a sane fashion, then just don't use
> > 'nosharecache'.
>
> That was deliberate to avoid having to show two clients in the example..
> sorry, I should have been more explicit.
>
> I think the client should be able to detect this case, since it can see an
> updated change_attr for that particular directory -- that is
> "/mnt/two/B/foo", but maybe it needs to compare the change_attr to its
> previous value instead of comparing it to the child's d_time?
>
> The person who reported it has some workload that flips files between
> directories on separate clients, and doesn't like it when `mv` reports
> "source and destination are the same file".
>
> Ben

2019-05-29 17:21:31

by Trond Myklebust

[permalink] [raw]
Subject: Re: client lookup_revalidate bug

On Wed, 2019-05-29 at 12:39 -0400, Benjamin Coddington wrote:
> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>
> > On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
> > > Hey, here's an interesting one, this seems wrong:
> > >
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/one
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/two
> > > [root@fedora27_c2_v5 ~]# mount -t nfs
> > > -ov4,noac,sec=sys,nosharecache
> > > 192.168.122.50:/exports /mnt/one
> > > [root@fedora27_c2_v5 ~]# mount -t nfs
> > > -ov4,noac,sec=sys,nosharecache
> > > 192.168.122.50:/exports /mnt/two
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
> > > [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
> > > [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> > > [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
> > > [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
> > > [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> > > [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
> > > File: /mnt/two/B/foo
> > > Size: 0 Blocks: 0 IO Block: 262144
> > > regular
> > > empty
> > > file
> > > Device: 2fh/47d Inode: 439603 Links: 1
> > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid:
> > > ( 0/ root)
> > > Context: system_u:object_r:nfs_t:s0
> > > Access: 2019-05-28 14:00:18.929663705 -0400
> > > Modify: 2019-05-28 14:00:18.929663705 -0400
> > > Change: 2019-05-28 14:00:58.990124573 -0400
> > > Birth: -
> > >
> > >
> > > ^^ that lstat should return -ENOENT.
> > >
> > > I think we detect a stale directory by comparing the directory's
> > > change_attr with the dentry's d_time. But, here's a case where
> > > they
> > > are
> > > the same!
> > >
> > > Am I wrong about this, or any clever ideas to catch this case?
> > >
> >
> > When you are mounting using 'nosharecache' then you are making
> > /mnt/one
> > and /mnt/two act as if they are different filesystems. The fact
> > that
> > they are the same on the server, means you are setting up a
> > testcase
> > where the files+directories are acting like the "changing on the
> > server" case as far as the client is concerned.
> >
> > If you want the above to work in a sane fashion, then just don't
> > use
> > 'nosharecache'.
>
> That was deliberate to avoid having to show two clients in the
> example..
> sorry, I should have been more explicit.
>
> I think the client should be able to detect this case, since it can
> see an
> updated change_attr for that particular directory -- that is
> "/mnt/two/B/foo", but maybe it needs to compare the change_attr to
> its
> previous value instead of comparing it to the child's d_time?
>
> The person who reported it has some workload that flips files between
> directories on separate clients, and doesn't like it when `mv`
> reports
> "source and destination are the same file".

Sorry, but that's not the case, because of the abuse of the
nosharecache flag. You are manipulating the file on the server and
expecting an immediate cache invalidation. That would require
information that the client does not have.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-05-30 10:36:53

by Benjamin Coddington

[permalink] [raw]
Subject: Re: client lookup_revalidate bug

On 29 May 2019, at 13:19, Trond Myklebust wrote:
> On Wed, 2019-05-29 at 12:39 -0400, Benjamin Coddington wrote:
>> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>
> Sorry, but that's not the case, because of the abuse of the
> nosharecache flag. You are manipulating the file on the server and
> expecting an immediate cache invalidation. That would require
> information that the client does not have.

Yes, forget about the nosharecache flag. Let's just assume we move the file
on the server. The client does perform a GETATTR and sees the updated
change_attr for the directory, but it matches the dentry's new d_time.

I don't expect immediate cache invalidation; invalidation will never happen
as long as B/'s change_attr isn't bumped again. On the client, B/foo
dentry's d_time is the same as the change_attr for B/, and even though
there's no file at B/foo on the server, the client will keep the B/foo
dentry until something else bumps B/'s change_attr.

Maybe I need to imagine a scenario where this matters more..

But, I think that if we use the method of comparing d_time with the parent's
change_attr to determine if we need to lookup, this is a case where that's
not reliable.

I'll try to come up with something as I have time to work on it since I am
getting pressure about it. It looks to me like this behavior has been
around for a long time. Thanks for the look.

Ben

2019-05-30 13:44:36

by Benjamin Coddington

[permalink] [raw]
Subject: Re: client lookup_revalidate bug

On 30 May 2019, at 6:35, Benjamin Coddington wrote:

> On 29 May 2019, at 13:19, Trond Myklebust wrote:
>> On Wed, 2019-05-29 at 12:39 -0400, Benjamin Coddington wrote:
>>> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>>
>> Sorry, but that's not the case, because of the abuse of the
>> nosharecache flag. You are manipulating the file on the server and
>> expecting an immediate cache invalidation. That would require
>> information that the client does not have.
>
> Yes, forget about the nosharecache flag. Let's just assume we move
> the file
> on the server. The client does perform a GETATTR and sees the updated
> change_attr for the directory, but it matches the dentry's new d_time.
>
> I don't expect immediate cache invalidation; invalidation will never
> happen
> as long as B/'s change_attr isn't bumped again. On the client, B/foo
> dentry's d_time is the same as the change_attr for B/, and even though
> there's no file at B/foo on the server, the client will keep the B/foo
> dentry until something else bumps B/'s change_attr.
>
> Maybe I need to imagine a scenario where this matters more..
>
> But, I think that if we use the method of comparing d_time with the
> parent's
> change_attr to determine if we need to lookup, this is a case where
> that's
> not reliable.
>
> I'll try to come up with something as I have time to work on it since
> I am
> getting pressure about it. It looks to me like this behavior has been
> around for a long time. Thanks for the look.

Takayuki Nagata points out that this behavior only happens when we have
a
read delegation, and it seems obvious that we ought to skip lookup
revalidation if so.

Let me see if I can clarify the order of events so anyone that wishes
can
better argue about what should happen.

Single client, NFS is mounted on /mnt

Client reads /mnt/A/foo, dentry A/foo is hashed.
Client moves /mnt/A/foo to /mnt/B/foo, dentry B/foo is hashed.
Server moves B/foo back to A/foo.
Client reads A/foo, gets a read delegation.
Client looks up B/foo, skips revalidation because it holds the read
delegation.

Client will always return a positive lookup for both A/foo and B/foo at
this
point.

Should this happen? My position is no, it shouldn't.