Hi Trond,
I'd like to fix up lookup revalidation for v4.1+ when the client is using
OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not seem to
improve things for v4.1, and actually make the behavior a bit worse since we
no longer pass through nfs_lookup_verify_inode(), which would catch the
cases where nlink == 0.
Would you accept work to _always_ revalidate the dentry's parent for
CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be preferable for
this case, though I don't know how the client would know when to decide
between them.
Here's a simple reproducer for convenience, I think we've already all agreed
that the behavior we want is for the second open by `cat` to reflect the
results of the move on the server, or at least eventually later opens would
revalidate the dentry:
#!/bin/bash
set -o xtrace
vers=4.1
exportfs -ua
exportfs -o rw,sec=sys,no_root_squash *:/exports
mkdir /mnt/localhost || true
rm -f /exports/file{1,2}
echo this is file 1 > /exports/file1
echo this is file 2 > /exports/file2
mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost
tail -f /mnt/localhost/file1 &
sleep 1
# this is file 1
cat /mnt/localhost/file1
# overwrite the file on the server:
mv -f /exports/file2 /exports/file1
# this is file 2
cat /mnt/localhost/file1
killall tail
# this is file 2
cat /mnt/localhost/file1
umount /mnt/localhost
Switching the $vers variable between v4.0 and v4.1 in this script shows the
difference in behavior.
Thanks for any advice,
Ben
On Thu, 2020-01-16 at 08:51 -0500, Benjamin Coddington wrote:
> Hi Trond,
>
> I'd like to fix up lookup revalidation for v4.1+ when the client is
> using
> OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not seem
> to
> improve things for v4.1, and actually make the behavior a bit worse
> since we
> no longer pass through nfs_lookup_verify_inode(), which would catch
> the
> cases where nlink == 0.
>
> Would you accept work to _always_ revalidate the dentry's parent for
> CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be
> preferable for
> this case, though I don't know how the client would know when to
> decide
> between them.
>
> Here's a simple reproducer for convenience, I think we've already all
> agreed
> that the behavior we want is for the second open by `cat` to reflect
> the
> results of the move on the server, or at least eventually later opens
> would
> revalidate the dentry:
>
> #!/bin/bash
>
> set -o xtrace
> vers=4.1
>
> exportfs -ua
> exportfs -o rw,sec=sys,no_root_squash *:/exports
>
> mkdir /mnt/localhost || true
>
> rm -f /exports/file{1,2}
>
> echo this is file 1 > /exports/file1
> echo this is file 2 > /exports/file2
>
> mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost
>
> tail -f /mnt/localhost/file1 &
> sleep 1
>
> # this is file 1
> cat /mnt/localhost/file1
>
> # overwrite the file on the server:
> mv -f /exports/file2 /exports/file1
>
> # this is file 2
> cat /mnt/localhost/file1
>
> killall tail
> # this is file 2
> cat /mnt/localhost/file1
> umount /mnt/localhost
>
>
> Switching the $vers variable between v4.0 and v4.1 in this script
> shows the
> difference in behavior.
>
If somebody needs stronger lookup cache revalidation, then that's what
they have the 'lookupcache=none' mount option for. We have these
'lookupcache' mount options in order to allow users to tailor the
caching behaviour (on a per-mount basis) should the default behaviour
be insufficiently strict.
Since your testcase doesn't use that mount option, I don't see what it
is proving other than what we already know about the default lookup
caching: namely that it sacrifices some accuracy in the interest of
file open performance.
By the way, NFSv4.1 will actually handle this situation better than
NFSv3, since the stateful open ensures that even if the file that won
the race was deleted by the other client, then the server will preserve
that file and its contents until our client calls close().
In conclusion:
* With a default mount option of 'lookupcache=all', we don't promise
100% accuracy in the face of 3rd party client changes to the
namespace. We only promise that we will eventually pick up the
changes. If we're failing to do that, then let's look at why, but
'lookupcache=all' is not guaranteed to immediately find namespace
changes.
* With 'lookupcache=pos', we promise that the client will ignore
negative cached dentries, and hence will always find a file that
was created using exclusive create on the server. However I'd expect
that too to fail your test above.
* If it fails with 'lookupcache=none', then I would agree that we have
a problem.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 16 Jan 2020, at 9:35, Trond Myklebust wrote:
> On Thu, 2020-01-16 at 08:51 -0500, Benjamin Coddington wrote:
>> Hi Trond,
>>
>> I'd like to fix up lookup revalidation for v4.1+ when the client is
>> using
>> OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not seem
>> to
>> improve things for v4.1, and actually make the behavior a bit worse
>> since we
>> no longer pass through nfs_lookup_verify_inode(), which would catch
>> the
>> cases where nlink == 0.
>>
>> Would you accept work to _always_ revalidate the dentry's parent for
>> CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be
>> preferable for
>> this case, though I don't know how the client would know when to
>> decide
>> between them.
>>
>> Here's a simple reproducer for convenience, I think we've already all
>> agreed
>> that the behavior we want is for the second open by `cat` to reflect
>> the
>> results of the move on the server, or at least eventually later opens
>> would
>> revalidate the dentry:
>>
>> #!/bin/bash
>>
>> set -o xtrace
>> vers=4.1
>>
>> exportfs -ua
>> exportfs -o rw,sec=sys,no_root_squash *:/exports
>>
>> mkdir /mnt/localhost || true
>>
>> rm -f /exports/file{1,2}
>>
>> echo this is file 1 > /exports/file1
>> echo this is file 2 > /exports/file2
>>
>> mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost
>>
>> tail -f /mnt/localhost/file1 &
>> sleep 1
>>
>> # this is file 1
>> cat /mnt/localhost/file1
>>
>> # overwrite the file on the server:
>> mv -f /exports/file2 /exports/file1
>>
>> # this is file 2
>> cat /mnt/localhost/file1
>>
>> killall tail
>> # this is file 2
>> cat /mnt/localhost/file1
>> umount /mnt/localhost
>>
>>
>> Switching the $vers variable between v4.0 and v4.1 in this script
>> shows the
>> difference in behavior.
>>
>
> If somebody needs stronger lookup cache revalidation, then that's what
> they have the 'lookupcache=none' mount option for. We have these
> 'lookupcache' mount options in order to allow users to tailor the
> caching behaviour (on a per-mount basis) should the default behaviour
> be insufficiently strict.
>
> Since your testcase doesn't use that mount option, I don't see what it
> is proving other than what we already know about the default lookup
> caching: namely that it sacrifices some accuracy in the interest of
> file open performance.
Thanks for the look. The testcase only provides a comparison of different
behavior between v4.0 and v4.1, which is due to our use of CLAIM_NULL vs
CLAIM_FH.
Indeed, setting lookupcache=none give real-time updates. With default
lookupcache, after the directory attributes time out, the client will
likewise get the namespace right. So, this isn't a major problem, but we do
have some QE folks that are unhappy about it.
Can we improve things a bit for v4.1 without sacrificing performance? I
can't think of a reason to not go back to CLAIM_NULL in nfs4_file_open().
Maybe it is a bit more work on the server to have to do one extra lookup per
open, but we'll end up with the right file each time.
It makes sense to keep CLAIM_FH for recovery, but why keep it for regular
opens?
Ben
On Thu, 2020-01-16 at 10:13 -0500, Benjamin Coddington wrote:
> On 16 Jan 2020, at 9:35, Trond Myklebust wrote:
>
> > On Thu, 2020-01-16 at 08:51 -0500, Benjamin Coddington wrote:
> > > Hi Trond,
> > >
> > > I'd like to fix up lookup revalidation for v4.1+ when the client
> > > is
> > > using
> > > OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not
> > > seem
> > > to
> > > improve things for v4.1, and actually make the behavior a bit
> > > worse
> > > since we
> > > no longer pass through nfs_lookup_verify_inode(), which would
> > > catch
> > > the
> > > cases where nlink == 0.
> > >
> > > Would you accept work to _always_ revalidate the dentry's parent
> > > for
> > > CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be
> > > preferable for
> > > this case, though I don't know how the client would know when to
> > > decide
> > > between them.
> > >
> > > Here's a simple reproducer for convenience, I think we've already
> > > all
> > > agreed
> > > that the behavior we want is for the second open by `cat` to
> > > reflect
> > > the
> > > results of the move on the server, or at least eventually later
> > > opens
> > > would
> > > revalidate the dentry:
> > >
> > > #!/bin/bash
> > >
> > > set -o xtrace
> > > vers=4.1
> > >
> > > exportfs -ua
> > > exportfs -o rw,sec=sys,no_root_squash *:/exports
> > >
> > > mkdir /mnt/localhost || true
> > >
> > > rm -f /exports/file{1,2}
> > >
> > > echo this is file 1 > /exports/file1
> > > echo this is file 2 > /exports/file2
> > >
> > > mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost
> > >
> > > tail -f /mnt/localhost/file1 &
> > > sleep 1
> > >
> > > # this is file 1
> > > cat /mnt/localhost/file1
> > >
> > > # overwrite the file on the server:
> > > mv -f /exports/file2 /exports/file1
> > >
> > > # this is file 2
> > > cat /mnt/localhost/file1
> > >
> > > killall tail
> > > # this is file 2
> > > cat /mnt/localhost/file1
> > > umount /mnt/localhost
> > >
> > >
> > > Switching the $vers variable between v4.0 and v4.1 in this script
> > > shows the
> > > difference in behavior.
> > >
> >
> > If somebody needs stronger lookup cache revalidation, then that's
> > what
> > they have the 'lookupcache=none' mount option for. We have these
> > 'lookupcache' mount options in order to allow users to tailor the
> > caching behaviour (on a per-mount basis) should the default
> > behaviour
> > be insufficiently strict.
> >
> > Since your testcase doesn't use that mount option, I don't see what
> > it
> > is proving other than what we already know about the default lookup
> > caching: namely that it sacrifices some accuracy in the interest of
> > file open performance.
>
> Thanks for the look. The testcase only provides a comparison of
> different
> behavior between v4.0 and v4.1, which is due to our use of CLAIM_NULL
> vs
> CLAIM_FH.
>
> Indeed, setting lookupcache=none give real-time updates. With
> default
> lookupcache, after the directory attributes time out, the client will
> likewise get the namespace right. So, this isn't a major problem,
> but we do
> have some QE folks that are unhappy about it.
>
> Can we improve things a bit for v4.1 without sacrificing
> performance? I
> can't think of a reason to not go back to CLAIM_NULL in
> nfs4_file_open().
> Maybe it is a bit more work on the server to have to do one extra
> lookup per
> open, but we'll end up with the right file each time.
>
> It makes sense to keep CLAIM_FH for recovery, but why keep it for
> regular
> opens?
>
nfs_file_open() is completely the wrong place to perform a lookup. Its
purpose in the VFS is to allow the filesystem to set up state *after*
we've already looked up the dentry, revalidated it and therefore
decided which file to open.
The NFSv4.0 behaviour of performing a new lookup is actually the
aberration here, and is due to the fact that it does not have an open-
by-filehandle operation, so we have no alternative.
As I said, if you want stronger semantics, there are lookupcache mount
options that allow you to tune things. I therefore see no valid reason
to change the existing behaviour, which also matches that of older NFS
versions (i.e. v3 and v2).
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 16 Jan 2020, at 10:38, Trond Myklebust wrote:
> On Thu, 2020-01-16 at 10:13 -0500, Benjamin Coddington wrote:
>> On 16 Jan 2020, at 9:35, Trond Myklebust wrote:
>
> nfs_file_open() is completely the wrong place to perform a lookup. Its
> purpose in the VFS is to allow the filesystem to set up state *after*
> we've already looked up the dentry, revalidated it and therefore
> decided which file to open.
> The NFSv4.0 behaviour of performing a new lookup is actually the
> aberration here, and is due to the fact that it does not have an open-
> by-filehandle operation, so we have no alternative.
Ok, that makes good sense to me.
> As I said, if you want stronger semantics, there are lookupcache mount
> options that allow you to tune things. I therefore see no valid reason
> to change the existing behaviour, which also matches that of older NFS
> versions (i.e. v3 and v2).
Thanks for the discussion. Your point above about v4.0 CLAIM_NULL actually
doing another lookup after we already did a lookup throws things in sharp
relief.
Ben