2021-09-26 18:19:18

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED

From: Trond Myklebust <[email protected]>

Both NFSv3 and NFSv2 generate their change attribute from the ctime
value that was supplied by the server. However the problem is that there
are plenty of servers out there with ctime resolutions of 1ms or worse.
In a modern performance system, this is insufficient when trying to
decide which is the most recent set of attributes when, for instance, a
READ or GETATTR call races with a WRITE or SETATTR.

For this reason, let's revert to labelling the NFSv2/v3 change
attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect
against such races.

Fixes: 7b24dacf0840 ("NFS: Another inode revalidation improvement")
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs3xdr.c | 2 +-
fs/nfs/proc.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index e6eca1d7481b..9274c9c5efea 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2227,7 +2227,7 @@ static int decode_fsinfo3resok(struct xdr_stream *xdr,

/* ignore properties */
result->lease_time = 0;
- result->change_attr_type = NFS4_CHANGE_TYPE_IS_TIME_METADATA;
+ result->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
return 0;
}

diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index ea19dbf12301..ecc4e717808c 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -91,7 +91,7 @@ nfs_proc_get_root(struct nfs_server *server, struct nfs_fh *fhandle,
info->dtpref = fsinfo.tsize;
info->maxfilesize = 0x7FFFFFFF;
info->lease_time = 0;
- info->change_attr_type = NFS4_CHANGE_TYPE_IS_TIME_METADATA;
+ info->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
return 0;
}

--
2.31.1


2021-09-26 19:24:36

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED

Thanks for the quick response! Comments inline.


> On Sep 26, 2021, at 2:16 PM, [email protected] wrote:
>
> From: Trond Myklebust <[email protected]>
>
> Both NFSv3 and NFSv2 generate their change attribute from the ctime
> value that was supplied by the server. However the problem is that there
> are plenty of servers out there with ctime resolutions of 1ms or worse.
> In a modern performance system, this is insufficient when trying to
> decide which is the most recent set of attributes when, for instance, a
> READ or GETATTR call races with a WRITE or SETATTR.

I agree 100% that a legacy NFS client shouldn't rely on ctime
alone for tracking server-side changes, exactly because of
the timestamp resolution issue.


> For this reason, let's revert to labelling the NFSv2/v3 change
> attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect
> against such races.
>
> Fixes: 7b24dacf0840 ("NFS: Another inode revalidation improvement")

Perhaps this should be:

Fixes: 7f08a3359a3c ("NFSv4: Add support for the NFSv4.2 "change_attr_type" attribute")


> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/nfs3xdr.c | 2 +-
> fs/nfs/proc.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> index e6eca1d7481b..9274c9c5efea 100644
> --- a/fs/nfs/nfs3xdr.c
> +++ b/fs/nfs/nfs3xdr.c
> @@ -2227,7 +2227,7 @@ static int decode_fsinfo3resok(struct xdr_stream *xdr,
>
> /* ignore properties */
> result->lease_time = 0;
> - result->change_attr_type = NFS4_CHANGE_TYPE_IS_TIME_METADATA;
> + result->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
> return 0;
> }
>
> diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> index ea19dbf12301..ecc4e717808c 100644
> --- a/fs/nfs/proc.c
> +++ b/fs/nfs/proc.c
> @@ -91,7 +91,7 @@ nfs_proc_get_root(struct nfs_server *server, struct nfs_fh *fhandle,
> info->dtpref = fsinfo.tsize;
> info->maxfilesize = 0x7FFFFFFF;
> info->lease_time = 0;
> - info->change_attr_type = NFS4_CHANGE_TYPE_IS_TIME_METADATA;
> + info->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
> return 0;
> }
>

Unfortunately this patch did not change the behavior of fsx, and I
still observe readahead racing with truncation. That's not too
surprising since nfs_inode_attrs_cmp() is already returning zero
in the racing case (see previous e-mail).


fsx-284916 [011] 1238.734038: nfs_aops_readpages: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 nr_pages=8
fsx-284916 [011] 1238.734047: nfs_initiate_read: fileid=00:28:2 fhandle=0x36fbbe51 offset=102400 count=32768

fsx-284916 [011] 1238.734055: nfs_setattr_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466
fsx-284916 [011] 1238.734055: nfs_writeback_inode_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466
fsx-284916 [011] 1238.734056: nfs_writeback_inode_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1753079961650632466 size=206226 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_
flags=0x4 (ACL_LRU_SET)
fsx-284916 [011] 1238.734080: bprint: nfs3_xdr_dec_setattr3res: task:53611@5 size=0x3ff5a
fsx-284916 [011] 1238.734082: nfs_size_truncate: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 cursize=206226 newsize=261978
fsx-284916 [011] 1238.734082: nfs_inode_attrs_cmp: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 res=0
fsx-284916 [011] 1238.734083: nfs_refresh_inode_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466
fsx-284916 [011] 1238.734083: nfs_partial_attr_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 cache_validity=DATA_INVAL_DEFER fattr_validity=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE
|PRE_SIZE|SPACE_USED|FSID|FILEID|ATIME|MTIME|CTIME|PRE_MTIME|PRE_CTIME|CHANGE|PRE_CHANGE
fsx-284916 [011] 1238.734083: nfs_check_attrs: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 valid_flags=
fsx-284916 [011] 1238.734083: nfs_refresh_inode_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1753079961650632466 size=261978 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)
fsx-284916 [011] 1238.734083: nfs_setattr_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1753079961650632466 size=261978 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)

kworker/u24:13-17203 [002] 1238.734088: bprint: nfs3_xdr_dec_read3res: task:53610@5 size=0x32592
kworker/u24:13-17203 [002] 1238.734088: nfs_inode_attrs_cmp: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 res=0
kworker/u24:13-17203 [002] 1238.734089: nfs_refresh_inode_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466
kworker/u24:13-17203 [002] 1238.734089: nfs_partial_attr_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 cache_validity=INVALID_ATIME|DATA_INVAL_DEFER fattr_validity=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|SPACE_USED|FSID|FILEID|ATIME|MTIME|CTIME|CHANGE
kworker/u24:13-17203 [002] 1238.734089: nfs_size_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1753079961650632466 cursize=261978 newsize=206226
kworker/u24:13-17203 [002] 1238.734089: nfs_refresh_inode_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1753079961650632466 size=206226 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)
kworker/u24:13-17203 [002] 1238.734089: nfs_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 offset=102400 count=32768 res=32768 status=32768


--
Chuck Lever