2020-11-30 21:27:10

by Trond Myklebust

Subject: [PATCH 0/6] Patches to support NFS re-exporting

From: Trond Myklebust <[email protected]>

These patches fix a number of issues that Hammerspace has hit when doing
re-exporting of NFS.

Jeff Layton (3):
nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
nfsd: allow filesystems to opt out of subtree checking
nfsd: close cached files prior to a REMOVE or RENAME that would
replace target

Trond Myklebust (3):
exportfs: Add a function to return the raw output from fh_to_dentry()
nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
nfsd: Set PF_LOCAL_THROTTLE on local filesystems only

Documentation/filesystems/nfs/exporting.rst | 52 +++++++++++++++++++++
fs/exportfs/expfs.c | 32 +++++++++----
fs/nfs/export.c | 2 +
fs/nfsd/export.c | 6 +++
fs/nfsd/nfs3xdr.c | 7 ++-
fs/nfsd/nfsfh.c | 30 ++++++++++--
fs/nfsd/nfsfh.h | 2 +-
fs/nfsd/vfs.c | 29 ++++++++----
include/linux/exportfs.h | 10 ++++
9 files changed, 146 insertions(+), 24 deletions(-)

--
2.28.0


2020-11-30 21:28:14

by Trond Myklebust

Subject: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

From: Jeff Layton <[email protected]>

With NFSv3 nfsd will always attempt to send along WCC data to the
client. This generally involves saving off the in-core inode information
prior to doing the operation on the given filehandle, and then issuing a
vfs_getattr to it after the op.

Some filesystems (particularly clustered or networked ones) have an
expensive ->getattr inode operation. Atomicity is also often difficult
or impossible to guarantee on such filesystems. For those, we're best
off not trying to provide WCC information to the client at all, and to
simply allow it to poll for that information as needed with a GETATTR
RPC.

This patch adds a new flags field to struct export_operations, and
defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
that nfsd should not attempt to provide WCC info in NFSv3 replies. It
also adds a blurb about the new flags field and flag to the exporting
documentation.

The server will also now skip collecting this information for NFSv2 as
well, since that info is never used there anyway.

Note that this patch does not add this flag to any filesystem
export_operations structures. This was originally developed to allow
reexporting nfs via nfsd. That code is not (and may never be) suitable
for merging into mainline.

Other filesystems may want to consider enabling this flag too. It's hard
to tell however which ones have export operations to enable export via
knfsd and which ones mostly rely on them for open-by-filehandle support,
so I'm leaving that up to the individual maintainers to decide. I am
cc'ing the relevant lists for those filesystems that I think may want to
consider adding this though.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++
fs/nfs/export.c | 1 +
fs/nfsd/nfs3xdr.c | 7 ++++--
fs/nfsd/nfsfh.c | 14 +++++++++++
fs/nfsd/nfsfh.h | 2 +-
include/linux/exportfs.h | 2 ++
6 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index 33d588a01ace..a3e3805833d1 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -154,6 +154,11 @@ struct which has the following members:
to find potential names, and matches inode numbers to find the correct
match.

+ flags
+ Some filesystems may need to be handled differently than others. The
+ export_operations struct also includes a flags field that allows the
+ filesystem to communicate such information to nfsd. See the Export
+ Operations Flags section below for more explanation.

A filehandle fragment consists of an array of 1 or more 4byte words,
together with a one byte "type".
@@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with
nuls. Rather, the encode_fh routine should choose a "type" which
indicates the decode_fh how much of the filehandle is valid, and how
it should be interpreted.
+
+Export Operations Flags
+-----------------------
+In addition to the operation vector pointers, struct export_operations also
+contains a "flags" field that allows the filesystem to communicate to nfsd
+that it may want to do things differently when dealing with it. The
+following flags are defined:
+
+ EXPORT_OP_NOWCC
+ RFC 1813 recommends that servers always send weak cache consistency
+ (WCC) data to the client after each operation. The server should
+ atomically collect attributes about the inode, do an operation on it,
+ and then collect the attributes afterward. This allows the client to
+ skip issuing GETATTRs in some situations but means that the server
+ is calling vfs_getattr for almost all RPCs. On some filesystems
+ (particularly those that are clustered or networked) this is expensive
+ and atomicity is difficult to guarantee. This flag indicates to nfsd
+ that it should skip providing WCC attributes to the client in NFSv3
+ replies when doing operations on this filesystem. Consider enabling
+ this on filesystems that have an expensive ->getattr inode operation,
+ or when atomicity between pre and post operation attribute collection
+ is impossible to guarantee.
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 3430d6891e89..8f4c528865c5 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = {
.encode_fh = nfs_encode_fh,
.fh_to_dentry = nfs_fh_to_dentry,
.get_parent = nfs_get_parent,
+ .flags = EXPORT_OP_NOWCC,
};
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 2277f83da250..480342675292 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -206,7 +206,7 @@ static __be32 *
encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
{
struct dentry *dentry = fhp->fh_dentry;
- if (dentry && d_really_is_positive(dentry)) {
+ if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
__be32 err;
struct kstat stat;

@@ -261,7 +261,7 @@ void fill_pre_wcc(struct svc_fh *fhp)
struct kstat stat;
__be32 err;

- if (fhp->fh_pre_saved)
+ if (fhp->fh_no_wcc || fhp->fh_pre_saved)
return;

inode = d_inode(fhp->fh_dentry);
@@ -287,6 +287,9 @@ void fill_post_wcc(struct svc_fh *fhp)
{
__be32 err;

+ if (fhp->fh_no_wcc)
+ return;
+
if (fhp->fh_post_saved)
printk("nfsd: inode locked twice during operation.\n");

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c81dbbad8792..0c2ee65e46f3 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)

fhp->fh_dentry = dentry;
fhp->fh_export = exp;
+
+ switch (rqstp->rq_vers) {
+ case 3:
+ if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
+ break;
+ /* Fallthrough */
+ case 2:
+ fhp->fh_no_wcc = true;
+ }
+
return 0;
out:
exp_put(exp);
@@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
*/
set_version_and_fsid_type(fhp, exp, ref_fh);

+ /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
+ fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
+
if (ref_fh == fhp)
fh_put(ref_fh);

@@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp)
exp_put(exp);
fhp->fh_export = NULL;
}
+ fhp->fh_no_wcc = false;
return;
}

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 56cfbc361561..fb2b60a76b32 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -35,6 +35,7 @@ typedef struct svc_fh {

bool fh_locked; /* inode locked by us */
bool fh_want_write; /* remount protection taken */
+ bool fh_no_wcc; /* no wcc data needed */
int fh_flags; /* FH flags */
#ifdef CONFIG_NFSD_V3
bool fh_post_saved; /* post-op attrs saved */
@@ -54,7 +55,6 @@ typedef struct svc_fh {
struct kstat fh_post_attr; /* full attrs after operation */
u64 fh_post_change; /* nfsv4 change; see above */
#endif /* CONFIG_NFSD_V3 */
-
} svc_fh;
#define NFSD4_FH_FOREIGN (1<<0)
#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 3ceb72b67a7a..e7de0103a32e 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -213,6 +213,8 @@ struct export_operations {
bool write, u32 *device_generation);
int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
int nr_iomaps, struct iattr *iattr);
+#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */
+ unsigned long flags;
};

extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
--
2.28.0

2020-11-30 23:03:09

by Chuck Lever III

Subject: Re: [PATCH 0/6] Patches to support NFS re-exporting

Hi Trond-

> On Nov 30, 2020, at 4:24 PM, [email protected] wrote:
>
> From: Trond Myklebust <[email protected]>
>
> These patches fix a number of issues that Hammerspace has hit when doing
> re-exporting of NFS.

These do not apply on top of Bruce's changes in the same area.
I've prepared a tree that you can apply onto.

See the cel-next topic branch in this repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git


> Jeff Layton (3):
> nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
> nfsd: allow filesystems to opt out of subtree checking
> nfsd: close cached files prior to a REMOVE or RENAME that would
> replace target
>
> Trond Myklebust (3):
> exportfs: Add a function to return the raw output from fh_to_dentry()
> nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
> nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
>
> Documentation/filesystems/nfs/exporting.rst | 52 +++++++++++++++++++++
> fs/exportfs/expfs.c | 32 +++++++++----
> fs/nfs/export.c | 2 +
> fs/nfsd/export.c | 6 +++
> fs/nfsd/nfs3xdr.c | 7 ++-
> fs/nfsd/nfsfh.c | 30 ++++++++++--
> fs/nfsd/nfsfh.h | 2 +-
> fs/nfsd/vfs.c | 29 ++++++++----
> include/linux/exportfs.h | 10 ++++
> 9 files changed, 146 insertions(+), 24 deletions(-)
>
> --
> 2.28.0
>

--
Chuck Lever



2020-11-30 23:04:07

by Trond Myklebust

Subject: Re: [PATCH 0/6] Patches to support NFS re-exporting

On Mon, 2020-11-30 at 16:40 -0500, Chuck Lever wrote:
> Hi Trond-
>
> > On Nov 30, 2020, at 4:24 PM, [email protected] wrote:
> >
> > From: Trond Myklebust <[email protected]>
> >
> > These patches fix a number of issues that Hammerspace has hit when
> > doing
> > re-exporting of NFS.
>
> These do not apply on top of Bruce's changes in the same area.
> I've prepared a tree that you can apply onto.
>

Hmm... They rebased cleanly on top of your branch, but I'll resend
those rebased patches.

> See the cel-next topic branch in this repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
>
> > Jeff Layton (3):
> >  nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
> >  nfsd: allow filesystems to opt out of subtree checking
> >  nfsd: close cached files prior to a REMOVE or RENAME that would
> >    replace target
> >
> > Trond Myklebust (3):
> >  exportfs: Add a function to return the raw output from
> > fh_to_dentry()
> >  nfsd: Fix up nfsd to ensure that timeout errors don't result in
> > ESTALE
> >  nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
> >
> > Documentation/filesystems/nfs/exporting.rst | 52
> > +++++++++++++++++++++
> > fs/exportfs/expfs.c                         | 32 +++++++++----
> > fs/nfs/export.c                             |  2 +
> > fs/nfsd/export.c                            |  6 +++
> > fs/nfsd/nfs3xdr.c                           |  7 ++-
> > fs/nfsd/nfsfh.c                             | 30 ++++++++++--
> > fs/nfsd/nfsfh.h                             |  2 +-
> > fs/nfsd/vfs.c                               | 29 ++++++++----
> > include/linux/exportfs.h                    | 10 ++++
> > 9 files changed, 146 insertions(+), 24 deletions(-)
> >
> > --
> > 2.28.0
> >
>
> --
> Chuck Lever
>
>
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-11-30 23:15:21

by J. Bruce Fields

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

This is great, thanks:

On Mon, Nov 30, 2020 at 04:24:50PM -0500, [email protected] wrote:
> From: Jeff Layton <[email protected]>
>
> With NFSv3 nfsd will always attempt to send along WCC data to the
> client. This generally involves saving off the in-core inode information
> prior to doing the operation on the given filehandle, and then issuing a
> vfs_getattr to it after the op.
>
> Some filesystems (particularly clustered or networked ones) have an
> expensive ->getattr inode operation. Atomicity is also often difficult
> or impossible to guarantee on such filesystems. For those, we're best
> off not trying to provide WCC information to the client at all, and to
> simply allow it to poll for that information as needed with a GETATTR
> RPC.
>
> This patch adds a new flags field to struct export_operations, and
> defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> also adds a blurb about the new flags field and flag to the exporting
> documentation.

In the v4 case I think it should also turn off the "atomic" flag in the
change_info4 structure that's returned by some operations.

(Out of curiosity: have you seen this cause actual bugs?)

--b.

2020-11-30 23:17:47

by Chuck Lever III

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations



> On Nov 30, 2020, at 4:24 PM, [email protected] wrote:
>
> From: Jeff Layton <[email protected]>
>
> With NFSv3 nfsd will always attempt to send along WCC data to the
> client. This generally involves saving off the in-core inode information
> prior to doing the operation on the given filehandle, and then issuing a
> vfs_getattr to it after the op.
>
> Some filesystems (particularly clustered or networked ones) have an
> expensive ->getattr inode operation. Atomicity is also often difficult
> or impossible to guarantee on such filesystems. For those, we're best
> off not trying to provide WCC information to the client at all, and to
> simply allow it to poll for that information as needed with a GETATTR
> RPC.
>
> This patch adds a new flags field to struct export_operations, and
> defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> also adds a blurb about the new flags field and flag to the exporting
> documentation.
>
> The server will also now skip collecting this information for NFSv2 as
> well, since that info is never used there anyway.
>
> Note that this patch does not add this flag to any filesystem
> export_operations structures. This was originally developed to allow
> reexporting nfs via nfsd. That code is not (and may never be) suitable
> for merging into mainline.
>
> Other filesystems may want to consider enabling this flag too. It's hard
> to tell however which ones have export operations to enable export via
> knfsd and which ones mostly rely on them for open-by-filehandle support,
> so I'm leaving that up to the individual maintainers to decide. I am
> cc'ing the relevant lists for those filesystems that I think may want to
> consider adding this though.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>

These seem to apply fine, thanks for resending.

If you post a v3 to address Bruce's comment, can you also
address this checkpatch nit?


WARNING: Prefer 'fallthrough;' over fallthrough comment
#154: FILE: fs/nfsd/nfsfh.c:299:
+ /* Fallthrough */

total: 0 errors, 1 warnings, 120 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.


> ---
> Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++
> fs/nfs/export.c | 1 +
> fs/nfsd/nfs3xdr.c | 7 ++++--
> fs/nfsd/nfsfh.c | 14 +++++++++++
> fs/nfsd/nfsfh.h | 2 +-
> include/linux/exportfs.h | 2 ++
> 6 files changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
> index 33d588a01ace..a3e3805833d1 100644
> --- a/Documentation/filesystems/nfs/exporting.rst
> +++ b/Documentation/filesystems/nfs/exporting.rst
> @@ -154,6 +154,11 @@ struct which has the following members:
> to find potential names, and matches inode numbers to find the correct
> match.
>
> + flags
> + Some filesystems may need to be handled differently than others. The
> + export_operations struct also includes a flags field that allows the
> + filesystem to communicate such information to nfsd. See the Export
> + Operations Flags section below for more explanation.
>
> A filehandle fragment consists of an array of 1 or more 4byte words,
> together with a one byte "type".
> @@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with
> nuls. Rather, the encode_fh routine should choose a "type" which
> indicates the decode_fh how much of the filehandle is valid, and how
> it should be interpreted.
> +
> +Export Operations Flags
> +-----------------------
> +In addition to the operation vector pointers, struct export_operations also
> +contains a "flags" field that allows the filesystem to communicate to nfsd
> +that it may want to do things differently when dealing with it. The
> +following flags are defined:
> +
> + EXPORT_OP_NOWCC
> + RFC 1813 recommends that servers always send weak cache consistency
> + (WCC) data to the client after each operation. The server should
> + atomically collect attributes about the inode, do an operation on it,
> + and then collect the attributes afterward. This allows the client to
> + skip issuing GETATTRs in some situations but means that the server
> + is calling vfs_getattr for almost all RPCs. On some filesystems
> + (particularly those that are clustered or networked) this is expensive
> + and atomicity is difficult to guarantee. This flag indicates to nfsd
> + that it should skip providing WCC attributes to the client in NFSv3
> + replies when doing operations on this filesystem. Consider enabling
> + this on filesystems that have an expensive ->getattr inode operation,
> + or when atomicity between pre and post operation attribute collection
> + is impossible to guarantee.
> diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> index 3430d6891e89..8f4c528865c5 100644
> --- a/fs/nfs/export.c
> +++ b/fs/nfs/export.c
> @@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = {
> .encode_fh = nfs_encode_fh,
> .fh_to_dentry = nfs_fh_to_dentry,
> .get_parent = nfs_get_parent,
> + .flags = EXPORT_OP_NOWCC,
> };
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 2277f83da250..480342675292 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -206,7 +206,7 @@ static __be32 *
> encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
> {
> struct dentry *dentry = fhp->fh_dentry;
> - if (dentry && d_really_is_positive(dentry)) {
> + if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
> __be32 err;
> struct kstat stat;
>
> @@ -261,7 +261,7 @@ void fill_pre_wcc(struct svc_fh *fhp)
> struct kstat stat;
> __be32 err;
>
> - if (fhp->fh_pre_saved)
> + if (fhp->fh_no_wcc || fhp->fh_pre_saved)
> return;
>
> inode = d_inode(fhp->fh_dentry);
> @@ -287,6 +287,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> {
> __be32 err;
>
> + if (fhp->fh_no_wcc)
> + return;
> +
> if (fhp->fh_post_saved)
> printk("nfsd: inode locked twice during operation.\n");
>
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index c81dbbad8792..0c2ee65e46f3 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>
> fhp->fh_dentry = dentry;
> fhp->fh_export = exp;
> +
> + switch (rqstp->rq_vers) {
> + case 3:
> + if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
> + break;
> + /* Fallthrough */
> + case 2:
> + fhp->fh_no_wcc = true;
> + }
> +
> return 0;
> out:
> exp_put(exp);
> @@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> */
> set_version_and_fsid_type(fhp, exp, ref_fh);
>
> + /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
> + fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
> +
> if (ref_fh == fhp)
> fh_put(ref_fh);
>
> @@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp)
> exp_put(exp);
> fhp->fh_export = NULL;
> }
> + fhp->fh_no_wcc = false;
> return;
> }
>
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 56cfbc361561..fb2b60a76b32 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -35,6 +35,7 @@ typedef struct svc_fh {
>
> bool fh_locked; /* inode locked by us */
> bool fh_want_write; /* remount protection taken */
> + bool fh_no_wcc; /* no wcc data needed */
> int fh_flags; /* FH flags */
> #ifdef CONFIG_NFSD_V3
> bool fh_post_saved; /* post-op attrs saved */
> @@ -54,7 +55,6 @@ typedef struct svc_fh {
> struct kstat fh_post_attr; /* full attrs after operation */
> u64 fh_post_change; /* nfsv4 change; see above */
> #endif /* CONFIG_NFSD_V3 */
> -
> } svc_fh;
> #define NFSD4_FH_FOREIGN (1<<0)
> #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 3ceb72b67a7a..e7de0103a32e 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -213,6 +213,8 @@ struct export_operations {
> bool write, u32 *device_generation);
> int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
> int nr_iomaps, struct iattr *iattr);
> +#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */
> + unsigned long flags;
> };
>
> extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> --
> 2.28.0
>

--
Chuck Lever



2020-12-01 00:17:21

by Jeff Layton

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, 2020-11-30 at 16:24 -0500, [email protected] wrote:
> From: Jeff Layton <[email protected]>
>
> With NFSv3 nfsd will always attempt to send along WCC data to the
> client. This generally involves saving off the in-core inode information
> prior to doing the operation on the given filehandle, and then issuing a
> vfs_getattr to it after the op.
>
> Some filesystems (particularly clustered or networked ones) have an
> expensive ->getattr inode operation. Atomicity is also often difficult
> or impossible to guarantee on such filesystems. For those, we're best
> off not trying to provide WCC information to the client at all, and to
> simply allow it to poll for that information as needed with a GETATTR
> RPC.
>
> This patch adds a new flags field to struct export_operations, and
> defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> also adds a blurb about the new flags field and flag to the exporting
> documentation.
>
> The server will also now skip collecting this information for NFSv2 as
> well, since that info is never used there anyway.
>
> Note that this patch does not add this flag to any filesystem
> export_operations structures. This was originally developed to allow
> reexporting nfs via nfsd. That code is not (and may never be) suitable
> for merging into mainline.
>

Probably ought to fix up the above paragraph since we are now merging
this into mainline.

> Other filesystems may want to consider enabling this flag too. It's hard
> to tell however which ones have export operations to enable export via
> knfsd and which ones mostly rely on them for open-by-filehandle support,
> so I'm leaving that up to the individual maintainers to decide. I am
> cc'ing the relevant lists for those filesystems that I think may want to
> consider adding this though.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>  Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++
>  fs/nfs/export.c | 1 +
>  fs/nfsd/nfs3xdr.c | 7 ++++--
>  fs/nfsd/nfsfh.c | 14 +++++++++++
>  fs/nfsd/nfsfh.h | 2 +-
>  include/linux/exportfs.h | 2 ++
>  6 files changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
> index 33d588a01ace..a3e3805833d1 100644
> --- a/Documentation/filesystems/nfs/exporting.rst
> +++ b/Documentation/filesystems/nfs/exporting.rst
> @@ -154,6 +154,11 @@ struct which has the following members:
>      to find potential names, and matches inode numbers to find the correct
>      match.
>  
> + flags
> + Some filesystems may need to be handled differently than others. The
> + export_operations struct also includes a flags field that allows the
> + filesystem to communicate such information to nfsd. See the Export
> + Operations Flags section below for more explanation.
>  
>  A filehandle fragment consists of an array of 1 or more 4byte words,
>  together with a one byte "type".
> @@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with
>  nuls. Rather, the encode_fh routine should choose a "type" which
>  indicates the decode_fh how much of the filehandle is valid, and how
>  it should be interpreted.
> +
> +Export Operations Flags
> +-----------------------
> +In addition to the operation vector pointers, struct export_operations also
> +contains a "flags" field that allows the filesystem to communicate to nfsd
> +that it may want to do things differently when dealing with it. The
> +following flags are defined:
> +
> + EXPORT_OP_NOWCC
> + RFC 1813 recommends that servers always send weak cache consistency
> + (WCC) data to the client after each operation. The server should
> + atomically collect attributes about the inode, do an operation on it,
> + and then collect the attributes afterward. This allows the client to
> + skip issuing GETATTRs in some situations but means that the server
> + is calling vfs_getattr for almost all RPCs. On some filesystems
> + (particularly those that are clustered or networked) this is expensive
> + and atomicity is difficult to guarantee. This flag indicates to nfsd
> + that it should skip providing WCC attributes to the client in NFSv3
> + replies when doing operations on this filesystem. Consider enabling
> + this on filesystems that have an expensive ->getattr inode operation,
> + or when atomicity between pre and post operation attribute collection
> + is impossible to guarantee.
> diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> index 3430d6891e89..8f4c528865c5 100644
> --- a/fs/nfs/export.c
> +++ b/fs/nfs/export.c
> @@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = {
>   .encode_fh = nfs_encode_fh,
>   .fh_to_dentry = nfs_fh_to_dentry,
>   .get_parent = nfs_get_parent,
> + .flags = EXPORT_OP_NOWCC,
>  };
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 2277f83da250..480342675292 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -206,7 +206,7 @@ static __be32 *
>  encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
>  {
>   struct dentry *dentry = fhp->fh_dentry;
> - if (dentry && d_really_is_positive(dentry)) {
> + if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
>   __be32 err;
>   struct kstat stat;
>  
> @@ -261,7 +261,7 @@ void fill_pre_wcc(struct svc_fh *fhp)
>   struct kstat stat;
>   __be32 err;
>  
> - if (fhp->fh_pre_saved)
> + if (fhp->fh_no_wcc || fhp->fh_pre_saved)
>   return;
>  
>   inode = d_inode(fhp->fh_dentry);
> @@ -287,6 +287,9 @@ void fill_post_wcc(struct svc_fh *fhp)
>  {
>   __be32 err;
>  
> + if (fhp->fh_no_wcc)
> + return;
> +
>   if (fhp->fh_post_saved)
>   printk("nfsd: inode locked twice during operation.\n");
>  
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index c81dbbad8792..0c2ee65e46f3 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  
>   fhp->fh_dentry = dentry;
>   fhp->fh_export = exp;
> +
> + switch (rqstp->rq_vers) {
> + case 3:
> + if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
> + break;
> + /* Fallthrough */
> + case 2:
> + fhp->fh_no_wcc = true;
> + }
> +
>   return 0;
>  out:
>   exp_put(exp);
> @@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
>   */
>   set_version_and_fsid_type(fhp, exp, ref_fh);
>  
> + /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
> + fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
> +
>   if (ref_fh == fhp)
>   fh_put(ref_fh);
>  
> @@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp)
>   exp_put(exp);
>   fhp->fh_export = NULL;
>   }
> + fhp->fh_no_wcc = false;
>   return;
>  }
>  
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 56cfbc361561..fb2b60a76b32 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -35,6 +35,7 @@ typedef struct svc_fh {
>  
>   bool fh_locked; /* inode locked by us */
>   bool fh_want_write; /* remount protection taken */
> + bool fh_no_wcc; /* no wcc data needed */
>   int fh_flags; /* FH flags */
>  #ifdef CONFIG_NFSD_V3
>   bool fh_post_saved; /* post-op attrs saved */
> @@ -54,7 +55,6 @@ typedef struct svc_fh {
>   struct kstat fh_post_attr; /* full attrs after operation */
>   u64 fh_post_change; /* nfsv4 change; see above */
>  #endif /* CONFIG_NFSD_V3 */
> -
>  } svc_fh;
>  #define NFSD4_FH_FOREIGN (1<<0)
>  #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 3ceb72b67a7a..e7de0103a32e 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -213,6 +213,8 @@ struct export_operations {
>   bool write, u32 *device_generation);
>   int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
>   int nr_iomaps, struct iattr *iattr);
> +#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */
> + unsigned long flags;
>  };
>  
>  extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,

--
Jeff Layton <[email protected]>

2020-12-01 00:35:40

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, 2020-11-30 at 17:58 -0500, J. Bruce Fields wrote:
> This is great, thanks:
>
> On Mon, Nov 30, 2020 at 04:24:50PM -0500, [email protected] wrote:
> > From: Jeff Layton <[email protected]>
> >
> > With NFSv3 nfsd will always attempt to send along WCC data to the
> > client. This generally involves saving off the in-core inode
> > information
> > prior to doing the operation on the given filehandle, and then
> > issuing a
> > vfs_getattr to it after the op.
> >
> > Some filesystems (particularly clustered or networked ones) have an
> > expensive ->getattr inode operation. Atomicity is also often
> > difficult
> > or impossible to guarantee on such filesystems. For those, we're
> > best
> > off not trying to provide WCC information to the client at all, and
> > to
> > simply allow it to poll for that information as needed with a
> > GETATTR
> > RPC.
> >
> > This patch adds a new flags field to struct export_operations, and
> > defines a new EXPORT_OP_NOWCC flag that filesystems can use to
> > indicate
> > that nfsd should not attempt to provide WCC info in NFSv3 replies.
> > It
> > also adds a blurb about the new flags field and flag to the
> > exporting
> > documentation.
>
> In the v4 case I think it should also turn off the "atomic" flag in
> the
> change_info4 structure that's returned by some operations.
>
> (Out of curiosity: have you seen this cause actual bugs?)
>

Not so much bugs, but it definitely causes inefficiencies. The client
has to go to the server for every one of the WCC GETATTR calls, and
needs to serialise that with the operations. It's just a latency hog
for very little gain when you are doing bulk writing.


--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 00:49:07

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, 2020-11-30 at 17:58 -0500, J. Bruce Fields wrote:
> This is great, thanks:
>
> On Mon, Nov 30, 2020 at 04:24:50PM -0500, [email protected] wrote:
> > From: Jeff Layton <[email protected]>
> >
> > With NFSv3 nfsd will always attempt to send along WCC data to the
> > client. This generally involves saving off the in-core inode
> > information
> > prior to doing the operation on the given filehandle, and then
> > issuing a
> > vfs_getattr to it after the op.
> >
> > Some filesystems (particularly clustered or networked ones) have an
> > expensive ->getattr inode operation. Atomicity is also often
> > difficult
> > or impossible to guarantee on such filesystems. For those, we're
> > best
> > off not trying to provide WCC information to the client at all, and
> > to
> > simply allow it to poll for that information as needed with a
> > GETATTR
> > RPC.
> >
> > This patch adds a new flags field to struct export_operations, and
> > defines a new EXPORT_OP_NOWCC flag that filesystems can use to
> > indicate
> > that nfsd should not attempt to provide WCC info in NFSv3 replies.
> > It
> > also adds a blurb about the new flags field and flag to the
> > exporting
> > documentation.
>
> In the v4 case I think it should also turn off the "atomic" flag in
> the
> change_info4 structure that's returned by some operations.
>

To answer this comment (which I missed earlier): I don't know that we
can turn off WCC for NFSv4. The GETATTR is a completely separate
operation, so the server would have to second-guess what the client
needs it for in order to optimise it away.

That is why this patch is labelled as being an optimisation for NFSv3
only in the comments above.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 00:53:29

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, 2020-11-30 at 18:11 -0500, Chuck Lever wrote:
>
>
> > On Nov 30, 2020, at 4:24 PM, [email protected] wrote:
> >
> > From: Jeff Layton <[email protected]>
> >
> > With NFSv3 nfsd will always attempt to send along WCC data to the
> > client. This generally involves saving off the in-core inode
> > information
> > prior to doing the operation on the given filehandle, and then
> > issuing a
> > vfs_getattr to it after the op.
> >
> > Some filesystems (particularly clustered or networked ones) have an
> > expensive ->getattr inode operation. Atomicity is also often
> > difficult
> > or impossible to guarantee on such filesystems. For those, we're
> > best
> > off not trying to provide WCC information to the client at all, and
> > to
> > simply allow it to poll for that information as needed with a
> > GETATTR
> > RPC.
> >
> > This patch adds a new flags field to struct export_operations, and
> > defines a new EXPORT_OP_NOWCC flag that filesystems can use to
> > indicate
> > that nfsd should not attempt to provide WCC info in NFSv3 replies.
> > It
> > also adds a blurb about the new flags field and flag to the
> > exporting
> > documentation.
> >
> > The server will also now skip collecting this information for NFSv2
> > as
> > well, since that info is never used there anyway.
> >
> > Note that this patch does not add this flag to any filesystem
> > export_operations structures. This was originally developed to
> > allow
> > reexporting nfs via nfsd. That code is not (and may never be)
> > suitable
> > for merging into mainline.
> >
> > Other filesystems may want to consider enabling this flag too. It's
> > hard
> > to tell however which ones have export operations to enable export
> > via
> > knfsd and which ones mostly rely on them for open-by-filehandle
> > support,
> > so I'm leaving that up to the individual maintainers to decide. I
> > am
> > cc'ing the relevant lists for those filesystems that I think may
> > want to
> > consider adding this though.
> >
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Jeff Layton <[email protected]>
> > Signed-off-by: Lance Shelton <[email protected]>
> > Signed-off-by: Trond Myklebust <[email protected]>
>
> These seem to apply fine, thanks for resending.
>
> If you post a v3 to address Bruce's comment, can you also
> address this checkpatch nit?

I'm not seeing how I can address Bruce's comment at this time. I can
send you a v3 that changes the comment to the "fallthrough" obscenity.

>
>
> WARNING: Prefer 'fallthrough;' over fallthrough comment
> #154: FILE: fs/nfsd/nfsfh.c:299:
> +               /* Fallthrough */
>
> total: 0 errors, 1 warnings, 120 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
>       mechanically convert to the typical style using --fix or
>       --fix-inplace.
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 02:45:03

by J. Bruce Fields

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Tue, Dec 01, 2020 at 12:45:16AM +0000, Trond Myklebust wrote:
> On Mon, 2020-11-30 at 17:58 -0500, J. Bruce Fields wrote:
> > This is great, thanks:
> >
> > > On Mon, Nov 30, 2020 at 04:24:50PM -0500, [email protected] wrote:
> > > From: Jeff Layton <[email protected]>
> > >
> > > With NFSv3 nfsd will always attempt to send along WCC data to the
> > > client. This generally involves saving off the in-core inode
> > > information
> > > prior to doing the operation on the given filehandle, and then
> > > issuing a
> > > vfs_getattr to it after the op.
> > >
> > > Some filesystems (particularly clustered or networked ones) have an
> > > > expensive ->getattr inode operation. Atomicity is also often
> > > difficult
> > > or impossible to guarantee on such filesystems. For those, we're
> > > best
> > > off not trying to provide WCC information to the client at all, and
> > > to
> > > simply allow it to poll for that information as needed with a
> > > GETATTR
> > > RPC.
> > >
> > > This patch adds a new flags field to struct export_operations, and
> > > defines a new EXPORT_OP_NOWCC flag that filesystems can use to
> > > indicate
> > > that nfsd should not attempt to provide WCC info in NFSv3 replies.
> > > It
> > > also adds a blurb about the new flags field and flag to the
> > > exporting
> > > documentation.
> >
> > In the v4 case I think it should also turn off the "atomic" flag in
> > the
> > change_info4 structure that's returned by some operations.
> >
>
> To answer this comment (which I missed earlier): I don't know that we
> can turn off WCC for NFSv4. The GETATTR is a completely separate
> operation, so the server would have to second-guess what the client
> needs it for in order to optimise it away.

In the v4 case, we're setting the "atomic" field in the change_info4
struct to true even though the returned changeattrs clearly aren't
atomic with the operation in the re-export case.

That atomic field is initialized from fh_post_saved, so we just need to
set it to false in the v4 case as we are in the v3 case already.

Yes, it's true, that doesn't allow any optimizations because we still
have to get the post-op change attributes.

But it's a bug we may as well fix while we're here, and it probably
simplifies this patch if anything....

--b.
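
A minimal userspace sketch of the point above, assuming a heavily
simplified model rather than the real nfsd structures: if the "atomic"
value reported in change_info4 simply mirrors fh_post_saved, then leaving
fh_post_saved false for exports that set EXPORT_OP_NOWCC also clears
"atomic" in the v4 case. Every name carrying a model_ prefix is
hypothetical; only EXPORT_OP_NOWCC and fh_post_saved come from the thread.

```c
#include <stdbool.h>

#define EXPORT_OP_NOWCC 0x1UL /* value taken from the patch under discussion */

/* Hypothetical, reduced stand-in for struct svc_fh. */
struct model_fh {
	unsigned long export_flags; /* stand-in for s_export_op->flags */
	bool fh_post_saved;         /* post-op attrs saved */
};

/* Hypothetical post-op step: do not mark attrs as saved when NOWCC is set. */
static void model_fill_post_wcc(struct model_fh *fhp)
{
	fhp->fh_post_saved = !(fhp->export_flags & EXPORT_OP_NOWCC);
}

/* Hypothetical change_info4 fill: "atomic" mirrors fh_post_saved. */
static bool model_change_info_atomic(const struct model_fh *fhp)
{
	return fhp->fh_post_saved;
}
```

In this model a single flag check covers both protocol versions, which is
the simplification being suggested; it does not avoid the post-op
attribute fetch that v4 still needs.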

>
> That is why this patch is labelled as being an optimisation for NFSv3
> only in the comments above.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2020-12-01 03:08:33

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, 2020-11-30 at 21:28 -0500, J. Bruce Fields wrote:
> On Tue, Dec 01, 2020 at 12:45:16AM +0000, Trond Myklebust wrote:
> > On Mon, 2020-11-30 at 17:58 -0500, J. Bruce Fields wrote:
> > > This is great, thanks:
> > >
> > > On Mon, Nov 30, 2020 at 04:24:50PM -0500,
> > > [email protected] wrote:
> > > > From: Jeff Layton <[email protected]>
> > > >
> > > > With NFSv3 nfsd will always attempt to send along WCC data to
> > > > the
> > > > client. This generally involves saving off the in-core inode
> > > > information
> > > > prior to doing the operation on the given filehandle, and then
> > > > issuing a
> > > > vfs_getattr to it after the op.
> > > >
> > > > Some filesystems (particularly clustered or networked ones)
> > > > have an
> > > > > expensive ->getattr inode operation. Atomicity is also often
> > > > difficult
> > > > or impossible to guarantee on such filesystems. For those,
> > > > we're
> > > > best
> > > > off not trying to provide WCC information to the client at all,
> > > > and
> > > > to
> > > > simply allow it to poll for that information as needed with a
> > > > GETATTR
> > > > RPC.
> > > >
> > > > This patch adds a new flags field to struct export_operations,
> > > > and
> > > > defines a new EXPORT_OP_NOWCC flag that filesystems can use to
> > > > indicate
> > > > that nfsd should not attempt to provide WCC info in NFSv3
> > > > replies.
> > > > It
> > > > also adds a blurb about the new flags field and flag to the
> > > > exporting
> > > > documentation.
> > >
> > > In the v4 case I think it should also turn off the "atomic" flag
> > > in
> > > the
> > > change_info4 structure that's returned by some operations.
> > >
> >
> > To answer this comment (which I missed earlier): I don't know that
> > we
> > can turn off WCC for NFSv4. The GETATTR is a completely separate
> > operation, so the server would have to second-guess what the client
> > needs it for in order to optimise it away.
>
> In the v4 case, we're setting the "atomic" field in the change_info4
> struct to true even though the returned changeattrs clearly aren't
> atomic with the operation in the re-export case.
>
> That atomic field is initialized from fh_post_saved, so we just need
> to
> set it to false in the v4 case as we are in the v3 case already.
>
> Yes, it's true, that doesn't allow any optimizations because we still
> have to get the post-op change attributes.
>
> But it's a bug we may as well fix while we're here, and it probably
> simplifies this patch if anything....

I'd argue that is a completely separate issue. This is an optimisation
for NFSv3, whereas what you're talking about is atomicity (whether in
general or for NFSv4 only). I'd therefore prefer to make that a
completely separate export flag so that it can be treated as separate
functionality.

A local filesystem might choose to set the 'non-atomic' flag without
wanting to turn off NFSv3 WCC attributes. Yes, the latter are assumed
to be atomic, but a number of commercial servers do abuse that
assumption in practice.
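
As a rough illustration of the "separate export flag" idea, here is a
userspace sketch in which the two behaviours are controlled by independent
bits. MODEL_EXPORT_OP_NOATOMIC and both helper functions are hypothetical
names invented for this example; only EXPORT_OP_NOWCC and its value appear
in the patch.

```c
#include <stdbool.h>

#define EXPORT_OP_NOWCC          0x1UL /* from the patch: skip NFSv3 WCC data */
#define MODEL_EXPORT_OP_NOATOMIC 0x2UL /* hypothetical: attrs not atomic with the op */

/* NFSv3: send WCC data unless the export opted out of it. */
static bool model_v3_send_wcc(unsigned long flags)
{
	return !(flags & EXPORT_OP_NOWCC);
}

/*
 * NFSv4: report change_info4 "atomic" only when post-op attrs were saved
 * and the export does not declare its attributes non-atomic.
 */
static bool model_v4_atomic(unsigned long flags, bool post_saved)
{
	return post_saved && !(flags & MODEL_EXPORT_OP_NOATOMIC);
}
```

With separate bits, an export could keep sending NFSv3 WCC data while
still reporting atomic as false for NFSv4, which is the combination
described in the paragraph above.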

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 03:14:04

by J. Bruce Fields

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Tue, Dec 01, 2020 at 03:06:46AM +0000, Trond Myklebust wrote:
> A local filesystem might choose to set the 'non-atomic' flag without
> wanting to turn off NFSv3 WCC attributes. Yes, the latter are assumed
> to be atomic, but a number of commercial servers do abuse that
> assumption in practice.

What do you mean by abusing that assumption?

I thought that leaving off the post-op attrs was the v3 protocol's way
of saying that it couldn't give you atomic wcc information.

--b.

2020-12-01 03:18:29

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, 2020-11-30 at 22:11 -0500, [email protected] wrote:
> On Tue, Dec 01, 2020 at 03:06:46AM +0000, Trond Myklebust wrote:
> > A local filesystem might choose to set the 'non-atomic' flag
> > without
> > wanting to turn off NFSv3 WCC attributes. Yes, the latter are
> > assumed
> > to be atomic, but a number of commercial servers do abuse that
> > assumption in practice.
>
> What do you mean by abusing that assumption?
>
> I thought that leaving off the post-op attrs was the v3 protocol's
> way
> of saying that it couldn't give you atomic wcc information.
>

I mean that a number of commercial servers will happily return NFSv3
pre/post-operation WCC information that is not atomic with the
operation that is supposed to be 'protected'. This is, after all, why
the NFSv4 "struct change_info4" added the 'atomic' field in the first
place.
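
To make concrete why atomicity matters here, the following is a minimal
sketch of how an NFSv3 client might use the pre-operation half of the WCC
data, assuming a reduced attribute set (size, mtime, ctime, as in the
RFC 1813 wcc_attr) with timestamps flattened to nanoseconds. The struct
and function names are hypothetical and this is not the Linux client's
actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced model of the NFSv3 wcc_attr: size, mtime, ctime. */
struct model_wcc_attr {
	uint64_t size;
	int64_t mtime_ns;
	int64_t ctime_ns;
};

/*
 * True when the client's cached attributes match the server's pre-op
 * attributes. The client may then conclude that only its own operation
 * changed the file, but only if the server sampled the pre-op and post-op
 * attributes atomically with that operation.
 */
static bool model_cache_still_valid(const struct model_wcc_attr *cached,
				    const struct model_wcc_attr *pre_op)
{
	return cached->size == pre_op->size &&
	       cached->mtime_ns == pre_op->mtime_ns &&
	       cached->ctime_ns == pre_op->ctime_ns;
}
```

If the server's pre/post values are not atomic with the operation, a
match here can hide an intervening change by another client, which is the
gap the change_info4 atomic field lets a server report explicitly.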

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 03:25:18

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Tue, 2020-12-01 at 03:16 +0000, Trond Myklebust wrote:
> On Mon, 2020-11-30 at 22:11 -0500, [email protected] wrote:
> > On Tue, Dec 01, 2020 at 03:06:46AM +0000, Trond Myklebust wrote:
> > > A local filesystem might choose to set the 'non-atomic' flag
> > > without
> > > wanting to turn off NFSv3 WCC attributes. Yes, the latter are
> > > assumed
> > > to be atomic, but a number of commercial servers do abuse that
> > > assumption in practice.
> >
> > What do you mean by abusing that assumption?
> >
> > I thought that leaving off the post-op attrs was the v3 protocol's
> > way
> > of saying that it couldn't give you atomic wcc information.
> >
>
> I mean that a number of commercial servers will happily return NFSv3
> pre/post-operation WCC information that is not atomic with the
> operation that is supposed to be 'protected'. This is, after all, why
> the NFSv4 "struct change_info4" added the 'atomic' field in the first
> place.

BTW: To be fair, so does knfsd...

At Hammerspace, we had some real problems recently due to XFS exports
returning non-atomic values for the "space used" field. Speculative
preallocation is a real bitch:
https://xfs.org/index.php/XFS_FAQ#Q:_What_is_speculative_preallocation.3F

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 15:09:31

by J. Bruce Fields

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Tue, Dec 01, 2020 at 03:16:41AM +0000, Trond Myklebust wrote:
> On Mon, 2020-11-30 at 22:11 -0500, [email protected] wrote:
> > On Tue, Dec 01, 2020 at 03:06:46AM +0000, Trond Myklebust wrote:
> > > A local filesystem might choose to set the 'non-atomic' flag
> > > without
> > > wanting to turn off NFSv3 WCC attributes. Yes, the latter are
> > > assumed
> > > to be atomic, but a number of commercial servers do abuse that
> > > assumption in practice.
> >
> > What do you mean by abusing that assumption?
> >
> > I thought that leaving off the post-op attrs was the v3 protocol's
> > way
> > of saying that it couldn't give you atomic wcc information.
> >
>
> I mean that a number of commercial servers will happily return NFSv3
> pre/post-operation WCC information that is not atomic with the
> operation that is supposed to be 'protected'.

Oh, OK.

But why do *we* want to do that?

If there's some reason a filesystem really needs NFSv3 post-operation
WCC information without providing an atomic guarantee, they can make
that argument when the filesystem's merged.

Separating these two flags on the off chance a future filesystem may
want to violate the protocol in this way seems unnecessary.

--b.

2020-12-01 15:22:19

by J. Bruce Fields

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Tue, Dec 01, 2020 at 03:23:00AM +0000, Trond Myklebust wrote:
> On Tue, 2020-12-01 at 03:16 +0000, Trond Myklebust wrote:
> > On Mon, 2020-11-30 at 22:11 -0500, [email protected] wrote:
> > > On Tue, Dec 01, 2020 at 03:06:46AM +0000, Trond Myklebust wrote:
> > > > A local filesystem might choose to set the 'non-atomic' flag
> > > > without
> > > > wanting to turn off NFSv3 WCC attributes. Yes, the latter are
> > > > assumed
> > > > to be atomic, but a number of commercial servers do abuse that
> > > > assumption in practice.
> > >
> > > What do you mean by abusing that assumption?
> > >
> > > I thought that leaving off the post-op attrs was the v3 protocol's
> > > way
> > > of saying that it couldn't give you atomic wcc information.
> > >
> >
> > I mean that a number of commercial servers will happily return NFSv3
> > pre/post-operation WCC information that is not atomic with the
> > operation that is supposed to be 'protected'. This is, after all, why
> > the NFSv4 "struct change_info4" added the 'atomic' field in the first
> > place.
>
> BTW: To be fair, so does knfsd...
>
> At Hammerspace, we had some real problems recently due to XFS exports
> returning non-atomic values for the "space used" field. Speculative
> preallocation is a real bitch:
> https://xfs.org/index.php/XFS_FAQ#Q:_What_is_speculative_preallocation.3F

So you think xfs should omit v3 post-operation attributes and still set
the atomic bit in v4 replies?

Would that have helped in the cases you saw? It seems like speculative
preallocation isn't a problem with atomicity exactly--it couldn't be
avoided by applications cooperating with some locking scheme, for
example, if I'm understanding right.

--b.

2020-12-01 15:53:19

by Trond Myklebust

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Tue, 2020-12-01 at 10:19 -0500, [email protected] wrote:
> On Tue, Dec 01, 2020 at 03:23:00AM +0000, Trond Myklebust wrote:
> > On Tue, 2020-12-01 at 03:16 +0000, Trond Myklebust wrote:
> > > On Mon, 2020-11-30 at 22:11 -0500, [email protected] wrote:
> > > > On Tue, Dec 01, 2020 at 03:06:46AM +0000, Trond Myklebust
> > > > wrote:
> > > > > A local filesystem might choose to set the 'non-atomic' flag
> > > > > without
> > > > > wanting to turn off NFSv3 WCC attributes. Yes, the latter are
> > > > > assumed
> > > > > to be atomic, but a number of commercial servers do abuse
> > > > > that
> > > > > assumption in practice.
> > > >
> > > > What do you mean by abusing that assumption?
> > > >
> > > > I thought that leaving off the post-op attrs was the v3
> > > > protocol's
> > > > way
> > > > of saying that it couldn't give you atomic wcc information.
> > > >
> > >
> > > I mean that a number of commercial servers will happily return
> > > NFSv3
> > > pre/post-operation WCC information that is not atomic with the
> > > operation that is supposed to be 'protected'. This is, after all,
> > > why
> > > the NFSv4 "struct change_info4" added the 'atomic' field in the
> > > first
> > > place.
> >
> > BTW: To be fair, so does knfsd...
> >
> > At Hammerspace, we had some real problems recently due to XFS
> > exports
> > returning non-atomic values for the "space used" field. Speculative
> > preallocation is a real bitch:
> > https://xfs.org/index.php/XFS_FAQ#Q:_What_is_speculative_preallocation.3F
>
> So you think xfs should omit v3 post-operation attributes and still
> set
> the atomic bit in v4 replies?
>
> Would that have helped in the cases you saw?  It seems like
> speculative
> preallocation isn't a problem with atomicity exactly--it couldn't be
> avoided by applications cooperating with some locking scheme, for
> example, if I'm understanding right.
>

Locking doesn't help. This isn't even something that needs multiple
clients. XFS will happily give the client that sends the WRITE one
answer for 'space used' in the WCC attributes, and then a different
answer in a subsequent GETATTR (no change in mtime, ctime or change
attribute) once the speculative allocation has been resolved.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-12-01 19:47:41

by J. Bruce Fields

Subject: Re: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

On Mon, Nov 30, 2020 at 05:58:42PM -0500, J. Bruce Fields wrote:
> This is great, thanks:
>
> On Mon, Nov 30, 2020 at 04:24:50PM -0500, [email protected] wrote:
> > From: Jeff Layton <[email protected]>
> >
> > With NFSv3 nfsd will always attempt to send along WCC data to the
> > client. This generally involves saving off the in-core inode information
> > prior to doing the operation on the given filehandle, and then issuing a
> > vfs_getattr to it after the op.
> >
> > Some filesystems (particularly clustered or networked ones) have an
> > > expensive ->getattr inode operation. Atomicity is also often difficult
> > or impossible to guarantee on such filesystems. For those, we're best
> > off not trying to provide WCC information to the client at all, and to
> > simply allow it to poll for that information as needed with a GETATTR
> > RPC.
> >
> > This patch adds a new flags field to struct export_operations, and
> > defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> > that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> > also adds a blurb about the new flags field and flag to the exporting
> > documentation.
>
> In the v4 case I think it should also turn off the "atomic" flag in the
> change_info4 structure that's returned by some operations.

And then it looks to me like all you need is something like the
following, no need for a fh_no_wcc field or anything, just skip the
stuff you don't want in fill_post_wcc when the flag is set:

--b.

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index bd4edf904bba..0b51f9dd0752 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -289,13 +289,16 @@ void fill_post_wcc(struct svc_fh *fhp)
 {
 	bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE);
 	struct inode *inode = d_inode(fhp->fh_dentry);
+	struct export_operations *ops = inode->i_sb->s_export_op;

 	if (fhp->fh_post_saved)
 		printk("nfsd: inode locked twice during operation.\n");

 	fhp->fh_post_saved = true;

-	if (!v4 || !inode->i_sb->s_export_op->fetch_iversion) {
+	if (ops->flags & EXPORT_OP_NOWCC)
+		fhp->fh_post_saved = false;
+	else if (!v4 || !ops->fetch_iversion) {
 		__be32 err = fh_getattr(fhp, &fhp->fh_post_attr);
 		if (err) {
 			fhp->fh_post_saved = false;