2022-07-28 14:30:33

by Jeff Layton

[permalink] [raw]
Subject: [RFC PATCH] nfs: don't skip CTO on v2/3 mounts, regardless of order of reference puts

Olga reported a case of data corruption in a situation where two clients
(v3 and v4) were alternately doing open/write/close the same file.

Looking at captures, the v3 client failed to perform any of the GETATTR
calls for CTO during one of the events, leading it to overwrite some
data that had been written by the v4 client.

We have two calls that work similarly: put_nfs_open_context and
put_nfs_open_context_sync. The only difference is the is_sync parameter
that gets passed to close_context. The only caller of the _sync variant
is in the close codepath.

On a v2/3 mount, if the last reference is not put by
put_nfs_open_context_sync, then CTO revalidation never happens. Fix this
by adding a new flag to nfs_open_context and set that in
put_nfs_open_context_sync. In nfs_close_context, check for that flag
when we're checking is_sync and treat them as equivalent.

Cc: Scott Mayhew <[email protected]>
Cc: Ben Coddington <[email protected]>
Reported-by: Olga Kornieskaia <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/inode.c | 3 ++-
include/linux/nfs_fs.h | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

I'm not sure this is the right fix. Maybe we'd be better off just
ignoring the is_sync parameter in nfs_close_context? It's probably
functionally equivalent anyway.

Thoughts?

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b4e46b0ffa2d..58b506a0a2f2 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1005,7 +1005,7 @@ void nfs_close_context(struct nfs_open_context *ctx, int is_sync)

if (!(ctx->mode & FMODE_WRITE))
return;
- if (!is_sync)
+ if (!is_sync && !test_bit(NFS_CONTEXT_DO_CLOSE, &ctx->flags))
return;
inode = d_inode(ctx->dentry);
if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
@@ -1091,6 +1091,7 @@ EXPORT_SYMBOL_GPL(put_nfs_open_context);

static void put_nfs_open_context_sync(struct nfs_open_context *ctx)
{
+ set_bit(NFS_CONTEXT_DO_CLOSE, &ctx->flags);
__put_nfs_open_context(ctx, 1);
}

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a17c337dbdf1..faff0d60aad2 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -85,8 +85,9 @@ struct nfs_open_context {
unsigned long flags;
#define NFS_CONTEXT_RESEND_WRITES (1)
#define NFS_CONTEXT_BAD (2)
-#define NFS_CONTEXT_UNLOCK (3)
+#define NFS_CONTEXT_UNLOCK (3)
#define NFS_CONTEXT_FILE_OPEN (4)
+#define NFS_CONTEXT_DO_CLOSE (5)
int error;

struct list_head list;
--
2.37.1


2022-07-28 14:53:49

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [RFC PATCH] nfs: don't skip CTO on v2/3 mounts, regardless of order of reference puts

On Thu, Jul 28, 2022 at 10:30 AM Jeff Layton <[email protected]> wrote:
>
> Olga reported a case of data corruption in a situation where two clients
> (v3 and v4) were alternately doing open/write/close the same file.

My thoughts: I did report it but this problem has been impossible to
reproduce. Thus perhaps this proposal might be the right thing that
needs to be done anyway and that's fine but it's still unclear to me
why the problem isn't more reproducible and whether or not this
proposal will fix it.

> Looking at captures, the v3 client failed to perform any of the GETATTR
> calls for CTO during one of the events, leading it to overwrite some
> data that had been written by the v4 client.
>
> We have two calls that work similarly: put_nfs_open_context and
> put_nfs_open_context_sync. The only difference is the is_sync parameter
> that gets passed to close_context. The only caller of the _sync variant
> is in the close codepath.
>
> On a v2/3 mount, if the last reference is not put by
> put_nfs_open_context_sync, then CTO revalidation never happens. Fix this
> by adding a new flag to nfs_open_context and set that in
> put_nfs_open_context_sync. In nfs_close_context, check for that flag
> when we're checking is_sync and treat them as equivalent.
>
> Cc: Scott Mayhew <[email protected]>
> Cc: Ben Coddington <[email protected]>
> Reported-by: Olga Kornieskaia <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/inode.c | 3 ++-
> include/linux/nfs_fs.h | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> I'm not sure this is the right fix. Maybe we'd be better off just
> ignoring the is_sync parameter in nfs_close_context? It's probably
> functionally equivalent anyway.
>
> Thoughts?
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index b4e46b0ffa2d..58b506a0a2f2 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1005,7 +1005,7 @@ void nfs_close_context(struct nfs_open_context *ctx, int is_sync)
>
> if (!(ctx->mode & FMODE_WRITE))
> return;
> - if (!is_sync)
> + if (!is_sync && !test_bit(NFS_CONTEXT_DO_CLOSE, &ctx->flags))
> return;
> inode = d_inode(ctx->dentry);
> if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
> @@ -1091,6 +1091,7 @@ EXPORT_SYMBOL_GPL(put_nfs_open_context);
>
> static void put_nfs_open_context_sync(struct nfs_open_context *ctx)
> {
> + set_bit(NFS_CONTEXT_DO_CLOSE, &ctx->flags);
> __put_nfs_open_context(ctx, 1);
> }
>
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index a17c337dbdf1..faff0d60aad2 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -85,8 +85,9 @@ struct nfs_open_context {
> unsigned long flags;
> #define NFS_CONTEXT_RESEND_WRITES (1)
> #define NFS_CONTEXT_BAD (2)
> -#define NFS_CONTEXT_UNLOCK (3)
> +#define NFS_CONTEXT_UNLOCK (3)
> #define NFS_CONTEXT_FILE_OPEN (4)
> +#define NFS_CONTEXT_DO_CLOSE (5)
> int error;
>
> struct list_head list;
> --
> 2.37.1
>

2022-07-28 14:58:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC PATCH] nfs: don't skip CTO on v2/3 mounts, regardless of order of reference puts

On Thu, 2022-07-28 at 10:24 -0400, Jeff Layton wrote:
> Olga reported a case of data corruption in a situation where two
> clients
> (v3 and v4) were alternately doing open/write/close the same file.
>
> Looking at captures, the v3 client failed to perform any of the
> GETATTR
> calls for CTO during one of the events, leading it to overwrite some
> data that had been written by the v4 client.
>
> We have two calls that work similarly: put_nfs_open_context and
> put_nfs_open_context_sync. The only difference is the is_sync
> parameter
> that gets passed to close_context. The only caller of the _sync
> variant
> is in the close codepath.
>
> On a v2/3 mount, if the last reference is not put by
> put_nfs_open_context_sync, then CTO revalidation never happens. Fix
> this
> by adding a new flag to nfs_open_context and set that in
> put_nfs_open_context_sync. In nfs_close_context, check for that flag
> when we're checking is_sync and treat them as equivalent.
>
> Cc: Scott Mayhew <[email protected]>
> Cc: Ben Coddington <[email protected]>
> Reported-by: Olga Kornieskaia <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
>  fs/nfs/inode.c         | 3 ++-
>  include/linux/nfs_fs.h | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> I'm not sure this is the right fix. Maybe we'd be better off just
> ignoring the is_sync parameter in nfs_close_context? It's probably
> functionally equivalent anyway.
>
> Thoughts?
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index b4e46b0ffa2d..58b506a0a2f2 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1005,7 +1005,7 @@ void nfs_close_context(struct nfs_open_context
> *ctx, int is_sync)
>  
>         if (!(ctx->mode & FMODE_WRITE))
>                 return;
> -       if (!is_sync)
> +       if (!is_sync && !test_bit(NFS_CONTEXT_DO_CLOSE, &ctx->flags))
>                 return;
>         inode = d_inode(ctx->dentry);
>         if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))

NACK. If the 'is_sync' flag is not set, then it is because the function
is being called from a context where it is not safe to do a synchronous
RPC call.


--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-07-28 15:01:23

by Jeff Layton

[permalink] [raw]
Subject: Re: [RFC PATCH] nfs: don't skip CTO on v2/3 mounts, regardless of order of reference puts

On Thu, 2022-07-28 at 14:53 +0000, Trond Myklebust wrote:
> On Thu, 2022-07-28 at 10:24 -0400, Jeff Layton wrote:
> > Olga reported a case of data corruption in a situation where two
> > clients
> > (v3 and v4) were alternately doing open/write/close the same file.
> >
> > Looking at captures, the v3 client failed to perform any of the
> > GETATTR
> > calls for CTO during one of the events, leading it to overwrite some
> > data that had been written by the v4 client.
> >
> > We have two calls that work similarly: put_nfs_open_context and
> > put_nfs_open_context_sync. The only difference is the is_sync
> > parameter
> > that gets passed to close_context. The only caller of the _sync
> > variant
> > is in the close codepath.
> >
> > On a v2/3 mount, if the last reference is not put by
> > put_nfs_open_context_sync, then CTO revalidation never happens. Fix
> > this
> > by adding a new flag to nfs_open_context and set that in
> > put_nfs_open_context_sync. In nfs_close_context, check for that flag
> > when we're checking is_sync and treat them as equivalent.
> >
> > Cc: Scott Mayhew <[email protected]>
> > Cc: Ben Coddington <[email protected]>
> > Reported-by: Olga Kornieskaia <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > ?fs/nfs/inode.c???????? | 3 ++-
> > ?include/linux/nfs_fs.h | 3 ++-
> > ?2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > I'm not sure this is the right fix. Maybe we'd be better off just
> > ignoring the is_sync parameter in nfs_close_context? It's probably
> > functionally equivalent anyway.
> >
> > Thoughts?
> >
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index b4e46b0ffa2d..58b506a0a2f2 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -1005,7 +1005,7 @@ void nfs_close_context(struct nfs_open_context
> > *ctx, int is_sync)
> > ?
> > ????????if (!(ctx->mode & FMODE_WRITE))
> > ????????????????return;
> > -???????if (!is_sync)
> > +???????if (!is_sync && !test_bit(NFS_CONTEXT_DO_CLOSE, &ctx->flags))
> > ????????????????return;
> > ????????inode = d_inode(ctx->dentry);
> > ????????if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
>
> NACK. If the 'is_sync' flag is not set, then it is because the function
> is being called from a context where it is not safe to do a synchronous
> RPC call.
>

Ok. Any thoughts on the right way to fix this then? It seems like
skipping revalidation because the last put wasn't in the close codepath
is the wrong thing to do. Should we schedule it to a workqueue?
--
Jeff Layton <[email protected]>

2022-07-28 15:16:24

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC PATCH] nfs: don't skip CTO on v2/3 mounts, regardless of order of reference puts

On Thu, 2022-07-28 at 10:59 -0400, Jeff Layton wrote:
> On Thu, 2022-07-28 at 14:53 +0000, Trond Myklebust wrote:
> > On Thu, 2022-07-28 at 10:24 -0400, Jeff Layton wrote:
> > > Olga reported a case of data corruption in a situation where two
> > > clients
> > > (v3 and v4) were alternately doing open/write/close the same
> > > file.
> > >
> > > Looking at captures, the v3 client failed to perform any of the
> > > GETATTR
> > > calls for CTO during one of the events, leading it to overwrite
> > > some
> > > data that had been written by the v4 client.
> > >
> > > We have two calls that work similarly: put_nfs_open_context and
> > > put_nfs_open_context_sync. The only difference is the is_sync
> > > parameter
> > > that gets passed to close_context. The only caller of the _sync
> > > variant
> > > is in the close codepath.
> > >
> > > On a v2/3 mount, if the last reference is not put by
> > > put_nfs_open_context_sync, then CTO revalidation never happens.
> > > Fix
> > > this
> > > by adding a new flag to nfs_open_context and set that in
> > > put_nfs_open_context_sync. In nfs_close_context, check for that
> > > flag
> > > when we're checking is_sync and treat them as equivalent.
> > >
> > > Cc: Scott Mayhew <[email protected]>
> > > Cc: Ben Coddington <[email protected]>
> > > Reported-by: Olga Kornieskaia <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > >  fs/nfs/inode.c         | 3 ++-
> > >  include/linux/nfs_fs.h | 3 ++-
> > >  2 files changed, 4 insertions(+), 2 deletions(-)
> > >
> > > I'm not sure this is the right fix. Maybe we'd be better off just
> > > ignoring the is_sync parameter in nfs_close_context? It's
> > > probably
> > > functionally equivalent anyway.
> > >
> > > Thoughts?
> > >
> > > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > > index b4e46b0ffa2d..58b506a0a2f2 100644
> > > --- a/fs/nfs/inode.c
> > > +++ b/fs/nfs/inode.c
> > > @@ -1005,7 +1005,7 @@ void nfs_close_context(struct
> > > nfs_open_context
> > > *ctx, int is_sync)
> > >  
> > >         if (!(ctx->mode & FMODE_WRITE))
> > >                 return;
> > > -       if (!is_sync)
> > > +       if (!is_sync && !test_bit(NFS_CONTEXT_DO_CLOSE, &ctx-
> > > >flags))
> > >                 return;
> > >         inode = d_inode(ctx->dentry);
> > >         if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
> >
> > NACK. If the 'is_sync' flag is not set, then it is because the
> > function
> > is being called from a context where it is not safe to do a
> > synchronous
> > RPC call.
> >
>
> Ok. Any thoughts on the right way to fix this then? It seems like
> skipping revalidation because the last put wasn't in the close
> codepath
> is the wrong thing to do. Should we schedule it to a workqueue?

If the attributes are marked as up to date, because they were already
revalidated after the I/O was performed, then we're fine. If they are
not up to date, then that is expected to trigger a cache invalidation
on the next open when we compare the client's cached values to the
server-retrieved values for the file attributes.

i.e. this ought to be just a performance problem, and not a correctness
problem, if attribute cache revalidation is working as designed.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]