2020-11-18 00:25:19

by Anchal Agarwal

Subject: [PATCH] NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

If our CLOSE RPC call is rejected with an ERR_STALE error, then we
should remove the GETATTR call from the compound RPC and retry.
This could happen in a scenario where two clients try to access
the same file. One client opens the file and the other client removes
the file while it's opened by the first client. When the first client
attempts to close the file, the server returns ESTALE and the file
ends up being leaked on the server. This depends on how the NFS server
is configured and is not reproducible when running against nfsd.

Signed-off-by: Anchal Agarwal <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9e0ca9b2b210..40e4259bc83e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3548,6 +3548,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 			res_stateid = &calldata->res.stateid;
 			renew_lease(server, calldata->timestamp);
 			break;
+		case -ESTALE:
 		case -NFS4ERR_ACCESS:
 			if (calldata->arg.bitmask != NULL) {
 				calldata->arg.bitmask = NULL;
--
2.16.6
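
The retry behaviour the patch description asks for can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel implementation: `close_call`, `close_action` and `close_done` are hypothetical names, `-EACCES` stands in for the `-NFS4ERR_ACCESS` case visible in the hunk, and a non-NULL `bitmask` is taken to mean that GETATTR is part of the compound.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's types. */
enum close_action { CLOSE_DONE, CLOSE_RESTART, CLOSE_FAILED };

struct close_call {
	/* Non-NULL means a GETATTR is embedded in the CLOSE compound. */
	const unsigned int *bitmask;
};

/* Decide what to do once the CLOSE compound has completed with `status`. */
static enum close_action close_done(struct close_call *call, int status)
{
	assert(call != NULL);

	switch (status) {
	case 0:
		return CLOSE_DONE;
	case -ESTALE:	/* the case the patch proposes to add */
	case -EACCES:	/* stand-in for the existing -NFS4ERR_ACCESS case */
		if (call->bitmask != NULL) {
			/* Drop the GETATTR and resend CLOSE on its own. */
			call->bitmask = NULL;
			return CLOSE_RESTART;
		}
		/* GETATTR was already dropped, so do not loop forever. */
		return CLOSE_FAILED;
	default:
		return CLOSE_FAILED;
	}
}
```

In this model a first `-ESTALE` with GETATTR present yields `CLOSE_RESTART`, and a second `-ESTALE` on the bare CLOSE yields `CLOSE_FAILED`, so the retry happens at most once.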


2020-11-18 03:18:00

by Trond Myklebust

Subject: Re: [PATCH] NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> should remove the GETATTR call from the compound RPC and retry.
> This could happen in a scenario where two clients try to access
> the same file. One client opens the file and the other client removes
> the file while it's opened by the first client. When the first client
> attempts to close the file, the server returns ESTALE and the file
> ends up being leaked on the server. This depends on how the NFS server
> is configured and is not reproducible when running against nfsd.

That would be a seriously broken server. If you return NFS4ERR_STALE to
the client, you cannot expect any further interaction with that file
from the client. It won't try to send CLOSE or DELEGRETURN or any other
stateful operation.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-11-18 21:40:22

by Anchal Agarwal

Subject: Re: [PATCH] NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> > should remove the GETATTR call from the compound RPC and retry.
> > This could happen in a scenario where two clients try to access
> > the same file. One client opens the file and the other client removes
> > the file while it's opened by the first client. When the first client
> > attempts to close the file, the server returns ESTALE and the file
> > ends up being leaked on the server. This depends on how the NFS server
> > is configured and is not reproducible when running against nfsd.
>
> That would be a seriously broken server. If you return NFS4ERR_STALE to
> the client, you cannot expect any further interaction with that file
> from the client. It won't try to send CLOSE or DELEGRETURN or any other
> stateful operation.
>
In this scenario, the setup we have at EFS is distributed: multiple clients are
connected to multiple servers that share a common filesystem. The scenario above
therefore leads to leaked open file handles on the client that tries to close the
deleted file. My view was that, in that case, the client could retry the CLOSE
without the GETATTR in the compound, with nothing to do on the server side.

Thanks,
Anchal Agarwal

2020-11-18 22:14:18

by Trond Myklebust

Subject: Re: [PATCH] NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

On Wed, 2020-11-18 at 21:29 +0000, Anchal Agarwal wrote:
> On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> > On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > > If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> > > should remove the GETATTR call from the compound RPC and retry.
> > > This could happen in a scenario where two clients try to access
> > > the same file. One client opens the file and the other client removes
> > > the file while it's opened by the first client. When the first client
> > > attempts to close the file, the server returns ESTALE and the file
> > > ends up being leaked on the server. This depends on how the NFS server
> > > is configured and is not reproducible when running against nfsd.
> >
> > That would be a seriously broken server. If you return NFS4ERR_STALE
> > to the client, you cannot expect any further interaction with that
> > file from the client. It won't try to send CLOSE or DELEGRETURN or
> > any other stateful operation.
> >
> In this scenario, the setup we have at EFS is distributed: multiple
> clients are connected to multiple servers that share a common
> filesystem. The scenario above therefore leads to leaked open file
> handles on the client that tries to close the deleted file. My view
> was that, in that case, the client could retry the CLOSE without the
> GETATTR in the compound, with nothing to do on the server side.


If you send the client an NFS4ERR_STALE, you are telling it that its
access to the file has been revoked. That is not a temporary error, it
is a fatal one. The client is not responsible for cleaning up any
state.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
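
One way to read the distinction drawn above between a temporary and a fatal error is the following user-space sketch. It is an illustration under assumptions, not client code: `open_file` and `client_sends_close` are hypothetical names, and only the numeric status values are taken from the NFSv4 protocol definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Status values as assigned by the NFSv4 protocol (RFC 7530). */
enum nfs4_status {
	NFS4_OK       = 0,
	NFS4ERR_STALE = 70,	/* file handle is no longer valid: fatal */
	NFS4ERR_DELAY = 10008,	/* server asks the client to retry later */
};

/* Hypothetical per-file bookkeeping on the client side. */
struct open_file {
	bool has_open_state;
};

/*
 * Record the status of the latest operation on the file and report
 * whether the client would still send a CLOSE for it afterwards.
 */
static bool client_sends_close(struct open_file *f, enum nfs4_status last_status)
{
	assert(f != NULL);

	/*
	 * A stale handle means access to the file is gone for good, so the
	 * client forgets its open state instead of trying to hand it back.
	 */
	if (last_status == NFS4ERR_STALE)
		f->has_open_state = false;

	return f->has_open_state;
}
```

Under this model a file that has seen NFS4ERR_DELAY still gets a CLOSE later, while a file that has seen NFS4ERR_STALE never does, which is why any state left behind has to be reclaimed by the server itself.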


2020-11-19 19:25:03

by Anchal Agarwal

Subject: Re: [PATCH] NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

On Wed, Nov 18, 2020 at 10:13:16PM +0000, Trond Myklebust wrote:
> On Wed, 2020-11-18 at 21:29 +0000, Anchal Agarwal wrote:
> > On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> > > On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > > > If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> > > > should remove the GETATTR call from the compound RPC and retry.
> > > > This could happen in a scenario where two clients try to access
> > > > the same file. One client opens the file and the other client removes
> > > > the file while it's opened by the first client. When the first client
> > > > attempts to close the file, the server returns ESTALE and the file
> > > > ends up being leaked on the server. This depends on how the NFS server
> > > > is configured and is not reproducible when running against nfsd.
> > >
> > > That would be a seriously broken server. If you return NFS4ERR_STALE
> > > to the client, you cannot expect any further interaction with that
> > > file from the client. It won't try to send CLOSE or DELEGRETURN or
> > > any other stateful operation.
> > >
> > In this scenario, the setup we have at EFS is distributed: multiple
> > clients are connected to multiple servers that share a common
> > filesystem. The scenario above therefore leads to leaked open file
> > handles on the client that tries to close the deleted file. My view
> > was that, in that case, the client could retry the CLOSE without the
> > GETATTR in the compound, with nothing to do on the server side.
>
>
> If you send the client an NFS4ERR_STALE, you are telling it that its
> access to the file has been revoked. That is not a temporary error, it
> is a fatal one. The client is not responsible for cleaning up any
> state.
>
OK, I get what you are saying. As I understand it, this is not a valid error
to send to the client on a CLOSE call: it is the server that is doing something
fatally wrong, and it should either clean up its own state or not allow this
scenario to happen in the first place.
Thanks for bearing with me.

--
Anchal