2023-02-13 22:57:19

by NeilBrown

[permalink] [raw]
Subject: [PATCH] NFS: avoid infinite NFS4ERR_OLD_STATEID loops


Linux-NFS responds to NFS4ERR_OLD_STATEID by simply retrying the
request, hoping to make use of an updated stateid that might have
arrived from the server. This is usually successful.

However if the client and server get out-of-sync for some reason and if
there is no new stateid to try, this can result in an indefinite loop
which looks a bit like a DoS attack.

This can particularly happen when a server replies with success to an
OPEN request, but fails a subsequent GETATTR. This has been observed
with Netapp and Hitachi servers when a concurrent unlink from a
different client removes the file between the OPEN and the GETATTR. The
GETATTR returns NFS4ERR_STALE.

In this situation Linux-NFS ignores the successful open and in
particular does not record the new stateid. If the same file was
already open, those active opens will now have an OLD_STATEID.

When the Netapp/Hitachi servers receive subseqent WRITE requests with
the STALE file handle and OLD stateid, they choose to report the
OLD_STATEID error rather than the STALE error. This causes Linux-NFS to
loop indefinitely.

To protect against this or other scenarios that might result in old
stateids being used, this patch limits the number of retry attempts when
OLD_STATEID is received. The first 4 retries are sent immediately after
an error. After that the standard timed back-off retry scheme is used
until a 2-second delay is reached. After than the OLD_STATEID error is
mapped to ESTALE so the write fails rather than hanging indefinitely.

This pattern of server behaviour can be demonstrated - and so the patch
can be tested - by modifying the Linux NFS server as follows:

1/ The second time a file is opened, unlink it. This simulates
a race with another client, without needing to have a race:

> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -594,6 +594,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> if (reclaim && !status)
> nn->somebody_reclaimed = true;
> out:
> + if (!status && open->op_stateid.si_generation > 1) {
> + printk("Opening gen %d\n", (int)open->op_stateid.si_generation);
> + vfs_unlink(mnt_user_ns(resfh->fh_export->ex_path.mnt),
> + resfh->fh_dentry->d_parent->d_inode,
> + resfh->fh_dentry, NULL);
> + }
> if (open->op_filp) {
> fput(open->op_filp);
> open->op_filp = NULL;

2/ When a GETATTR op is attempted on an unlinked file, return ESTALE

> @@ -852,6 +858,11 @@ nfsd4_getattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> if (status)
> return status;
>
> + if (cstate->current_fh.fh_dentry->d_inode->i_nlink == 0) {
> + printk("Return Estale for unlinked file\n");
> + return nfserr_stale;
> + }
> +
> if (getattr->ga_bmval[1] & NFSD_WRITEONLY_ATTRS_WORD1)
> return nfserr_inval;

Then mount the filesystem and

Thread 1 Thread 2
open a file
open the same file (will fail)
write to that file

I use this shell fragment, using 'sleep' for synchronisation.
The use of /bin/echo ensures the write is flushed when /bin/echo closes
the fd on exit.

(
/bin/echo hello
sleep 3
/bin/echo there
) > /import/A/afile &
sleep 3
cat /import/A/afile

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/nfs4proc.c | 21 +++++++++++++++++++--
include/linux/nfs_xdr.h | 2 ++
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 40d749f29ed3..ee77bf42ec38 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -408,7 +408,7 @@ static long nfs4_update_delay(long *timeout)
long ret;
if (!timeout)
return NFS4_POLL_RETRY_MAX;
- if (*timeout <= 0)
+ if (*timeout <= NFS4_POLL_RETRY_MIN)
*timeout = NFS4_POLL_RETRY_MIN;
if (*timeout > NFS4_POLL_RETRY_MAX)
*timeout = NFS4_POLL_RETRY_MAX;
@@ -562,9 +562,24 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
return 0;

case -NFS4ERR_RETRY_UNCACHED_REP:
- case -NFS4ERR_OLD_STATEID:
exception->retry = 1;
break;
+ case -NFS4ERR_OLD_STATEID:
+ /* Simple retry is usually safe, but buggy
+ * servers can cause DoS - so be careful.
+ */
+ if (exception->timeout > HZ*2) {
+ ret = -ESTALE;
+ } else if (exception->timeout < 5) {
+ /* 5 tries with no delay */
+ exception->retry = 1;
+ exception->timeout += 1;
+ ret = 0;
+ } else {
+ exception->delay = 1;
+ ret = 0;
+ }
+ break;
case -NFS4ERR_BADOWNER:
/* The following works around a Linux server bug! */
case -NFS4ERR_BADNAME:
@@ -5506,10 +5521,12 @@ static int nfs4_write_done_cb(struct rpc_task *task,
.inode = hdr->inode,
.state = hdr->args.context->state,
.stateid = &hdr->args.stateid,
+ .timeout = hdr->timeout,
};
task->tk_status = nfs4_async_handle_exception(task,
NFS_SERVER(inode), task->tk_status,
&exception);
+ hdr->timeout = exception.timeout;
if (exception.retry) {
rpc_restart_call_prepare(task);
return -EAGAIN;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index e86cf6642d21..0c791e5b213c 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1625,6 +1625,8 @@ struct nfs_pgio_header {
unsigned int good_bytes; /* boundary of good data */
unsigned long flags;

+ long timeout;
+
/*
* rpc data
*/
--
2.39.1



2023-02-14 02:57:11

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFS: avoid infinite NFS4ERR_OLD_STATEID loops

On Tue, 2023-02-14 at 09:57 +1100, NeilBrown wrote:
>
> Linux-NFS responds to NFS4ERR_OLD_STATEID by simply retrying the
> request, hoping to make use of an updated stateid that might have
> arrived from the server.  This is usually successful.
>
> However if the client and server get out-of-sync for some reason and
> if
> there is no new stateid to try, this can result in an indefinite loop
> which looks a bit like a DoS attack.
>
> This can particularly happen when a server replies with success to an
> OPEN request, but fails a subsequent GETATTR.  This has been observed
> with Netapp and Hitachi servers when a concurrent unlink from a
> different client removes the file between the OPEN and the GETATTR. 
> The
> GETATTR returns NFS4ERR_STALE.

Then they are both badly broken servers, and people should complain to
NetApp and Hitachi. We're still not fixing their server bugs in the
Linux client.

NACK...
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]



2023-02-14 03:07:49

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] NFS: avoid infinite NFS4ERR_OLD_STATEID loops

On Tue, 14 Feb 2023, Trond Myklebust wrote:
> On Tue, 2023-02-14 at 09:57 +1100, NeilBrown wrote:
> >
> > Linux-NFS responds to NFS4ERR_OLD_STATEID by simply retrying the
> > request, hoping to make use of an updated stateid that might have
> > arrived from the server.  This is usually successful.
> >
> > However if the client and server get out-of-sync for some reason and
> > if
> > there is no new stateid to try, this can result in an indefinite loop
> > which looks a bit like a DoS attack.
> >
> > This can particularly happen when a server replies with success to an
> > OPEN request, but fails a subsequent GETATTR.  This has been observed
> > with Netapp and Hitachi servers when a concurrent unlink from a
> > different client removes the file between the OPEN and the GETATTR. 
> > The
> > GETATTR returns NFS4ERR_STALE.
>
> Then they are both badly broken servers, and people should complain to
> NetApp and Hitachi. We're still not fixing their server bugs in the
> Linux client.

It's not about fixing a server bug. It is defensive programming. We
shouldn't let a buggy server cause a melt-down.

>
> NACK...

Oh well, I tried.

Thanks,
NeilBrown


> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>
>