2016-09-21 18:03:49

by J. Bruce Fields

[permalink] [raw]
Subject: [0/2] make nfsd's setclientid behavior migration-friendly

Clients mounting multiple servers with the "migration" option may find
some mounts are made from the incorrect server.

I think this is really a bug in RFC 7931, and that RFC and the client
need fixing, but this is easy to mitigate on the server. I'll make an
attempt at a client patch too.

--b.



2016-09-21 18:03:49

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 2/2] nfsd4: setclientid_confirm with unmatched verifier should fail

From: "J. Bruce Fields" <[email protected]>

A setclientid_confirm with (clientid, verifier) both matching an
existing confirmed record is assumed to be a replay, but if the verifier
doesn't match, it shouldn't be.

This would be a very rare case, except that clients following
https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
failure.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a204d7e109d4..4b657337cbcd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3224,9 +3224,10 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp,
goto out;
/* cases below refer to rfc 3530 section 14.2.34: */
if (!unconf || !same_verf(&confirm, &unconf->cl_confirm)) {
- if (conf && !unconf) /* case 2: probable retransmit */
+ if (conf && same_verf(&confirm, &conf->cl_confirm)) {
+ /* case 2: probable retransmit */
status = nfs_ok;
- else /* case 4: client hasn't noticed we rebooted yet? */
+ } else /* case 4: client hasn't noticed we rebooted yet? */
status = nfserr_stale_clientid;
goto out;
}
--
2.7.4


2016-09-21 18:03:49

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 1/2] nfsd: randomize SETCLIENTID reply to help distinguish servers

From: "J. Bruce Fields" <[email protected]>

NFSv4.1 has built-in trunking support that allows a client to determine
whether two connections to two different IP addresses are actually to
the same server. NFSv4.0 does not, but RFC 7931 attempts to provide
clients a means to do this, basically by performing a SETCLIENTID to one
address and confirming it with a SETCLIENTID_CONFIRM to the other.

Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking
when mounting" implement a variation on this suggestion. It is possible
that other clients do too.

This depends on the clientid and verifier not being accepted by an
unrelated server. Since both are 64-bit values, that would be very
unlikely if they were random numbers. But they aren't:

knfsd generates the 64-bit clientid by concatenating the 32-bit boot
time (in seconds) and a counter. This makes collisions between
clientids generated by the same server extremely unlikely. But
collisions are very likely between clientids generated by servers that
boot at the same time, and it's quite common for multiple servers to
boot at the same time. The verifier is a concatenation of the
SETCLIENTID time (in seconds) and a counter, so again collisions between
different servers are likely if multiple SETCLIENTIDs are done at the
same time, which is a common case.

Therefore recent NFSv4.0 clients may decide two different servers are
really the same, and mount a filesystem from the wrong server.

Fortunately the Linux client, since 55b9df93ddd6 "nfsv4/v4.1: Verify the
client owner id during trunking detection", only does this when given
the non-default "migration" mount option.

The fault is really with RFC 7931, and needs a client fix, but in the
meantime we can mitigate the chance of these collisions by randomizing
the starting value of the counters used to generate clientids and
verifiers.

Reported-by: Frank Sorenson <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfsctl.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 65ad0165a94f..36b2af931e06 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1216,6 +1216,8 @@ static __net_init int nfsd_init_net(struct net *net)
goto out_idmap_error;
nn->nfsd4_lease = 90; /* default lease time */
nn->nfsd4_grace = 90;
+ nn->clverifier_counter = prandom_u32();
+ nn->clientid_counter = prandom_u32();
return 0;

out_idmap_error:
--
2.7.4


2016-09-22 11:07:07

by Jeff Layton

[permalink] [raw]
Subject: Re: [0/2] make nfsd's setclientid behavior migration-friendly

On Wed, 2016-09-21 at 14:03 -0400, J. Bruce Fields wrote:
> Clients mounting multiple servers with the "migration" option may find
> some mounts are made from the incorrect server.
>
> I think this is really a bug in RFC 7931, and that RFC and the client
> need fixing, but this is easy to mitigate on the server. I'll make an
> attempt at a client patch too.
>
> --b.
>
>

Both look reasonable to me:

Reviewed-by: Jeff Layton <[email protected]>

2016-09-22 14:46:59

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [0/2] make nfsd's setclientid behavior migration-friendly

On Thu, Sep 22, 2016 at 07:07:03AM -0400, Jeff Layton wrote:
> On Wed, 2016-09-21 at 14:03 -0400, J. Bruce Fields wrote:
> > Clients mounting multiple servers with the "migration" option may find
> > some mounts are made from the incorrect server.
> >
> > I think this is really a bug in RFC 7931, and that RFC and the client
> > need fixing, but this is easy to mitigate on the server. I'll make an
> > attempt at a client patch too.
> >
> > --b.
> >
> >
>
> Both look reasonable to me:
>
> Reviewed-by: Jeff Layton <[email protected]>

Thanks. The below (untested) is what I was thinking of for the client.

--b.

commit 0d210faff69c
Author: J. Bruce Fields <[email protected]>
Date: Wed Sep 21 15:49:21 2016 -0400

nfs: fix false positives in nfs40_walk_client_list()

It's possible that two different servers can return the same (clientid,
verifier) pair purely by coincidence. Both are 64-bit values, but
depending on the server implementation, they can be highly predictable
and collisions may be quite likely, especially when there are lots of
servers.

So, check for this case. If the clientid and verifier both match, then
we actually know they *can't* be the same server, since a new
SETCLIENTID to an already-known server should have changed the verifier.

This helps fix a bug that could cause the client to mount a filesystem
from the wrong server.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index cd3b7cfdde16..a8cdb94d313c 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -461,6 +461,11 @@ static bool nfs4_match_client_owner_id(const struct nfs_client *clp1,
return strcmp(clp1->cl_owner_id, clp2->cl_owner_id) == 0;
}

+static bool nfs4_same_verifier(nfs4_verifier *v1, nfs4_verifier *v2)
+{
+ return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
+}
+
/**
* nfs40_walk_client_list - Find server that recognizes a client ID
*
@@ -518,7 +523,20 @@ int nfs40_walk_client_list(struct nfs_client *new,

if (!nfs4_match_client_owner_id(pos, new))
continue;
-
+ /*
+ * We just sent a new SETCLIENTID, which should have
+ * caused the server to return a new cl_confirm. So if
+ * cl_confirm is the same, then this is a different
+ * server that just returned the same cl_confirm by
+ * coincidence:
+ */
+ if (nfs4_same_verifier(&pos->cl_confirm, &new->cl_confirm))
+ continue;
+ /*
+ * But if the cl_confirm's are different, then the only
+ * way that a SETCLIENTID_CONFIRM to pos can succeed is
+ * if new and pos point to the same server:
+ */
atomic_inc(&pos->cl_count);
spin_unlock(&nn->nfs_client_lock);


2016-09-22 15:36:11

by Jeff Layton

[permalink] [raw]
Subject: Re: [0/2] make nfsd's setclientid behavior migration-friendly

On Thu, 2016-09-22 at 10:46 -0400, J. Bruce Fields wrote:
> On Thu, Sep 22, 2016 at 07:07:03AM -0400, Jeff Layton wrote:
> >
> > On Wed, 2016-09-21 at 14:03 -0400, J. Bruce Fields wrote:
> > >
> > > Clients mounting multiple servers with the "migration" option may find
> > > some mounts are made from the incorrect server.
> > >
> > > I think this is really a bug in RFC 7931, and that RFC and the client
> > > need fixing, but this is easy to mitigate on the server. I'll make an
> > > attempt at a client patch too.
> > >
> > > --b.
> > >
> > >
> >
> > Both look reasonable to me:
> >
> > Reviewed-by: Jeff Layton <[email protected]>
>
> Thanks. The below (untested) is what I was thinking of for the client.
>
> --b.
>
> commit 0d210faff69c
> Author: J. Bruce Fields <[email protected]>
> Date: Wed Sep 21 15:49:21 2016 -0400
>
> nfs: fix false positives in nfs40_walk_client_list()
>
> It's possible that two different servers can return the same (clientid,
> verifier) pair purely by coincidence. Both are 64-bit values, but
> depending on the server implementation, they can be highly predictable
> and collisions may be quite likely, especially when there are lots of
> servers.
>
> So, check for this case. If the clientid and verifier both match, then
> we actually know they *can't* be the same server, since a new
> SETCLIENTID to an already-known server should have changed the verifier.
>
> This helps fix a bug that could cause the client to mount a filesystem
> from the wrong server.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index cd3b7cfdde16..a8cdb94d313c 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -461,6 +461,11 @@ static bool nfs4_match_client_owner_id(const struct nfs_client *clp1,
> return strcmp(clp1->cl_owner_id, clp2->cl_owner_id) == 0;
> }
>
> +static bool nfs4_same_verifier(nfs4_verifier *v1, nfs4_verifier *v2)
> +{
> + return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
> +}
> +
> /**
> * nfs40_walk_client_list - Find server that recognizes a client ID
> *
> @@ -518,7 +523,20 @@ int nfs40_walk_client_list(struct nfs_client *new,
>
> if (!nfs4_match_client_owner_id(pos, new))
> continue;
> -
> + /*
> + * We just sent a new SETCLIENTID, which should have
> + * caused the server to return a new cl_confirm. So if
> + * cl_confirm is the same, then this is a different
> + * server that just returned the same cl_confirm by
> + * coincidence:
> + */
> + if (nfs4_same_verifier(&pos->cl_confirm, &new->cl_confirm))
> + continue;
> + /*
> + * But if the cl_confirm's are different, then the only
> + * way that a SETCLIENTID_CONFIRM to pos can succeed is
> + * if new and pos point to the same server:
> + */
> atomic_inc(&pos->cl_count);
> spin_unlock(&nn->nfs_client_lock);
>

Looks ok too. Trying to graft trunking onto v4.0 seems pretty kludgy in
general, so that's probably the best you can do.

Acked-by: Jeff Layton <[email protected]>

2016-09-22 20:23:06

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [0/2] make nfsd's setclientid behavior migration-friendly

On Thu, Sep 22, 2016 at 11:36:07AM -0400, Jeff Layton wrote:
> On Thu, 2016-09-22 at 10:46 -0400, J. Bruce Fields wrote:
> > On Thu, Sep 22, 2016 at 07:07:03AM -0400, Jeff Layton wrote:
> > >
> > > On Wed, 2016-09-21 at 14:03 -0400, J. Bruce Fields wrote:
> > > >
> > > > Clients mounting multiple servers with the "migration" option may find
> > > > some mounts are made from the incorrect server.
> > > >
> > > > I think this is really a bug in RFC 7931, and that RFC and the client
> > > > need fixing, but this is easy to mitigate on the server. I'll make an
> > > > attempt at a client patch too.
> > > >
> > > > --b.
> > > >
> > > >
> > >
> > > Both look reasonable to me:
> > >
> > > Reviewed-by: Jeff Layton <[email protected]>
> >
> > Thanks. The below (untested) is what I was thinking of for the client.
> >
> > --b.
> >
> > commit 0d210faff69c
> > Author: J. Bruce Fields <[email protected]>
> > Date: Wed Sep 21 15:49:21 2016 -0400
> >
> > nfs: fix false positives in nfs40_walk_client_list()
> >
> > It's possible that two different servers can return the same (clientid,
> > verifier) pair purely by coincidence. Both are 64-bit values, but
> > depending on the server implementation, they can be highly predictable
> > and collisions may be quite likely, especially when there are lots of
> > servers.
> >
> > So, check for this case. If the clientid and verifier both match, then
> > we actually know they *can't* be the same server, since a new
> > SETCLIENTID to an already-known server should have changed the verifier.
> >
> > This helps fix a bug that could cause the client to mount a filesystem
> > from the wrong server.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > index cd3b7cfdde16..a8cdb94d313c 100644
> > --- a/fs/nfs/nfs4client.c
> > +++ b/fs/nfs/nfs4client.c
> > @@ -461,6 +461,11 @@ static bool nfs4_match_client_owner_id(const struct nfs_client *clp1,
> > return strcmp(clp1->cl_owner_id, clp2->cl_owner_id) == 0;
> > }
> >
> > +static bool nfs4_same_verifier(nfs4_verifier *v1, nfs4_verifier *v2)
> > +{
> > + return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
> > +}
> > +
> > /**
> > * nfs40_walk_client_list - Find server that recognizes a client ID
> > *
> > @@ -518,7 +523,20 @@ int nfs40_walk_client_list(struct nfs_client *new,
> >
> > if (!nfs4_match_client_owner_id(pos, new))
> > continue;
> > -
> > + /*
> > + * We just sent a new SETCLIENTID, which should have
> > + * caused the server to return a new cl_confirm. So if
> > + * cl_confirm is the same, then this is a different
> > + * server that just returned the same cl_confirm by
> > + * coincidence:
> > + */
> > + if (nfs4_same_verifier(&pos->cl_confirm, &new->cl_confirm))
> > + continue;
> > + /*
> > + * But if the cl_confirm's are different, then the only
> > + * way that a SETCLIENTID_CONFIRM to pos can succeed is
> > + * if new and pos point to the same server:
> > + */
> > atomic_inc(&pos->cl_count);
> > spin_unlock(&nn->nfs_client_lock);
> >
>
> Looks ok too. Trying to graft trunking onto v4.0 seems pretty kludgy in
> general, so that's probably the best you can do.
>
> Acked-by: Jeff Layton <[email protected]>

Ugh, I totally missed that this loop in nfs40_walk_client list counts on
the new client itself being in the list so that the normal case is
handled on the last iteration with new == pos. So I need:

+ if ((new != pos) && nfs4_same_verifier(&pos->cl_confirm,
+ &new->cl_confirm))
+ continue;

I wonder if that code's a little too clever for its own good--but I
don't think I want to fool with it.

--b.