[[ This implements an idea I had while discussing the issues around
NFSv4 client identity. It isn't a solution, but instead aims to make
the problem more immediately visible so that sites can "fix" their
configuration before they get strange errors.
I'm not convinced it is a good idea, but it seems worthy of a little
discussion at least.
There is a follow up patch to nfs-utils which expands this more
towards a b-grade solution, but again is very open for discussion.
]]
The default cl_owner_id is based on the host name, which may be common
amongst different network namespaces. If two clients in different
network namespaces with the same cl_owner_id attempt to mount from the
same server, problems will occur, as each client can "steal" the lease
from the other.
To make this failure more immediate, keep track of every cl_owner_id in
use, and return an error if there is an attempt to use one in two
different network namespaces.
We use an rhltable, which is a resizable hash table that can contain
multiple entries with the same key. Every nfs_client is added to this
table with its cl_owner_id as the key.
This doesn't fix the problem of non-unique client identifiers; it only
makes the problem more visible and causes a hard, immediate failure. The failure
is reported to userspace which *could* make a policy decision to set
/sys/fs/nfs/net/nfs_client/identifier
to a random value and retry the mount.
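For illustration only, a mount helper might react to the new failure
roughly like this. The identifier file is the one named above; the
uuid source and the mount arguments are placeholders, not a proposed
mount.nfs change:

#include <errno.h>
#include <fcntl.h>
#include <sys/mount.h>
#include <unistd.h>

/* Write a kernel-generated random uuid to the per-netns identifier. */
static int set_random_client_id(void)
{
	char uuid[64];
	ssize_t n;
	int fd;

	fd = open("/proc/sys/kernel/random/uuid", O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, uuid, sizeof(uuid) - 1);
	close(fd);
	if (n <= 0)
		return -1;

	fd = open("/sys/fs/nfs/net/nfs_client/identifier", O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, uuid, n);
	close(fd);
	return n < 0 ? -1 : 0;
}

/* On the new EADDRINUSE failure, randomise the id and retry once. */
static int mount_with_retry(const char *spec, const char *mnt,
			    const char *opts)
{
	if (mount(spec, mnt, "nfs4", 0, opts) == 0)
		return 0;
	if (errno != EADDRINUSE || set_random_client_id() < 0)
		return -1;
	return mount(spec, mnt, "nfs4", 0, opts);
}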
Note that this *could* cause a regression of configurations which are
currently working, where different network namespaces only mount from
different servers. Such configurations are fragile and supporting them
might not be justified. It would require rejecting mounts only when the
netns is different and the server is the same, but at the point when we
commit to the client identifier, we don't have a unique identity for the
server.
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/nfs4_fs.h | 7 +++
fs/nfs/nfs4client.c | 1 +
fs/nfs/nfs4proc.c | 89 +++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4state.c | 4 ++
fs/nfs/nfs4super.c | 8 ++++
include/linux/nfs_fs_sb.h | 2 +
6 files changed, 111 insertions(+)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 84f39b6f1b1e..83a70ea300d1 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -336,6 +336,13 @@ extern void nfs4_update_changeattr(struct inode *dir,
extern int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
struct page **pages);
+/* Interfaces for the owner_id hash table, which ensures that no
+ * two clients in different network namespaces use the same owner_id.
+ */
+void nfs4_drop_owner_id(struct nfs_client *clp);
+int nfs4_init_owner_id_hash(void);
+void nfs4_destroy_owner_id_hash(void);
+
#if defined(CONFIG_NFS_V4_1)
extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
extern int nfs4_proc_create_session(struct nfs_client *, const struct cred *);
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 47a6cf892c95..2a659d14b663 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -289,6 +289,7 @@ static void nfs4_shutdown_client(struct nfs_client *clp)
nfs_idmap_delete(clp);
rpc_destroy_wait_queue(&clp->cl_rpcwaitq);
+ nfs4_drop_owner_id(clp);
kfree(clp->cl_serverowner);
kfree(clp->cl_serverscope);
kfree(clp->cl_implid);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0e0db6c27619..8f171e8b45b4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -55,6 +55,8 @@
#include <linux/utsname.h>
#include <linux/freezer.h>
#include <linux/iversion.h>
+#include <linux/rhashtable.h>
+#include <linux/xxhash.h>
#include "nfs4_fs.h"
#include "delegation.h"
@@ -6240,6 +6242,89 @@ nfs4_get_uniquifier(struct nfs_client *clp, char *buf, size_t buflen)
return strlen(buf);
}
+static u32 owner_id_hash(const void *obj, u32 len, u32 seed)
+{
+ const char *const *idp = obj;
+ const char *id = *idp;
+
+ return xxh32(id, strlen(id), seed);
+}
+
+static int owner_id_cmp(struct rhashtable_compare_arg *arg,
+ const void *obj)
+{
+ const char *const *idp = arg->key;
+ const char *id = *idp;
+ const struct nfs_client *clp = obj;
+
+ return strcmp(id, clp->cl_owner_id);
+}
+
+static struct rhltable nfs4_owner_id;
+static DEFINE_SPINLOCK(nfs4_owner_id_lock);
+static const struct rhashtable_params nfs4_owner_id_params = {
+ .key_offset = offsetof(struct nfs_client, cl_owner_id),
+ .head_offset = offsetof(struct nfs_client, cl_owner_id_hash),
+ .key_len = 1,
+ .automatic_shrinking = true,
+ .hashfn = owner_id_hash,
+ .obj_cmpfn = owner_id_cmp,
+};
+
+int nfs4_init_owner_id_hash(void)
+{
+ return rhltable_init(&nfs4_owner_id, &nfs4_owner_id_params);
+}
+
+void nfs4_destroy_owner_id_hash(void)
+{
+ rhltable_destroy(&nfs4_owner_id);
+}
+
+static bool nfs4_record_owner_id(struct nfs_client *clp)
+{
+ /*
+ * Any two nfs_client with the same cl_owner_id must
+ * have the same cl_net.
+ */
+ int ret;
+ int loops = 0;
+ struct rhlist_head *h;
+
+ /* rhltable_insert can fail for transient reasons, so loop a few times */
+ while (loops < 5) {
+ spin_lock(&nfs4_owner_id_lock);
+ rcu_read_lock();
+ h = rhltable_lookup(&nfs4_owner_id, &clp->cl_owner_id,
+ nfs4_owner_id_params);
+ if (h &&
+ container_of(h, struct nfs_client,
+ cl_owner_id_hash)->cl_net != clp->cl_net) {
+ rcu_read_unlock();
+ spin_unlock(&nfs4_owner_id_lock);
+ return false;
+ }
+ rcu_read_unlock();
+ ret = rhltable_insert(&nfs4_owner_id,
+ &clp->cl_owner_id_hash,
+ nfs4_owner_id_params);
+ spin_unlock(&nfs4_owner_id_lock);
+ if (ret == 0)
+ return true;
+ /* Transient error */
+ msleep(20);
+ loops += 1;
+ }
+ return false;
+}
+
+void nfs4_drop_owner_id(struct nfs_client *clp)
+{
+ if (clp->cl_owner_id)
+ rhltable_remove(&nfs4_owner_id, &clp->cl_owner_id_hash,
+ nfs4_owner_id_params);
+}
+
static int
nfs4_init_nonuniform_client_string(struct nfs_client *clp)
{
@@ -6289,6 +6374,8 @@ nfs4_init_nonuniform_client_string(struct nfs_client *clp)
rcu_read_unlock();
clp->cl_owner_id = str;
+ if (!nfs4_record_owner_id(clp))
+ return -EADDRINUSE;
return 0;
}
@@ -6331,6 +6418,8 @@ nfs4_init_uniform_client_string(struct nfs_client *clp)
clp->rpc_ops->version, clp->cl_minorversion,
clp->cl_rpcclient->cl_nodename);
clp->cl_owner_id = str;
+ if (!nfs4_record_owner_id(clp))
+ return -EADDRINUSE;
return 0;
}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index f5a62c0d999b..a5cd5c2d6dac 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2276,6 +2276,10 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
case -EINTR:
case -ERESTARTSYS:
break;
+ case -EADDRINUSE: /* cl_owner_id not unique */
+ dprintk("NFS: client id \"%s\" is used in other network namespace\n",
+ clp->cl_owner_id);
+ break;
case -ETIMEDOUT:
if (clnt->cl_softrtry)
break;
diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index d09bcfd7db89..62f6d15c2bf9 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -280,11 +280,17 @@ static int __init init_nfs_v4(void)
if (err)
goto out2;
+ err = nfs4_init_owner_id_hash();
+ if (err)
+ goto out3;
+
#ifdef CONFIG_NFS_V4_2
nfs42_ssc_register_ops();
#endif
register_nfs_version(&nfs_v4);
return 0;
+out3:
+ nfs4_unregister_sysctl();
out2:
nfs_idmap_quit();
out1:
@@ -298,6 +304,8 @@ static void __exit exit_nfs_v4(void)
/* Not called in the _init(), conditionally loaded */
nfs4_pnfs_v3_ds_connect_unload();
+ nfs4_destroy_owner_id_hash();
+
unregister_nfs_version(&nfs_v4);
#ifdef CONFIG_NFS_V4_2
nfs4_xattr_cache_exit();
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index ca0959e51e81..a9da00b6aeb6 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/nfs_xdr.h>
#include <linux/sunrpc/xprt.h>
+#include <linux/rhashtable-types.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -84,6 +85,7 @@ struct nfs_client {
/* Client owner identifier */
const char * cl_owner_id;
+ struct rhlist_head cl_owner_id_hash;
u32 cl_cb_ident; /* v4.0 callback identifier */
const struct nfs4_minor_version_ops *cl_mvops;
--
2.35.1
Hi Neil-
> On Mar 24, 2022, at 8:24 PM, NeilBrown <[email protected]> wrote:
>
> The default cl_owner_id is based on the host name, which may be common
> amongst different network namespaces. If two clients in different
> network namespaces with the same cl_owner_id attempt to mount from the
> same server, problems will occur, as each client can "steal" the lease
> from the other.
The immediate issue, IIUC, is that this helps only when the potentially
conflicting containers are located on the same physical host. I would
expect there are similar, but less probable, collision risks across
a population of clients.
I guess I was also under the impression that NFS mount(2) can return
EADDRINUSE already, but I might be wrong about that.
In regard to the general issues around client ID, my approach has been
to increase the visibility of CLID_INUSE errors. At least on the
server, there are trace points that fire when this happens. Perhaps
the server and client could be more verbose about this situation.
Our server might also track the last few boot verifiers for a client
ID, and if it sees them repeating, issue a warning? That becomes
onerous if there are more than a handful of clients with the same
co_ownerid, however.
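Something like this, purely as a sketch - neither the structure nor
the function exists in NFSD today, and where the history would hang
off the server's client record is hand-waved:

#define CLID_VERF_HISTORY 4

struct clid_verf_history {
	nfs4_verifier	verfs[CLID_VERF_HISTORY];
	unsigned int	next;
};

/*
 * Return true if @verf repeats a boot verifier seen recently for this
 * co_ownerid; otherwise record it, overwriting the oldest entry.
 */
static bool clid_verf_seen(struct clid_verf_history *h,
			   const nfs4_verifier *verf)
{
	unsigned int i;

	for (i = 0; i < CLID_VERF_HISTORY; i++)
		if (memcmp(h->verfs[i].data, verf->data,
			   NFS4_VERIFIER_SIZE) == 0)
			return true;
	h->verfs[h->next] = *verf;
	h->next = (h->next + 1) % CLID_VERF_HISTORY;
	return false;
}

EXCHANGE_ID processing could warn once when this returns true; the
cost is one small array per client ID.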
--
Chuck Lever
On Sat, 26 Mar 2022, Chuck Lever III wrote:
> Hi Neil-
>
> > On Mar 24, 2022, at 8:24 PM, NeilBrown <[email protected]> wrote:
> >
> > The default cl_owner_id is based on the host name, which may be common
> > amongst different network namespaces. If two clients in different
> > network namespaces with the same cl_owner_id attempt to mount from the
> > same server, problems will occur, as each client can "steal" the lease
> > from the other.
>
> The immediate issue, IIUC, is that this helps only when the potentially
> conflicting containers are located on the same physical host. I would
> expect there are similar, but less probable, collision risks across
> a population of clients.
I see that as a separate issue - certainly similar but probably
requiring a separate solution. I had hoped to propose a (partial)
solution to that at the same time, but it proved challenging.
I would like to automatically set nfs.nfs4_unique_id to something based
on /etc/machine-id if it isn't otherwise set.
- we could auto-generate /etc/modprobe.d/00-nfs-identity.conf
but I suspect that would over-ride anything on the kernel command
line.
- we could run a tool at boot and when udev notices that the module is
loaded, and set the parameter if it isn't already set, but that might
happen after the first mount
- we could get mount.nfs to check-and-set, but that might run in a mount
namespace which sees a different /etc/machine-id
- we could change the kernel to support another module parameter,
  nfs.nfs4_unique_id_default, and set that via /etc/modprobe.d.
  Then the kernel uses it only if nfs4_unique_id is not set - roughly
  as sketched below.
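(A sketch only: the _default parameter and the helper are hypothetical;
nfs4_client_id_uniquifier is the existing backing store for the
nfs4_unique_id parameter.)

/* Hypothetical second parameter, intended to be set from /etc/modprobe.d */
static char nfs4_unique_id_default[NFS4_CLIENT_ID_UNIQ_LEN] = "";
module_param_string(nfs4_unique_id_default, nfs4_unique_id_default,
		    NFS4_CLIENT_ID_UNIQ_LEN, 0600);

/* An explicitly configured nfs4_unique_id always wins; fall back to
 * the distro-provided default only when it was never set.
 */
static const char *nfs4_uniquifier(void)
{
	if (nfs4_client_id_uniquifier[0] != '\0')
		return nfs4_client_id_uniquifier;
	return nfs4_unique_id_default;
}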
I think this idea would be sufficiently safe if we could make it work.
I can't see how to make it work without a kernel change - and I don't
really like the kernel change I suggested.
>
> I guess I was also under the impression that NFS mount(2) can return
> EADDRINUSE already, but I might be wrong about that.
Maybe it could return EADDRINUSE if all privileged ports were in use ...
I'd need to check that.
>
> In regard to the general issues around client ID, my approach has been
> to increase the visibility of CLID_INUSE errors. At least on the
> server, there are trace points that fire when this happens. Perhaps
> the server and client could be more verbose about this situation.
I found the RFC a bit unclear here, but my understanding is that
CLID_INUSE will only get generated if krb5 authentication is used for
EXCHANGE_ID, and the credentials for the clients are different.
This is certainly a case worth handling well, but I doubt it would
affect a high proportion of NFS installations.
Or have I misunderstood?
>
> Our server might also track the last few boot verifiers for a client
> ID, and if it sees them repeating, issue a warning? That becomes
> onerous if there are more than a handful of clients with the same
> co_ownerid, however.
That's an interesting idea. There would be no guarantees, but it would
probably report some errors, which would be enough to flag the
problem to the sysadmin.
Thanks,
NeilBrown
> On Mar 28, 2022, at 12:32 AM, NeilBrown <[email protected]> wrote:
>
> On Sat, 26 Mar 2022, Chuck Lever III wrote:
>> Hi Neil-
>>
>>> On Mar 24, 2022, at 8:24 PM, NeilBrown <[email protected]> wrote:
>>>
>>> The default cl_owner_id is based on the host name, which may be common
>>> amongst different network namespaces. If two clients in different
>>> network namespaces with the same cl_owner_id attempt to mount from the
>>> same server, problems will occur, as each client can "steal" the lease
>>> from the other.
>>
>> The immediate issue, IIUC, is that this helps only when the potentially
>> conflicting containers are located on the same physical host. I would
>> expect there are similar, but less probable, collision risks across
>> a population of clients.
>
> I see that as a separate issue - certainly similar but probably
> requiring a separate solution. I had hoped to propose a (partial)
> solution to that at the same time, but it proved challenging.
>
> I would like to automatically set nfs.nfs4_unique_id to something based
> on /etc/machine-id if it isn't otherwise set.
> - we could auto-generate /etc/modprobe.d/00-nfs-identity.conf
> but I suspect that would over-ride anything on the kernel command
> line.
> - we could run a tool at boot and when udev notices that the module is
> loaded, and set the parameter if it isn't already set, but that might
> happen after the first mount
> - we could get mount.nfs to check-and-set, but that might run in a mount
> namespace which sees a different /etc/machine-id
> - we could change the kernel to support another module parameter,
> nfs.nfs4_unique_id_default, and set that via /etc/modprobe.d
> Then the kernel uses it only if nfs4_unique_id is not set.
- Repurpose nfs.nfs4_unique_id to be the system's default uniquifier,
and use the sysfs API for replacing it as needed.
Some logic would be needed when building initramfs to sort out whether
to use the module parameter or the boot parameter. </handwave>
> I think this idea would be sufficiently safe if we could make it work.
> I can't see how to make it work without a kernel change - and I don't
> really like the kernel change I suggested.
I like the idea of having a fairly reliable default uniquifier for
init_ns and then educating the container folks about how to set up
a uniquifier properly when materializing a container instance.
>> I guess I was also under the impression that NFS mount(2) can return
>> EADDRINUSE already, but I might be wrong about that.
>
> Maybe it could return EADDRINUSE if all privileged ports were in use ...
> I'd need to check that.
>
>>
>> In regard to the general issues around client ID, my approach has been
>> to increase the visibility of CLID_INUSE errors. At least on the
>> server, there are trace points that fire when this happens. Perhaps
>> the server and client could be more verbose about this situation.
>
> I found the RFC a bit unclear here, but my understanding is that
> CLID_INUSE will only get generated if krb5 authentication is used for
> EXCHANGE_ID, and the credentials for the clients are different.
> This is certainly a case worth handling well, but I doubt it would
> affect a high proportion of NFS installations.
>
> Or have I misunderstood?
That sounds right to me. I don't believe the Linux NFS server ever
returns CLID_INUSE if the principal/flavor is AUTH_SYS.
Note also that the protocol allows a CLID_INUSE response to return
the IP address of the conflicting client, but NFSD doesn't do that,
I guess as a security measure.
>> Our server might also track the last few boot verifiers for a client
>> ID, and if it sees them repeating, issue a warning? That becomes
>> onerous if there are more than a handful of clients with the same
>> co_ownerid, however.
>
> That's an interesting idea. There would be no guarantees, but it would
> probably report some errors, which would be enough to flag the
> problem to the sysadmin.
>
> Thanks,
> NeilBrown
--
Chuck Lever
On Mon, Mar 28, 2022 at 02:08:31PM +0000, Chuck Lever III wrote:
> Note also that the protocol allows a CLID_INUSE response to return
> the IP address of the conflicting client, but NFSD doesn't do that,
> I guess as a security measure.
The information disclosure sounds mild and not too surprising--mountd
already hands out information about v2/v3 clients (used by "showmount").
Chances are we just didn't get around to it. If it'd help debug this
kind of problem, I think it'd be worth it.
--b.