In order to set a unique but persistent value for the nfs client's ID, the
client exposes a per-network-namespace sysfs file that can be used to
configure a "uniquifier" for this value.
However any userspace mechanism used to configure this value must do so in
the potentially small window between either (1) the nfs module getting
loaded or (2) the creation of a new network namespace, and the client
sending SETCLIENTID or EXCHANGE_ID.
In (1) the time between these events can be very small if the kernel loads
the nfs module as triggered by the first mount request. That leaves little
time for another process such as a userspace helper to lookup or generate a
uniquifier and write it to the kernel before the kernel attempts to create
and use the identifier.
In (2) the network namespace may be created but network configuration and
processes within that namespace that may trigger on the creation of the
sysfs file are not ready, or the setup of filesystems and tools for that
namespace may happen in parallel.
Fix this by creating a new nfs module parameter "nfs4_unique_id_timeout"
that will allow userspace a tunable window of time to uniquify the client.
When set, the client waits for a uniquifier to be set before sending
SETCLIENTID or EXCHANGE_ID. The parameter defaults to 500ms. Setting the
parameter to zero reverts any waiting for a uniquifier.
A tunable delay can accommodate situations where the size of the race
window needs to be modified due to hardware differences or various
approaches to container initialization with respect to the use of NFS
within those containers.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4proc.c | 4 ++++
fs/nfs/super.c | 4 ++++
fs/nfs/sysfs.c | 3 +++
fs/nfs/sysfs.h | 2 ++
5 files changed, 14 insertions(+)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 3e344bec3647..052805c3cfc0 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -544,6 +544,7 @@ extern bool recover_lost_locks;
#define NFS4_CLIENT_ID_UNIQ_LEN (64)
extern char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN];
+extern unsigned int nfs4_client_id_uniquifier_timeout;
extern int nfs4_try_get_tree(struct fs_context *);
extern int nfs4_get_referral_tree(struct fs_context *);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 3106bd28b113..2ddffd799c7f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6135,6 +6135,10 @@ nfs4_get_uniquifier(struct nfs_client *clp, char *buf, size_t buflen)
buf[0] = '\0';
if (nn_clp) {
+ if (!nn_clp->user_uniquified && nfs4_client_id_uniquifier_timeout)
+ wait_for_completion_interruptible_timeout(&nn_clp->uniquified,
+ msecs_to_jiffies(nfs4_client_id_uniquifier_timeout));
+
rcu_read_lock();
id = rcu_dereference(nn_clp->identifier);
if (id)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 4034102010f0..cad5acd1f79d 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1329,6 +1329,7 @@ unsigned short max_session_slots = NFS4_DEF_SLOT_TABLE_SIZE;
unsigned short max_session_cb_slots = NFS4_DEF_CB_SLOT_TABLE_SIZE;
unsigned short send_implementation_id = 1;
char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN] = "";
+unsigned int nfs4_client_id_uniquifier_timeout = 500;
bool recover_lost_locks = false;
EXPORT_SYMBOL_GPL(nfs_callback_nr_threads);
@@ -1339,6 +1340,7 @@ EXPORT_SYMBOL_GPL(max_session_slots);
EXPORT_SYMBOL_GPL(max_session_cb_slots);
EXPORT_SYMBOL_GPL(send_implementation_id);
EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier);
+EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier_timeout);
EXPORT_SYMBOL_GPL(recover_lost_locks);
#define NFS_CALLBACK_MAXPORTNR (65535U)
@@ -1370,6 +1372,7 @@ module_param(nfs_idmap_cache_timeout, int, 0644);
module_param(nfs4_disable_idmapping, bool, 0644);
module_param_string(nfs4_unique_id, nfs4_client_id_uniquifier,
NFS4_CLIENT_ID_UNIQ_LEN, 0600);
+module_param_named(nfs4_unique_id_timeout, nfs4_client_id_uniquifier_timeout, int, 0644);
MODULE_PARM_DESC(nfs4_disable_idmapping,
"Turn off NFSv4 idmapping when using 'sec=sys'");
module_param(max_session_slots, ushort, 0644);
@@ -1382,6 +1385,7 @@ module_param(send_implementation_id, ushort, 0644);
MODULE_PARM_DESC(send_implementation_id,
"Send implementation ID with NFSv4.1 exchange_id");
MODULE_PARM_DESC(nfs4_unique_id, "nfs_client_id4 uniquifier string");
+MODULE_PARM_DESC(nfs4_unique_id_timeout, "msecs to wait for nfs_client_id4 uniquifier string");
module_param(recover_lost_locks, bool, 0644);
MODULE_PARM_DESC(recover_lost_locks,
diff --git a/fs/nfs/sysfs.c b/fs/nfs/sysfs.c
index be05522a2c8b..a54d342bc381 100644
--- a/fs/nfs/sysfs.c
+++ b/fs/nfs/sysfs.c
@@ -117,6 +117,8 @@ static ssize_t nfs_netns_identifier_store(struct kobject *kobj,
synchronize_rcu();
kfree(old);
}
+ c->user_uniquified = true;
+ complete(&c->uniquified);
return count;
}
@@ -171,6 +173,7 @@ static struct nfs_netns_client *nfs_netns_client_alloc(struct kobject *parent,
if (p) {
if (net != &init_net)
assign_unique_clientid(p);
+ init_completion(&p->uniquified);
p->net = net;
p->kobject.kset = nfs_client_kset;
if (kobject_init_and_add(&p->kobject, &nfs_netns_client_type,
diff --git a/fs/nfs/sysfs.h b/fs/nfs/sysfs.h
index 5501ef573c32..f733439a1084 100644
--- a/fs/nfs/sysfs.h
+++ b/fs/nfs/sysfs.h
@@ -12,6 +12,8 @@ struct nfs_netns_client {
struct kobject kobject;
struct net *net;
const char __rcu *identifier;
+ bool user_uniquified;
+ struct completion uniquified;
};
extern struct kobject *nfs_client_kobj;
--
2.31.1
Trond and Anna, this patch makes obvious one problem with the udev
approach.
We cannot depend fully on stable uniquifiers. The conversation on the
userspace side has come full circle to asserting we use a mount option.
Do you still want us to keep this approach, or will you accept a mount
option as:
- first mount in a namespace sets the identifier
- subsequent mounts with conflicting identifiers return an error
If we keep this udev approach, this patch isn't necessary but does
greatly
reduce the chances of having clients with unstable identifiers.
No one can be blamed for ignoring this, but it would be a relief to get
this
problem solved so it can stop burning our time.
Please advise,
Ben
On 16 Feb 2022, at 12:42, Benjamin Coddington wrote:
> In order to set a unique but persistent value for the nfs client's ID,
> the
> client exposes a per-network-namespace sysfs file that can be used to
> configure a "uniquifier" for this value.
>
> However any userspace mechanism used to configure this value must do
> so in
> the potentially small window between either (1) the nfs module getting
> loaded or (2) the creation of a new network namespace, and the client
> sending SETCLIENTID or EXCHANGE_ID.
>
> In (1) the time between these events can be very small if the kernel
> loads
> the nfs module as triggered by the first mount request. That leaves
> little
> time for another process such as a userspace helper to lookup or
> generate a
> uniquifier and write it to the kernel before the kernel attempts to
> create
> and use the identifier.
>
> In (2) the network namespace may be created but network configuration
> and
> processes within that namespace that may trigger on the creation of
> the
> sysfs file are not ready, or the setup of filesystems and tools for
> that
> namespace may happen in parallel.
>
> Fix this by creating a new nfs module parameter
> "nfs4_unique_id_timeout"
> that will allow userspace a tunable window of time to uniquify the
> client.
> When set, the client waits for a uniquifier to be set before sending
> SETCLIENTID or EXCHANGE_ID. The parameter defaults to 500ms. Setting
> the
> parameter to zero reverts any waiting for a uniquifier.
>
> A tunable delay can accommodate situations where the size of the race
> window needs to be modified due to hardware differences or various
> approaches to container initialization with respect to the use of NFS
> within those containers.
>
> Signed-off-by: Benjamin Coddington <[email protected]>
> ---
> fs/nfs/nfs4_fs.h | 1 +
> fs/nfs/nfs4proc.c | 4 ++++
> fs/nfs/super.c | 4 ++++
> fs/nfs/sysfs.c | 3 +++
> fs/nfs/sysfs.h | 2 ++
> 5 files changed, 14 insertions(+)
>
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index 3e344bec3647..052805c3cfc0 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -544,6 +544,7 @@ extern bool recover_lost_locks;
>
> #define NFS4_CLIENT_ID_UNIQ_LEN (64)
> extern char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN];
> +extern unsigned int nfs4_client_id_uniquifier_timeout;
>
> extern int nfs4_try_get_tree(struct fs_context *);
> extern int nfs4_get_referral_tree(struct fs_context *);
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 3106bd28b113..2ddffd799c7f 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -6135,6 +6135,10 @@ nfs4_get_uniquifier(struct nfs_client *clp,
> char *buf, size_t buflen)
> buf[0] = '\0';
>
> if (nn_clp) {
> + if (!nn_clp->user_uniquified && nfs4_client_id_uniquifier_timeout)
> + wait_for_completion_interruptible_timeout(&nn_clp->uniquified,
> + msecs_to_jiffies(nfs4_client_id_uniquifier_timeout));
> +
> rcu_read_lock();
> id = rcu_dereference(nn_clp->identifier);
> if (id)
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 4034102010f0..cad5acd1f79d 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -1329,6 +1329,7 @@ unsigned short max_session_slots =
> NFS4_DEF_SLOT_TABLE_SIZE;
> unsigned short max_session_cb_slots = NFS4_DEF_CB_SLOT_TABLE_SIZE;
> unsigned short send_implementation_id = 1;
> char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN] = "";
> +unsigned int nfs4_client_id_uniquifier_timeout = 500;
> bool recover_lost_locks = false;
>
> EXPORT_SYMBOL_GPL(nfs_callback_nr_threads);
> @@ -1339,6 +1340,7 @@ EXPORT_SYMBOL_GPL(max_session_slots);
> EXPORT_SYMBOL_GPL(max_session_cb_slots);
> EXPORT_SYMBOL_GPL(send_implementation_id);
> EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier);
> +EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier_timeout);
> EXPORT_SYMBOL_GPL(recover_lost_locks);
>
> #define NFS_CALLBACK_MAXPORTNR (65535U)
> @@ -1370,6 +1372,7 @@ module_param(nfs_idmap_cache_timeout, int,
> 0644);
> module_param(nfs4_disable_idmapping, bool, 0644);
> module_param_string(nfs4_unique_id, nfs4_client_id_uniquifier,
> NFS4_CLIENT_ID_UNIQ_LEN, 0600);
> +module_param_named(nfs4_unique_id_timeout,
> nfs4_client_id_uniquifier_timeout, int, 0644);
> MODULE_PARM_DESC(nfs4_disable_idmapping,
> "Turn off NFSv4 idmapping when using 'sec=sys'");
> module_param(max_session_slots, ushort, 0644);
> @@ -1382,6 +1385,7 @@ module_param(send_implementation_id, ushort,
> 0644);
> MODULE_PARM_DESC(send_implementation_id,
> "Send implementation ID with NFSv4.1 exchange_id");
> MODULE_PARM_DESC(nfs4_unique_id, "nfs_client_id4 uniquifier string");
> +MODULE_PARM_DESC(nfs4_unique_id_timeout, "msecs to wait for
> nfs_client_id4 uniquifier string");
>
> module_param(recover_lost_locks, bool, 0644);
> MODULE_PARM_DESC(recover_lost_locks,
> diff --git a/fs/nfs/sysfs.c b/fs/nfs/sysfs.c
> index be05522a2c8b..a54d342bc381 100644
> --- a/fs/nfs/sysfs.c
> +++ b/fs/nfs/sysfs.c
> @@ -117,6 +117,8 @@ static ssize_t nfs_netns_identifier_store(struct
> kobject *kobj,
> synchronize_rcu();
> kfree(old);
> }
> + c->user_uniquified = true;
> + complete(&c->uniquified);
> return count;
> }
>
> @@ -171,6 +173,7 @@ static struct nfs_netns_client
> *nfs_netns_client_alloc(struct kobject *parent,
> if (p) {
> if (net != &init_net)
> assign_unique_clientid(p);
> + init_completion(&p->uniquified);
> p->net = net;
> p->kobject.kset = nfs_client_kset;
> if (kobject_init_and_add(&p->kobject, &nfs_netns_client_type,
> diff --git a/fs/nfs/sysfs.h b/fs/nfs/sysfs.h
> index 5501ef573c32..f733439a1084 100644
> --- a/fs/nfs/sysfs.h
> +++ b/fs/nfs/sysfs.h
> @@ -12,6 +12,8 @@ struct nfs_netns_client {
> struct kobject kobject;
> struct net *net;
> const char __rcu *identifier;
> + bool user_uniquified;
> + struct completion uniquified;
> };
>
> extern struct kobject *nfs_client_kobj;
> --
> 2.31.1
On 17 Feb 2022, at 7:48, Benjamin Coddington wrote:
> Trond and Anna, this patch makes obvious one problem with the udev
> approach.
> We cannot depend fully on stable uniquifiers. The conversation on the
> userspace side has come full circle to asserting we use a mount
> option.
>
> Do you still want us to keep this approach, or will you accept a mount
> option as:
> - first mount in a namespace sets the identifier
> - subsequent mounts with conflicting identifiers return an error
>
> If we keep this udev approach, this patch isn't necessary but does
> greatly
> reduce the chances of having clients with unstable identifiers.
>
> No one can be blamed for ignoring this, but it would be a relief to
> get this
> problem solved so it can stop burning our time.
>
> Please advise,
> Ben
Attempts to work toward solutions in this area seem to be ignored, the
questions still stand. Until we can sort out and agree on a direction,
self-NACK to this patch.
Ben
On Mon, 2022-02-28 at 11:07 -0500, Benjamin Coddington wrote:
> On 17 Feb 2022, at 7:48, Benjamin Coddington wrote:
>
> > Trond and Anna, this patch makes obvious one problem with the udev
> > approach.
> > We cannot depend fully on stable uniquifiers. The conversation on
> > the
> > userspace side has come full circle to asserting we use a mount
> > option.
> >
> > Do you still want us to keep this approach, or will you accept a
> > mount
> > option as:
> > - first mount in a namespace sets the identifier
> > - subsequent mounts with conflicting identifiers return an error
> >
> > If we keep this udev approach, this patch isn't necessary but does
> > greatly
> > reduce the chances of having clients with unstable identifiers.
> >
> > No one can be blamed for ignoring this, but it would be a relief to
> > get this
> > problem solved so it can stop burning our time.
> >
> > Please advise,
> > Ben
>
> Attempts to work toward solutions in this area seem to be ignored,
> the
> questions still stand. Until we can sort out and agree on a
> direction,
> self-NACK to this patch.
A new mount option doesn't solve any problems that we can't solve with
the existing framework.
The problem we're trying to solve isn't "How do I set the stable
uniquifier in a container?"
You can already do that by manually setting the correct value in
/sys/fs/nfs/net/nfs_client/identifier. There is no need for a new
kernel mount option.
The real problem that we are trying to solve is: "How do I ensure that
a stable uniquifier is always set by the user?"
While you could have "mount" do that, it is not going to be solved by
adding a mandatory mount option. It is far better to have a separate
helper that generates and persists that stable uniquifier in a well-
known location. If you want to call that helper from 'mount' when it
sees that the /sys/fs/... pseudo-file is empty, then by all means,
however the point is that the solution needs to apply to existing
setups without any further changes.
I'm happy to set up a kernel patch that initialises the pseudo-file
based uniquifier for init_net from the module parameter. That could
help ensure that we won't override any existing setting for the host.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On Feb 28, 2022, at 12:24 PM, Trond Myklebust <[email protected]> wrote:
>
>> Attempts to work toward solutions in this area seem to be ignored,
>> the
>> questions still stand. Until we can sort out and agree on a
>> direction,
>> self-NACK to this patch.
>
> A new mount option doesn't solve any problems that we can't solve with
> the existing framework.
I don't think a mount option was proposed. Rather, the mechanics
of the udev rule would be done by mount.nfs without any changes
to the administrative interface.
--
Chuck Lever
On Mon, 2022-02-28 at 19:04 +0000, Chuck Lever III wrote:
>
>
> > On Feb 28, 2022, at 12:24 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > > Attempts to work toward solutions in this area seem to be
> > > ignored,
> > > the
> > > questions still stand. Until we can sort out and agree on a
> > > direction,
> > > self-NACK to this patch.
> >
> > A new mount option doesn't solve any problems that we can't solve
> > with
> > the existing framework.
>
> I don't think a mount option was proposed. Rather, the mechanics
> of the udev rule would be done by mount.nfs without any changes
> to the administrative interface.
>
That's not how I read this proposal:
> Do you still want us to keep this approach, or will you accept a
mount
> option as:
> - first mount in a namespace sets the identifier
> - subsequent mounts with conflicting identifiers return an error
Which is why I responded as I did.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Mon, 2022-02-28 at 20:15 +0000, Chuck Lever III wrote:
>
>
> > On Feb 28, 2022, at 3:06 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Mon, 2022-02-28 at 19:04 +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Feb 28, 2022, at 12:24 PM, Trond Myklebust
> > > > <[email protected]> wrote:
> > > >
> > > > > Attempts to work toward solutions in this area seem to be
> > > > > ignored,
> > > > > the
> > > > > questions still stand. Until we can sort out and agree on a
> > > > > direction,
> > > > > self-NACK to this patch.
> > > >
> > > > A new mount option doesn't solve any problems that we can't
> > > > solve
> > > > with
> > > > the existing framework.
> > >
> > > I don't think a mount option was proposed. Rather, the mechanics
> > > of the udev rule would be done by mount.nfs without any changes
> > > to the administrative interface.
> > >
> >
> > That's not how I read this proposal:
> >
> > > Do you still want us to keep this approach, or will you accept a
> > mount
> > > option as:
> > > - first mount in a namespace sets the identifier
> > > - subsequent mounts with conflicting identifiers return an error
> >
> >
> > Which is why I responded as I did.
>
> Ah! Well, I read "mount option" as "mount alternative", I guess
> I was filling in some context from previous dialog on the topic.
>
> I agree with you that an actual mount option -- that is, a new
> administrative interface -- is not desirable or necessary.
>
> But I assert that having mount.nfs take care of initializing the
> uniquifier for the net namespace is convenient, and can be done
> automatically and reliably.
>
Again, if we do this, then I'd want the actual tool that initialises
the uniquifier to be a separate one that gets called by mount if
necessary.
Otherwise we'll have to consider duplicating that code in busybox and
all the other tools that have private implementations of 'mount'.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On Feb 28, 2022, at 3:06 PM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2022-02-28 at 19:04 +0000, Chuck Lever III wrote:
>>
>>
>>> On Feb 28, 2022, at 12:24 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>>> Attempts to work toward solutions in this area seem to be
>>>> ignored,
>>>> the
>>>> questions still stand. Until we can sort out and agree on a
>>>> direction,
>>>> self-NACK to this patch.
>>>
>>> A new mount option doesn't solve any problems that we can't solve
>>> with
>>> the existing framework.
>>
>> I don't think a mount option was proposed. Rather, the mechanics
>> of the udev rule would be done by mount.nfs without any changes
>> to the administrative interface.
>>
>
> That's not how I read this proposal:
>
>> Do you still want us to keep this approach, or will you accept a
> mount
>> option as:
>> - first mount in a namespace sets the identifier
>> - subsequent mounts with conflicting identifiers return an error
>
>
> Which is why I responded as I did.
Ah! Well, I read "mount option" as "mount alternative", I guess
I was filling in some context from previous dialog on the topic.
I agree with you that an actual mount option -- that is, a new
administrative interface -- is not desirable or necessary.
But I assert that having mount.nfs take care of initializing the
uniquifier for the net namespace is convenient, and can be done
automatically and reliably.
--
Chuck Lever
> On Feb 28, 2022, at 3:42 PM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2022-02-28 at 20:15 +0000, Chuck Lever III wrote:
>>
>>
>>> On Feb 28, 2022, at 3:06 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Mon, 2022-02-28 at 19:04 +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On Feb 28, 2022, at 12:24 PM, Trond Myklebust
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Attempts to work toward solutions in this area seem to be
>>>>>> ignored,
>>>>>> the
>>>>>> questions still stand. Until we can sort out and agree on a
>>>>>> direction,
>>>>>> self-NACK to this patch.
>>>>>
>>>>> A new mount option doesn't solve any problems that we can't
>>>>> solve
>>>>> with
>>>>> the existing framework.
>>>>
>>>> I don't think a mount option was proposed. Rather, the mechanics
>>>> of the udev rule would be done by mount.nfs without any changes
>>>> to the administrative interface.
>>>>
>>>
>>> That's not how I read this proposal:
>>>
>>>> Do you still want us to keep this approach, or will you accept a
>>> mount
>>>> option as:
>>>> - first mount in a namespace sets the identifier
>>>> - subsequent mounts with conflicting identifiers return an error
>>>
>>>
>>> Which is why I responded as I did.
>>
>> Ah! Well, I read "mount option" as "mount alternative", I guess
>> I was filling in some context from previous dialog on the topic.
>>
>> I agree with you that an actual mount option -- that is, a new
>> administrative interface -- is not desirable or necessary.
>>
>> But I assert that having mount.nfs take care of initializing the
>> uniquifier for the net namespace is convenient, and can be done
>> automatically and reliably.
>>
>
> Again, if we do this, then I'd want the actual tool that initialises
> the uniquifier to be a separate one that gets called by mount if
> necessary.
> Otherwise we'll have to consider duplicating that code in busybox and
> all the other tools that have private implementations of 'mount'.
I don't have a problem with mount.nfs invoking a separate tool or
doing it as part of a systemd target that precedes remote-fs. The
tricky thing here seems to be guaranteeing that the uniquifier is
set before the first mount can be done.
--
Chuck Lever