2022-07-07 01:14:56

by NeilBrown

[permalink] [raw]
Subject: Problem with providing implementation id in NFSv4.1


In NFSv4.1 when we EXCHANGE_ID to talk to a new server - possibly a PNFS
Data Server that we haven't talked to before - we by default send an
implementation id. This is created from several fields obtained from
utsname().
utsname() depends on current->nsproxy, and will crash if that is NULL.

When a process exits it calls, among other things,

exit_task_namespaces(tsk);
exit_task_work(tsk);

exit_task_namespaces() will set ->nsproxy to NULL
exit_task_work() will run delayed work items, including fput() on all
files that were still open when the process exited. This will cause any
pending writes to be flushed for NFS.

So if a process writes to a file on a PNFS server, exits, and the MDS
tells the client to send the data to a DS which it hasn't established a
connection with before, then it will crash in encode_exchange_id().

That order of calls in do_exit() is deliberate so we cannot swap them - see
Commit: 8aac62706ada ("move exit_task_namespaces() outside of exit_notify()")

The options that I can see are:
1/ generate the implementation-id string at mount time and keep it
around much like we do for cl_owner_id
2/ Check current->nsproxy in encode_exchange_id() and skip the
implementation id if ->nsproxy is not available.
Note that there is no risk for a race with testing ->nxproxy.

Doesn't anyone have a strong opinion of which is best. I'm inclined to
go with '2', but mostly because it is less coding.

Thanks,
NeilBrown


2022-07-07 04:19:06

by Trond Myklebust

[permalink] [raw]
Subject: Re: Problem with providing implementation id in NFSv4.1

On Thu, 2022-07-07 at 11:13 +1000, NeilBrown wrote:
>
> In NFSv4.1 when we EXCHANGE_ID to talk to a new server - possibly a
> PNFS
> Data Server that we haven't talked to before - we by default send an
> implementation id.  This is created from several fields obtained from
> utsname().
> utsname() depends on current->nsproxy, and will crash if that is
> NULL.
>
> When a process exits it calls, among other things,
>
>         exit_task_namespaces(tsk);
>         exit_task_work(tsk);
>
> exit_task_namespaces() will set ->nsproxy to NULL
> exit_task_work() will run delayed work items, including fput() on all
> files that were still open when the process exited.  This will cause
> any
> pending writes to be flushed for NFS.
>
> So if a process writes to a file on a PNFS server, exits, and the MDS
> tells the client to send the data to a DS which it hasn't established
> a
> connection with before, then it will crash in encode_exchange_id().
>
> That order of calls in do_exit() is deliberate so we cannot swap them
> - see
> Commit: 8aac62706ada ("move exit_task_namespaces() outside of
> exit_notify()")
>
> The options that I can see are:
> 1/ generate the implementation-id string at mount time and keep it
>    around much like we do for cl_owner_id
> 2/ Check current->nsproxy in encode_exchange_id() and skip the
>    implementation id if ->nsproxy is not available.
>    Note that there is no risk for a race with testing ->nxproxy.
>
> Doesn't anyone have a strong opinion of which is best.  I'm inclined
> to
> go with '2', but mostly because it is less coding.
>

I'd be fine with ignoring the implementation id in that case. The
protocol doesn't require it, so servers are expecte to be able to deal
with that case.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-07-08 07:11:30

by NeilBrown

[permalink] [raw]
Subject: Re: Problem with providing implementation id in NFSv4.1

On Thu, 07 Jul 2022, Trond Myklebust wrote:
> On Thu, 2022-07-07 at 11:13 +1000, NeilBrown wrote:
> >
> > In NFSv4.1 when we EXCHANGE_ID to talk to a new server - possibly a
> > PNFS
> > Data Server that we haven't talked to before - we by default send an
> > implementation id.  This is created from several fields obtained from
> > utsname().
> > utsname() depends on current->nsproxy, and will crash if that is
> > NULL.
> >
> > When a process exits it calls, among other things,
> >
> >         exit_task_namespaces(tsk);
> >         exit_task_work(tsk);
> >
> > exit_task_namespaces() will set ->nsproxy to NULL
> > exit_task_work() will run delayed work items, including fput() on all
> > files that were still open when the process exited.  This will cause
> > any
> > pending writes to be flushed for NFS.
> >
> > So if a process writes to a file on a PNFS server, exits, and the MDS
> > tells the client to send the data to a DS which it hasn't established
> > a
> > connection with before, then it will crash in encode_exchange_id().
> >
> > That order of calls in do_exit() is deliberate so we cannot swap them
> > - see
> > Commit: 8aac62706ada ("move exit_task_namespaces() outside of
> > exit_notify()")
> >
> > The options that I can see are:
> > 1/ generate the implementation-id string at mount time and keep it
> >    around much like we do for cl_owner_id
> > 2/ Check current->nsproxy in encode_exchange_id() and skip the
> >    implementation id if ->nsproxy is not available.
> >    Note that there is no risk for a race with testing ->nxproxy.
> >
> > Doesn't anyone have a strong opinion of which is best.  I'm inclined
> > to
> > go with '2', but mostly because it is less coding.
> >
>
> I'd be fine with ignoring the implementation id in that case. The
> protocol doesn't require it, so servers are expecte to be able to deal
> with that case.

Thanks for the quick reply. I agree with you.
However it turns out that this isn't a problem in mainline.

When you close a file the ->flush() happens earlier in do_exit() and the
name spaces are still around. However if you have a file mapped and
have dirty pages, then mainline doesn't force a flush on final unmap, or
exit of the process that had it mapped.

As I explained in
https://lore.kernel.org/all/150304037195.30218.15740287358704674869.stgit@noble/

I think this is wrong, so I have an fsync() call in nfs_file_release().
This *is* run after nsproxy has been cleared.

So mainline doesn't need this fix, but I do.

Thanks,
NeilBrown