Hi Trond-
We (Oracle) had another (fairly rare) instance of a weekend maintenance
window where an NFS server's IP address changed while there were mounted
clients. It brought up the issue again of how we (the Linux NFS community)
would like to deal with cases where a client administrator has to deal
with a moribund mount (like that alliteration :-).
Does remounting with "soft" work today? That seems like the most direct
way to deal with this particular situation.
--
Chuck Lever
On Wed, Oct 02 2019, Chuck Lever wrote:
> Hi Trond-
>
> We (Oracle) had another (fairly rare) instance of a weekend maintenance
> window where an NFS server's IP address changed while there were mounted
> clients. It brought up the issue again of how we (the Linux NFS community)
> would like to deal with cases where a client administrator has to deal
> with a moribund mount (like that alliteration :-).
What exactly is the problem that this caused?
As I understand it, a moribund mount can still be unmounted with "-l"
and processes accessing it can still be killed ... except....
There are some waits in the VFS/MM which are not TASK_KILLABLE and
probably should be. I think that "we" definitely want "someone" to
track them down and fix them.
>
> Does remounting with "soft" work today? That seems like the most direct
> way to deal with this particular situation.
I don't think this does work, and it would be non-trivial (but maybe not
impossible) to mark all the outstanding RPCs as also "soft".
If we wanted to follow a path like this (and I suspect we don't), I
would hope that we could expose the server connection (shared among
multiple mounts) in sysfs somewhere, and could then set "soft" (or
"dead") on that connection, rather than having to do it on every mount
from the particular server.
NeilBrown
>
>
> --
> Chuck Lever
> On Oct 2, 2019, at 8:27 PM, NeilBrown <[email protected]> wrote:
>
> On Wed, Oct 02 2019, Chuck Lever wrote:
>
>> Hi Trond-
>>
>> We (Oracle) had another (fairly rare) instance of a weekend maintenance
>> window where an NFS server's IP address changed while there were mounted
>> clients. It brought up the issue again of how we (the Linux NFS community)
>> would like to deal with cases where a client administrator has to deal
>> with a moribund mount (like that alliteration :-).
>
> What exactly is the problem that this caused?
>
> As I understand it, a moribund mount can still be unmounted with "-l"
> and processes accessing it can still be killed
I was asking about "-o remount,soft" because I was not certain
about the outcome last time this conversation was in full swing.
The gist then is that we want "umount -l" and "umount -f" to
work reliably and as advertised?
> ... except....
> There are some waits in the VFS/MM which are not TASK_KILLABLE and
> probably should be. I think that "we" definitely want "someone" to
> track them down and fix them.
I agree... and "someone" could mean me or someone here at Oracle.
>> Does remounting with "soft" work today? That seems like the most direct
>> way to deal with this particular situation.
>
> I don't think this does work, and it would be non-trivial (but maybe not
> impossible) to mark all the outstanding RPCs as also "soft".
The problem I've observed with umount is umount_begin does the
killall_tasks call, then the client issues some additional requests.
Those are the requests that get stuck before umount_end can finally
shut down the RPC client. umount_end is never called because those
requests are "hard".
We have rpc_killall_tasks which loops over all of an rpc_clnt's
outstanding RPC tasks. nfs_umount_begin could do something like
- set the rpc_clnt's "soft" flag
- kill all tasks
Then any new tasks would time out eventually. Just a thought, maybe
not a good one.
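A minimal userspace sketch of that idea, to make the ordering concrete. It is
not kernel code: every model_* name is hypothetical and only stands in for the
rpc_clnt, its task list, and rpc_killall_tasks. It assumes new tasks inherit
the client's "soft" setting when they are created, and that a soft task fails
on the next timeout tick while a hard task waits forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these are the kernel's structures. */
enum model_state { MODEL_PENDING, MODEL_KILLED, MODEL_TIMED_OUT };

struct model_task {
	bool soft;			/* copied from the client at creation */
	enum model_state state;
};

#define MODEL_MAX_TASKS 8

struct model_clnt {
	bool soft;			/* default for tasks created later */
	struct model_task tasks[MODEL_MAX_TASKS];
	size_t ntasks;
};

/* Create a task that inherits the client's current "soft" setting. */
static struct model_task *model_new_task(struct model_clnt *clnt)
{
	assert(clnt->ntasks < MODEL_MAX_TASKS);
	struct model_task *t = &clnt->tasks[clnt->ntasks++];
	t->soft = clnt->soft;
	t->state = MODEL_PENDING;
	return t;
}

/* Point-in-time kill: affects only tasks that exist right now. */
static void model_killall(struct model_clnt *clnt)
{
	for (size_t i = 0; i < clnt->ntasks; i++)
		if (clnt->tasks[i].state == MODEL_PENDING)
			clnt->tasks[i].state = MODEL_KILLED;
}

/* The server never answers: soft tasks time out, hard tasks keep waiting. */
static void model_timeout_tick(struct model_clnt *clnt)
{
	for (size_t i = 0; i < clnt->ntasks; i++)
		if (clnt->tasks[i].state == MODEL_PENDING && clnt->tasks[i].soft)
			clnt->tasks[i].state = MODEL_TIMED_OUT;
}

/* The proposed order: set the flag first, then kill what is outstanding. */
static void model_umount_begin(struct model_clnt *clnt, bool set_soft)
{
	if (set_soft)
		clnt->soft = true;
	model_killall(clnt);
}

/* Number of tasks still stuck waiting on the dead server. */
static size_t model_pending(const struct model_clnt *clnt)
{
	size_t n = 0;

	for (size_t i = 0; i < clnt->ntasks; i++)
		if (clnt->tasks[i].state == MODEL_PENDING)
			n++;
	return n;
}
```

In this model, a task created after model_umount_begin(clnt, false) stays
pending forever, while one created after model_umount_begin(clnt, true) is
cleared by the next model_timeout_tick().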
There's also using SOFTCONN for all tasks after killall is called:
if the client can't reconnect to the server, these tasks would fail
immediately.
> If we wanted to follow a path like this (and I suspect we don't), I
> would hope that we could expose the server connection (shared among
> multiple mounts) in sysfs somewhere, and could then set "soft" (or
> "dead") on that connection, rather than having to do it on every mount
> from the particular server.
I think of your use case from last time: client shutdown should be
reliable. Seems like making "umount -f" reliable would be better
for that use case, and would work for the "make client mount points
recoverable after server dies" case too.
--
Chuck Lever
On Thu, 2019-10-03 at 09:01 -0400, Chuck Lever wrote:
> > On Oct 2, 2019, at 8:27 PM, NeilBrown <[email protected]> wrote:
> >
> > On Wed, Oct 02 2019, Chuck Lever wrote:
> >
> > > Hi Trond-
> > >
> > > We (Oracle) had another (fairly rare) instance of a weekend maintenance
> > > window where an NFS server's IP address changed while there were mounted
> > > clients. It brought up the issue again of how we (the Linux NFS community)
> > > would like to deal with cases where a client administrator has to deal
> > > with a moribund mount (like that alliteration :-).
> >
> > What exactly is the problem that this caused?
> >
> > As I understand it, a moribund mount can still be unmounted with
> > "-l" and processes accessing it can still be killed
>
> I was asking about "-o remount,soft" because I was not certain
> about the outcome last time this conversation was in full swing.
> The gist then is that we want "umount -l" and "umount -f" to
> work reliably and as advertised?
'umount -l' and 'umount -f' are both inherently flawed. The former
because it just hides the hanging RPC calls in the kernel (causing
resource leaks left, right and center), and the latter because it is a
single point-in-time operation. When you do 'umount -f', it will try to
kill all pending RPC calls, but it does nothing to prevent further
calls from being scheduled.
So yes, at some point it would be good to be able to kill requests from
a permanently hanging server through some other means.
One of the ideas that I do like is being able to remount as 'soft' so
that the RPC calls simply time out. That solves the problem, and does
not compromise the case where the server comes back up, and we remount
the super block in order to continue operations.
That said, there are a few impediments to making that work. As far as I
can tell, none are insurmountable, but they need to be solved.
For instance, one such impediment is the fact that the way soft mounts
work these days is by tagging each RPC task with the flag RPC_TASK_SOFT
(and/or RPC_TASK_TIMEOUT depending on which error value you want the
call to return). This tag is set in task->tk_flags, which is assumed
constant throughout the lifetime of the RPC task. This is why we can
test RPC_IS_SOFT(task) before deciding how we want to call
rpc_sleep_on(). If a third party wants to change that tag, and then wake
up the task in order to have it try to time out, then code snippets
like the following in xprt_reserve_xprt()
	if (RPC_IS_SOFT(task))
		rpc_sleep_on_timeout(&xprt->sending, task, NULL,
				xprt_request_timeout(req));
	else
		rpc_sleep_on(&xprt->sending, task, NULL);
would need to be replaced by something that is atomic.
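One way to picture what "atomic" could mean here is the userspace sketch
below. It is only an illustration under stated assumptions, not a proposal
for the sunrpc code: the sketch_* names are hypothetical, a pthread mutex
stands in for whatever lock would protect the wait queue, and "timer_armed"
stands in for the choice between rpc_sleep_on_timeout() and rpc_sleep_on().
The point is that testing the flag and queuing the task happen under the same
lock that a third party takes to change the flag, so no ordering of the two
leaves a newly soft task sleeping without a timeout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RPC task; not the kernel's struct rpc_task. */
struct sketch_task {
	bool soft;		/* plays the role of RPC_TASK_SOFT */
	bool queued;		/* task is asleep on the wait queue */
	bool timer_armed;	/* sleep will end with a timeout */
};

/* Hypothetical wait queue whose lock also guards each task's soft flag. */
struct sketch_queue {
	pthread_mutex_t lock;
};

/* Test the flag and go to sleep as one step under the queue lock. */
static void sketch_sleep_on(struct sketch_queue *q, struct sketch_task *t)
{
	pthread_mutex_lock(&q->lock);
	assert(!t->queued);		/* a task sleeps on one queue at a time */
	t->queued = true;
	t->timer_armed = t->soft;	/* decision cannot go stale under the lock */
	pthread_mutex_unlock(&q->lock);
}

/* Third party flips the task to soft; re-arm it if it is already asleep. */
static void sketch_make_soft(struct sketch_queue *q, struct sketch_task *t)
{
	pthread_mutex_lock(&q->lock);
	t->soft = true;
	if (t->queued)
		t->timer_armed = true;
	pthread_mutex_unlock(&q->lock);
}
```

Whichever of sketch_sleep_on() and sketch_make_soft() runs first, the task
ends up queued with its timer armed, which is the property the unlocked
test-then-sleep pattern cannot guarantee once the flag is allowed to change.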
>
>
> > ... except....
> > There are some waits in the VFS/MM which are not TASK_KILLABLE and
> > probably should be. I think that "we" definitely want "someone" to
> > track them down and fix them.
>
> I agree... and "someone" could mean me or someone here at Oracle.
>
>
> > > Does remounting with "soft" work today? That seems like the most direct
> > > way to deal with this particular situation.
> >
> > I don't think this does work, and it would be non-trivial (but maybe not
> > impossible) to mark all the outstanding RPCs as also "soft".
>
> The problem I've observed with umount is umount_begin does the
> killall_tasks call, then the client issues some additional requests.
> Those are the requests that get stuck before umount_end can finally
> shut down the RPC client. umount_end is never called because those
> requests are "hard".
>
> We have rpc_killall_tasks which loops over all of an rpc_clnt's
> outstanding RPC tasks. nfs_umount_begin could do something like
>
> - set the rpc_clnt's "soft" flag
> - kill all tasks
>
> Then any new tasks would time out eventually. Just a thought, maybe
> not a good one.
>
> There's also using SOFTCONN for all tasks after killall is called:
> if the client can't reconnect to the server, these tasks would fail
> immediately.
>
>
> > If we wanted to follow a path like this (and I suspect we don't), I
> > would hope that we could expose the server connection (shared among
> > multiple mounts) in sysfs somewhere, and could then set "soft" (or
> > "dead") on that connection, rather than having to do it on every mount
> > from the particular server.
>
> I think of your use case from last time: client shutdown should be
> reliable. Seems like making "umount -f" reliable would be better
> for that use case, and would work for the "make client mount points
> recoverable after server dies" case too.
'umount -f' is intended as a point-in-time operation, which is why it
is implemented as 'umount_begin' in const struct super_operations
nfs_sops. It is not intended to act as a state-changing operation on
the super block. If it were, it would need to ensure that we also hide
such a super block from being found when you try to mount again, and it
would need to ensure that you don't inadvertently end up with a
surviving duplicate.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]