2014-03-07 15:02:50

by Steve Dickson

[permalink] [raw]
Subject: [PATCH] Stop Background mounts hang from hanging

Background mounts hang forever due to the kernel not returning
the time out error. The proposed fix is twofold, one in the kernel
and one in the mounting code.

The kernel patch stop the server trunking code from endlessly
looping in the kernel on -ETIMEDOUT errors. Instead, the code
will now return the error, allowing the mount to go into
the background.

Unfortunately, it takes over 5 mins for this timeout to
happen, due the default retry strategy, which is unacceptable
for background mounts.

So the patch I will be proposing for the mount code will be
to append the "retrans=1,timeo=100" mount options to the parent
mount of the background mount (when they don't exist). This
causes the parent mount to timeout in ~25sec.

When parent mount times out, those options will be remove (when
if they were added) before the child mount is done, in background.

Steve Dickson (1):
NFSv4: Don't retry server trunking discovery on timeouts

fs/nfs/nfs4state.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)



2014-03-07 15:26:08

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFSv4: Don't retry server trunking discovery on timeouts


On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:

> To allow background mounts to process into background,
> server trunking discovery needs to return the -ETIMEDOUT
> error to the mount command instead of endless retrying in
> the kernel.
>
> Signed-off-by: Steve Dickson <[email protected]>
> ---
> fs/nfs/nfs4state.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index e1a4721..be4b33f 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2085,6 +2085,7 @@ again:
> break;
> case -NFS4ERR_DELAY:
> case -ETIMEDOUT:
> + break;
> case -EAGAIN:
> ssleep(1);
> case -NFS4ERR_STALE_CLIENTID:
> --
> 1.7.1
>

This will cause NFS4ERR_DELAY to interrupt the mount as well. We don?t want that...
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-07 15:36:40

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] Stop Background mounts hang from hanging


On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:

> Background mounts hang forever due to the kernel not returning
> the time out error. The proposed fix is twofold, one in the kernel
> and one in the mounting code.
>
> The kernel patch stop the server trunking code from endlessly
> looping in the kernel on -ETIMEDOUT errors. Instead, the code
> will now return the error, allowing the mount to go into
> the background.
>
> Unfortunately, it takes over 5 mins for this timeout to
> happen, due the default retry strategy, which is unacceptable
> for background mounts.
>
> So the patch I will be proposing for the mount code will be
> to append the "retrans=1,timeo=100" mount options to the parent
> mount of the background mount (when they don't exist). This
> causes the parent mount to timeout in ~25sec.

We already have a ?retry=? option for mount.nfs. According to the manpage, that should be used to specify the timeout value. Why not reuse that?

Also, it really would be better if that timeout were under control of the mount utility itself. How about if we allow the use of alarm() to interrupt that particular RPC call?

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-07 16:10:41

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] Stop Background mounts hang from hanging


On Mar 7, 2014, at 10:47, Steve Dickson <[email protected]> wrote:

>
>
> On 03/07/2014 10:36 AM, Trond Myklebust wrote:
>>
>> On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:
>>
>>> Background mounts hang forever due to the kernel not returning
>>> the time out error. The proposed fix is twofold, one in the kernel
>>> and one in the mounting code.
>>>
>>> The kernel patch stop the server trunking code from endlessly
>>> looping in the kernel on -ETIMEDOUT errors. Instead, the code
>>> will now return the error, allowing the mount to go into
>>> the background.
>>>
>>> Unfortunately, it takes over 5 mins for this timeout to
>>> happen, due the default retry strategy, which is unacceptable
>>> for background mounts.
>>>
>>> So the patch I will be proposing for the mount code will be
>>> to append the "retrans=1,timeo=100" mount options to the parent
>>> mount of the background mount (when they don't exist). This
>>> causes the parent mount to timeout in ~25sec.
>>
>> We already have a ?retry=? option for mount.nfs. According to the manpage, that should be used to specify the timeout value. Why not reuse that?
> Because it didn't work... retrans and timeo had most effect on the initial times set
> in nfs_init_timeout_values()
>
>>
>> Also, it really would be better if that timeout were under control of the mount utility itself.
> Using those options, it is under the control of mount, unless I'm misunderstanding you...
>
>> How about if we allow the use of alarm() to interrupt that particular RPC call?
> Why just use the mechanisms that already exist? Why invent a new one? Was my reasoning...

alarm() is hardly a ?new? mechanism. It is the standard way of doing this thing in user space, and should, in fact, already work with existing kernels, since they allow fatal signals to interrupt all killable NFS and RPC sleeps.

The point is that relying on ?retrans? and ?timeo? in this context is likely to be frustrating. ?retrans? and ?timeo? act on a per RPC call, and there are many RPC calls involved in a single NFSv4/v4.1 mount call. Furthermore, the server may reply with something like DELAY or equivalent, which doesn?t trigger a timeout, but keeps the kernel retrying the same RPC call over and over again.
Then there is the possibility that the hang may occur somewhere other than in the one place you chose (for instance in the path walk). What then?

We can?t and we won?t add a load of stuff to the kernel to catch all the possible sources of delay for a mount operation. That?s why if we can do it in userspace, then we should.
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-07 19:25:04

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH] Stop Background mounts hang from hanging

Sorry... a long "lunch"... It is Friday! ;-)


On 03/07/2014 11:10 AM, Trond Myklebust wrote:
>
> On Mar 7, 2014, at 10:47, Steve Dickson <[email protected]> wrote:
>
>>
>>
>> On 03/07/2014 10:36 AM, Trond Myklebust wrote:
>>>
>>> On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:
>>>
>>>> Background mounts hang forever due to the kernel not returning
>>>> the time out error. The proposed fix is twofold, one in the kernel
>>>> and one in the mounting code.
>>>>
>>>> The kernel patch stop the server trunking code from endlessly
>>>> looping in the kernel on -ETIMEDOUT errors. Instead, the code
>>>> will now return the error, allowing the mount to go into
>>>> the background.
>>>>
>>>> Unfortunately, it takes over 5 mins for this timeout to
>>>> happen, due the default retry strategy, which is unacceptable
>>>> for background mounts.
>>>>
>>>> So the patch I will be proposing for the mount code will be
>>>> to append the "retrans=1,timeo=100" mount options to the parent
>>>> mount of the background mount (when they don't exist). This
>>>> causes the parent mount to timeout in ~25sec.
>>>
>>> We already have a ?retry=? option for mount.nfs. According to the manpage, that should be used to specify the timeout value. Why not reuse that?
>> Because it didn't work... retrans and timeo had most effect on the initial times set
>> in nfs_init_timeout_values()
>>
>>>
>>> Also, it really would be better if that timeout were under control of the mount utility itself.
>> Using those options, it is under the control of mount, unless I'm misunderstanding you...
>>
>>> How about if we allow the use of alarm() to interrupt that particular RPC call?
>> Why just use the mechanisms that already exist? Why invent a new one? Was my reasoning...
>
> alarm() is hardly a ?new? mechanism. It is the standard way of doing
> this thing in user space, and should, in fact, already work with existing kernels,
> since they allow fatal signals to interrupt all killable NFS and RPC sleeps.
I meant a new mechanism to the mount command... not that alarm() is a new mechanism.

>
> The point is that relying on ?retrans? and ?timeo? in this context is likely to be frustrating.
> ?retrans? and ?timeo? act on a per RPC call, and there are many RPC calls involved in a single
> NFSv4/v4.1 mount call.
Right and that's what I was thinking we needed...

Reading the nfs(5) man page it says "a timeout or failure causes the
mount(8) command to fork a child..." The key word being "a". So we what
the mount to go into background on the first timeout, which what happens
when you set ?retrans? and ?timeo?.

Plus it appears v3 works that way... On the first timeout the mount command forks...

> Furthermore, the server may reply with something like DELAY or equivalent,
> which doesn?t trigger a timeout, but keeps the kernel retrying the same RPC
> call over and over again.
> Then there is the possibility that the hang may occur somewhere other than in the
> one place you chose (for instance in the path walk). What then?
I was say that's a kernel bug... a timeout is a very legitimate error to return.

>
> We can?t and we won?t add a load of stuff to the kernel to catch all the possible
> sources of delay for a mount operation.
A "delay" is different than an timeout. A timeout is an error and a delay is not...
Looping in the kernel *forever* due to a timeout error that is easily
manged by the userspace, is a kernel bug... IMHO...

The client should not make the assumption the userspace does not
want to know about timeout errors. On the contrary, it *needs* to
know about these errors so they can do something about it...

> That?s why if we can do it in userspace, then we should.
This is exactly what I want to do... Have the userspace manage
timeouts... But it has to get them, to manage them.

Using the alarm() system call bases the decision of when
to fork on a arbitrary number of seconds. Using ?retrans?
and ?timeo? bases that decision on an *actual* timeout

Why interrupt a perfectly good RPC just because due
to an arbitrary number of seconds? Let the RPC
timeout and simply report that fact....

steved.

2014-03-07 15:47:51

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH] Stop Background mounts hang from hanging



On 03/07/2014 10:36 AM, Trond Myklebust wrote:
>
> On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:
>
>> Background mounts hang forever due to the kernel not returning
>> the time out error. The proposed fix is twofold, one in the kernel
>> and one in the mounting code.
>>
>> The kernel patch stop the server trunking code from endlessly
>> looping in the kernel on -ETIMEDOUT errors. Instead, the code
>> will now return the error, allowing the mount to go into
>> the background.
>>
>> Unfortunately, it takes over 5 mins for this timeout to
>> happen, due the default retry strategy, which is unacceptable
>> for background mounts.
>>
>> So the patch I will be proposing for the mount code will be
>> to append the "retrans=1,timeo=100" mount options to the parent
>> mount of the background mount (when they don't exist). This
>> causes the parent mount to timeout in ~25sec.
>
> We already have a ?retry=? option for mount.nfs. According to the manpage, that should be used to specify the timeout value. Why not reuse that?
Because it didn't work... retrans and timeo had most effect on the initial times set
in nfs_init_timeout_values()

>
> Also, it really would be better if that timeout were under control of the mount utility itself.
Using those options, it is under the control of mount, unless I'm misunderstanding you...

> How about if we allow the use of alarm() to interrupt that particular RPC call?
Why just use the mechanisms that already exist? Why invent a new one? Was my reasoning...

steved.

>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2014-03-07 15:02:49

by Steve Dickson

[permalink] [raw]
Subject: [PATCH] NFSv4: Don't retry server trunking discovery on timeouts

To allow background mounts to process into background,
server trunking discovery needs to return the -ETIMEDOUT
error to the mount command instead of endless retrying in
the kernel.

Signed-off-by: Steve Dickson <[email protected]>
---
fs/nfs/nfs4state.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e1a4721..be4b33f 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2085,6 +2085,7 @@ again:
break;
case -NFS4ERR_DELAY:
case -ETIMEDOUT:
+ break;
case -EAGAIN:
ssleep(1);
case -NFS4ERR_STALE_CLIENTID:
--
1.7.1


2014-03-07 19:45:27

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] Stop Background mounts hang from hanging


On Mar 7, 2014, at 14:24, Steve Dickson <[email protected]> wrote:

> Sorry... a long "lunch"... It is Friday! ;-)
>
>
> On 03/07/2014 11:10 AM, Trond Myklebust wrote:
>>
>> On Mar 7, 2014, at 10:47, Steve Dickson <[email protected]> wrote:
>>
>>>
>>>
>>> On 03/07/2014 10:36 AM, Trond Myklebust wrote:
>>>>
>>>> On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:
>>>>
>>>>> Background mounts hang forever due to the kernel not returning
>>>>> the time out error. The proposed fix is twofold, one in the kernel
>>>>> and one in the mounting code.
>>>>>
>>>>> The kernel patch stop the server trunking code from endlessly
>>>>> looping in the kernel on -ETIMEDOUT errors. Instead, the code
>>>>> will now return the error, allowing the mount to go into
>>>>> the background.
>>>>>
>>>>> Unfortunately, it takes over 5 mins for this timeout to
>>>>> happen, due the default retry strategy, which is unacceptable
>>>>> for background mounts.
>>>>>
>>>>> So the patch I will be proposing for the mount code will be
>>>>> to append the "retrans=1,timeo=100" mount options to the parent
>>>>> mount of the background mount (when they don't exist). This
>>>>> causes the parent mount to timeout in ~25sec.
>>>>
>>>> We already have a ?retry=? option for mount.nfs. According to the manpage, that should be used to specify the timeout value. Why not reuse that?
>>> Because it didn't work... retrans and timeo had most effect on the initial times set
>>> in nfs_init_timeout_values()
>>>
>>>>
>>>> Also, it really would be better if that timeout were under control of the mount utility itself.
>>> Using those options, it is under the control of mount, unless I'm misunderstanding you...
>>>
>>>> How about if we allow the use of alarm() to interrupt that particular RPC call?
>>> Why just use the mechanisms that already exist? Why invent a new one? Was my reasoning...
>>
>> alarm() is hardly a ?new? mechanism. It is the standard way of doing
>> this thing in user space, and should, in fact, already work with existing kernels,
>> since they allow fatal signals to interrupt all killable NFS and RPC sleeps.
> I meant a new mechanism to the mount command... not that alarm() is a new mechanism.
>
>>
>> The point is that relying on ?retrans? and ?timeo? in this context is likely to be frustrating.
>> ?retrans? and ?timeo? act on a per RPC call, and there are many RPC calls involved in a single
>> NFSv4/v4.1 mount call.
> Right and that's what I was thinking we needed...
>
> Reading the nfs(5) man page it says "a timeout or failure causes the
> mount(8) command to fork a child..." The key word being "a". So we what
> the mount to go into background on the first timeout, which what happens
> when you set ?retrans? and ?timeo?.
>
> Plus it appears v3 works that way... On the first timeout the mount command forks?

Why do we document the internals of how ?mount? operates? There is no reason why the user should care.

In fact, a much better way to achieve the exact same result entirely in user space would be simply to fork immediately, set a timer, and then wait for either SIGCHLD to tell us that the child is done mounting, or SIGALRM to tell us that we've timed out.

>> Furthermore, the server may reply with something like DELAY or equivalent,
>> which doesn?t trigger a timeout, but keeps the kernel retrying the same RPC
>> call over and over again.
>> Then there is the possibility that the hang may occur somewhere other than in the
>> one place you chose (for instance in the path walk). What then?
> I was say that's a kernel bug... a timeout is a very legitimate error to return.

We?d have to somehow let the VFS and the NFS layers know that this is a special path walk that can time out. Why should we do that?

>> We can?t and we won?t add a load of stuff to the kernel to catch all the possible
>> sources of delay for a mount operation.
> A "delay" is different than an timeout. A timeout is an error and a delay is not...
> Looping in the kernel *forever* due to a timeout error that is easily
> manged by the userspace, is a kernel bug... IMHO...
>
> The client should not make the assumption the userspace does not
> want to know about timeout errors. On the contrary, it *needs* to
> know about these errors so they can do something about it...
>
>> That?s why if we can do it in userspace, then we should.
> This is exactly what I want to do... Have the userspace manage
> timeouts... But it has to get them, to manage them.
>
> Using the alarm() system call bases the decision of when
> to fork on a arbitrary number of seconds. Using ?retrans?
> and ?timeo? bases that decision on an *actual* timeout
>
> Why interrupt a perfectly good RPC just because due
> to an arbitrary number of seconds? Let the RPC
> timeout and simply report that fact?.

The only difference I can see between the two is that in one case, the timer is set by userland, while in the other, it is set by the kernel. Again, why should a user care?

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-07 20:20:25

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH] Stop Background mounts hang from hanging



On 03/07/2014 02:45 PM, Trond Myklebust wrote:
>
> On Mar 7, 2014, at 14:24, Steve Dickson <[email protected]> wrote:
>
>> Sorry... a long "lunch"... It is Friday! ;-)
>>
>>
>> On 03/07/2014 11:10 AM, Trond Myklebust wrote:
>>>
>>> On Mar 7, 2014, at 10:47, Steve Dickson <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On 03/07/2014 10:36 AM, Trond Myklebust wrote:
>>>>>
>>>>> On Mar 7, 2014, at 10:02, Steve Dickson <[email protected]> wrote:
>>>>>
>>>>>> Background mounts hang forever due to the kernel not returning
>>>>>> the time out error. The proposed fix is twofold, one in the kernel
>>>>>> and one in the mounting code.
>>>>>>
>>>>>> The kernel patch stop the server trunking code from endlessly
>>>>>> looping in the kernel on -ETIMEDOUT errors. Instead, the code
>>>>>> will now return the error, allowing the mount to go into
>>>>>> the background.
>>>>>>
>>>>>> Unfortunately, it takes over 5 mins for this timeout to
>>>>>> happen, due the default retry strategy, which is unacceptable
>>>>>> for background mounts.
>>>>>>
>>>>>> So the patch I will be proposing for the mount code will be
>>>>>> to append the "retrans=1,timeo=100" mount options to the parent
>>>>>> mount of the background mount (when they don't exist). This
>>>>>> causes the parent mount to timeout in ~25sec.
>>>>>
>>>>> We already have a ?retry=? option for mount.nfs. According to the manpage, that should be used to specify the timeout value. Why not reuse that?
>>>> Because it didn't work... retrans and timeo had most effect on the initial times set
>>>> in nfs_init_timeout_values()
>>>>
>>>>>
>>>>> Also, it really would be better if that timeout were under control of the mount utility itself.
>>>> Using those options, it is under the control of mount, unless I'm misunderstanding you...
>>>>
>>>>> How about if we allow the use of alarm() to interrupt that particular RPC call?
>>>> Why just use the mechanisms that already exist? Why invent a new one? Was my reasoning...
>>>
>>> alarm() is hardly a ?new? mechanism. It is the standard way of doing
>>> this thing in user space, and should, in fact, already work with existing kernels,
>>> since they allow fatal signals to interrupt all killable NFS and RPC sleeps.
>> I meant a new mechanism to the mount command... not that alarm() is a new mechanism.
>>
>>>
>>> The point is that relying on ?retrans? and ?timeo? in this context is likely to be frustrating.
>>> ?retrans? and ?timeo? act on a per RPC call, and there are many RPC calls involved in a single
>>> NFSv4/v4.1 mount call.
>> Right and that's what I was thinking we needed...
>>
>> Reading the nfs(5) man page it says "a timeout or failure causes the
>> mount(8) command to fork a child..." The key word being "a". So we what
>> the mount to go into background on the first timeout, which what happens
>> when you set ?retrans? and ?timeo?.
>>
>> Plus it appears v3 works that way... On the first timeout the mount command forks?
>
> Why do we document the internals of how ?mount? operates? There is no reason why the user should care.
I guess I never looked at it that way... Documenting the cause and effect
of a mount option was a good thing..

>
> In fact, a much better way to achieve the exact same result entirely in user space would be simply
> to fork immediately, set a timer, and then wait for either SIGCHLD to tell us that the
> child is done mounting, or SIGALRM to tell us that we've timed out.
As soon as I got push back from the kernel not returning valid error... ;-)
I thought about forking immediately and just letting the mount endlessly
spin in the kernel... 8-)

I just don't think introducing interrupts/signals to in-flight RPCs is
a good idea...

>
>>> Furthermore, the server may reply with something like DELAY or equivalent,
>>> which doesn?t trigger a timeout, but keeps the kernel retrying the same RPC
>>> call over and over again.
>>> Then there is the possibility that the hang may occur somewhere other than in the
>>> one place you chose (for instance in the path walk). What then?
>> I was say that's a kernel bug... a timeout is a very legitimate error to return.
>
> We?d have to somehow let the VFS and the NFS layers know that this is a special
> path walk that can time out. Why should we do that?
Educate me... Why in the world the VFS layer care about a network time out?

>
>>> We can?t and we won?t add a load of stuff to the kernel to catch all the possible
>>> sources of delay for a mount operation.
>> A "delay" is different than an timeout. A timeout is an error and a delay is not...
>> Looping in the kernel *forever* due to a timeout error that is easily
>> manged by the userspace, is a kernel bug... IMHO...
>>
>> The client should not make the assumption the userspace does not
>> want to know about timeout errors. On the contrary, it *needs* to
>> know about these errors so they can do something about it...
>>
>>> That?s why if we can do it in userspace, then we should.
>> This is exactly what I want to do... Have the userspace manage
>> timeouts... But it has to get them, to manage them.
>>
>> Using the alarm() system call bases the decision of when
>> to fork on a arbitrary number of seconds. Using ?retrans?
>> and ?timeo? bases that decision on an *actual* timeout
>>
>> Why interrupt a perfectly good RPC just because due
>> to an arbitrary number of seconds? Let the RPC
>> timeout and simply report that fact?.
>
> The only difference I can see between the two is that in
> one case, the timer is set by userland, while in the other,
> it is set by the kernel. Again, why should a user care?
They don't and shouldn't... My point is the alarm timeout
is arbitrary... the kernel time out is because of a change
of "state" (aka an error). I would much rather base decisions
on facts verses arbitrary guesses..

At the end of the day... I'll go ahead and fork immediately on
background mounts and let the mount spin in the kernel,
but not returning timeout error, which are very manageable errors,
is not a good thing... IMHO...

steved.