2016-02-12 06:45:11

by Toshiaki Makita

[permalink] [raw]
Subject: [PATCH nfs-utils] statd: Don't unregister statd service on failing to execute callout

statd calls atexit(statd_unregister) to unregister statd service on exit,
which actually has a side-effect that ha_callout() unregisters statd
service even when the child callout process exits on execl() failure.

Certain clustering software's deployment script adds -H option with its
specified file non-existent, when it is configured not to use callout.
In other words, -H seems to be used no matter if callout is needed or not,
but when callout is unnecessary, the specified callout program is not
deployed.
This causes statd not to work once a lock is requested by its NFS client,
as execl() in ha_callout() results in ENOENT and exit() of the child
process calls exit-handler statd_unregister(). Eventually, the NFS client
gets stuck with messages "lockd: cannot monitor xxx" on the NFS server.

Although this may not be an expected way of using -H option, it would be
better if statd could continue to work even in that situation. Also,
execl() could fail for other reasons like ENFILE and EIO, where statd
service should not be unregistered as well.
Call _exit(), which does not call any exit-handlers, instead of exit() to
take care of those situations and make statd more reliable.

Signed-off-by: Toshiaki Makita <[email protected]>
---
support/include/ha-callout.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/support/include/ha-callout.h b/support/include/ha-callout.h
index 1164336..a454bdb 100644
--- a/support/include/ha-callout.h
+++ b/support/include/ha-callout.h
@@ -47,7 +47,7 @@ ha_callout(char *event, char *arg1, char *arg2, int arg3)
arg3 < 0 ? NULL : buf,
NULL);
perror("execl");
- exit(2);
+ _exit(2);
case -1: perror("fork");
break;
default: pid = waitpid(pid, &ret, 0);
--
1.7.1





2016-02-12 15:40:36

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils] statd: Don't unregister statd service on failing to execute callout

Hi-


> On Feb 12, 2016, at 1:41 AM, Toshiaki Makita <[email protected]> wrote:
>
> statd calls atexit(statd_unregister) to unregister statd service on exit,
> which actually has a side-effect that ha_callout() unregisters statd
> service even when the child callout process exits on execl() failure.
>
> Certain clustering software's deployment script adds -H option with its
> specified file non-existent, when it is configured not to use callout.
> In other words, -H seems to be used no matter if callout is needed or not,
> but when callout is unnecessary, the specified callout program is not
> deployed.
> This causes statd not to work once a lock is requested by its NFS client,
> as execl() in ha_callout() results in ENOENT and exit() of the child
> process calls exit-handler statd_unregister(). Eventually, the NFS client
> gets stuck with messages "lockd: cannot monitor xxx" on the NFS server.
>
> Although this may not be an expected way of using -H option, it would be
> better if statd could continue to work even in that situation. Also,
> execl() could fail for other reasons like ENFILE and EIO, where statd
> service should not be unregistered as well.
> Call _exit(), which does not call any exit-handlers, instead of exit() to
> take care of those situations and make statd more reliable.

OK, but I think the explanation could be simpler? It doesn't
seem like a matter of "it would be better if" but more like
"a forked child must not unregister the statd RPC server."
In other words, this seems like a real bug to me, not an
enhancement.

Reviewed-by: Chuck Lever <[email protected]>


> Signed-off-by: Toshiaki Makita <[email protected]>
> ---
> support/include/ha-callout.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/support/include/ha-callout.h b/support/include/ha-callout.h
> index 1164336..a454bdb 100644
> --- a/support/include/ha-callout.h
> +++ b/support/include/ha-callout.h
> @@ -47,7 +47,7 @@ ha_callout(char *event, char *arg1, char *arg2, int arg3)
> arg3 < 0 ? NULL : buf,
> NULL);
> perror("execl");
> - exit(2);
> + _exit(2);
> case -1: perror("fork");
> break;
> default: pid = waitpid(pid, &ret, 0);
> --
> 1.7.1
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever





2016-02-15 07:44:35

by Toshiaki Makita

[permalink] [raw]
Subject: Re: [PATCH nfs-utils] statd: Don't unregister statd service on failing to execute callout

On 2016/02/13 0:40, Chuck Lever wrote:
> Hi-

Hi,

>> On Feb 12, 2016, at 1:41 AM, Toshiaki Makita <[email protected]> wrote:
>>
>> statd calls atexit(statd_unregister) to unregister statd service on exit,
>> which actually has a side-effect that ha_callout() unregisters statd
>> service even when the child callout process exits on execl() failure.
>>
>> Certain clustering software's deployment script adds -H option with its
>> specified file non-existent, when it is configured not to use callout.
>> In other words, -H seems to be used no matter if callout is needed or not,
>> but when callout is unnecessary, the specified callout program is not
>> deployed.
>> This causes statd not to work once a lock is requested by its NFS client,
>> as execl() in ha_callout() results in ENOENT and exit() of the child
>> process calls exit-handler statd_unregister(). Eventually, the NFS client
>> gets stuck with messages "lockd: cannot monitor xxx" on the NFS server.
>>
>> Although this may not be an expected way of using -H option, it would be
>> better if statd could continue to work even in that situation. Also,
>> execl() could fail for other reasons like ENFILE and EIO, where statd
>> service should not be unregistered as well.
>> Call _exit(), which does not call any exit-handlers, instead of exit() to
>> take care of those situations and make statd more reliable.
>
> OK, but I think the explanation could be simpler? It doesn't
> seem like a matter of "it would be better if" but more like
> "a forked child must not unregister the statd RPC server."
> In other words, this seems like a real bug to me, not an
> enhancement.

Thank you for your feedback.

Here is a simpler one.
If it looks fine to you, I'll send v2 with your Reviewed-by.

---

-Although this may not be an expected way of using -H option, it would be
-better if statd could continue to work even in that situation. Also,
-execl() could fail for other reasons like ENFILE and EIO, where statd
-service should not be unregistered as well.
-Call _exit(), which does not call any exit-handlers, instead of exit() to
-take care of those situations and make statd more reliable.

+Also, execl() could fail for other reasons like ENFILE or EIO as well.
+
+A forked child must not unregister the statd RPC server, so use
+_exit(), which does not call any exit-handlers, instead of exit().

---
Regards,
Toshiaki Makita



2016-02-15 14:46:49

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils] statd: Don't unregister statd service on failing to execute callout


> On Feb 15, 2016, at 2:43 AM, Toshiaki Makita <[email protected]> wrote:
>
> On 2016/02/13 0:40, Chuck Lever wrote:
>> Hi-
>
> Hi,
>
>>> On Feb 12, 2016, at 1:41 AM, Toshiaki Makita <[email protected]> wrote:
>>>
>>> statd calls atexit(statd_unregister) to unregister statd service on exit,
>>> which actually has a side-effect that ha_callout() unregisters statd
>>> service even when the child callout process exits on execl() failure.
>>>
>>> Certain clustering software's deployment script adds -H option with its
>>> specified file non-existent, when it is configured not to use callout.
>>> In other words, -H seems to be used no matter if callout is needed or not,
>>> but when callout is unnecessary, the specified callout program is not
>>> deployed.
>>> This causes statd not to work once a lock is requested by its NFS client,
>>> as execl() in ha_callout() results in ENOENT and exit() of the child
>>> process calls exit-handler statd_unregister(). Eventually, the NFS client
>>> gets stuck with messages "lockd: cannot monitor xxx" on the NFS server.
>>>
>>> Although this may not be an expected way of using -H option, it would be
>>> better if statd could continue to work even in that situation. Also,
>>> execl() could fail for other reasons like ENFILE and EIO, where statd
>>> service should not be unregistered as well.
>>> Call _exit(), which does not call any exit-handlers, instead of exit() to
>>> take care of those situations and make statd more reliable.
>>
>> OK, but I think the explanation could be simpler? It doesn't
>> seem like a matter of "it would be better if" but more like
>> "a forked child must not unregister the statd RPC server."
>> In other words, this seems like a real bug to me, not an
>> enhancement.
>
> Thank you for your feedback.
>
> Here is a simpler one.
> If it looks fine to you, I'll send v2 with your Reviewed-by.

Looks good, thanks for the update.


> ---
>
> -Although this may not be an expected way of using -H option, it would be
> -better if statd could continue to work even in that situation. Also,
> -execl() could fail for other reasons like ENFILE and EIO, where statd
> -service should not be unregistered as well.
> -Call _exit(), which does not call any exit-handlers, instead of exit() to
> -take care of those situations and make statd more reliable.
>
> +Also, execl() could fail for other reasons like ENFILE or EIO as well.
> +
> +A forked child must not unregister the statd RPC server, so use
> +_exit(), which does not call any exit-handlers, instead of exit().
>
> ---
> Regards,
> Toshiaki Makita

--
Chuck Lever