2009-08-24 14:57:44

by Ian Kent

Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Carlos André wrote:
> Hi Ian,
>
> Thanks for patch and sorry for delay (i'm expecting receive u reply on
> bug track, not here) :)
>
> But, this patch doesnt worked to me like expected... :(
>
>
> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
> and later changed "10" to "2" with same results...
> (always restarting service, of course :)
>
> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
> same results again.
>
> Or i'm doing something wrong?
>
>
> [root@KSTATION areas]# automount -V
>
> Linux automount version 5.0.1-0.rc2.131.bz517349.1
> [...]
>
> [root@KSTATION areas]# time ls -la testdown
> ls: testedown: No such file or directory
>
> real 3m9.006s
> user 0m0.002s
> sys 0m0.000s

OK, that isn't behaving the way I expect, I'll have a look.

>
>
> LOGGING:
> -----------------------------------------
> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
> /misc/areas/testdown
> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
> -----------------------------------------
>
>
>
>
>
> 2009/8/17 Ian Kent <[email protected]>:
>> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>>> Filled bug report:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>> Hi Carlos,
>>
>> I have a patched source rpm to add a mount wait parameter to autofs
>> located at:
>> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>>
>> Could you build it and see if it works.
>> I haven't tested it at all but it is fairly straight forward.
>> It is still unclear if this is the right way to do this and what the
>> consequences are in sending a term signal to mount. This mount request
>> will likely be followed by other requests for the same mount causing an
>> accumulation of mount(8) processes waiting for RPC timeouts before they
>> can answer the TERM signal.
>>
>> Anyway, for information the patch included in the source rpm above is:
>>
>> autofs-5.0.4 - add mount wait parameter
>>
>> From: Ian Kent <[email protected]>
>>
>> Often delays when trying to mount from a server that is not responding
>> for some reason are undesirable. To try and prevent these delays we
>> provide a configuration setting to limit the time that we wait for
>> our spawned mount(8) process to complete before sending it a SIGTERM
>> signal. This patch adds a configuration parameter to allow us to
>> request we limit the time we wait for mount(8) to complete before
>> sending it a TERM signal.
>> ---
>>
>> daemon/spawn.c | 3 ++-
>> include/defaults.h | 2 ++
>> lib/defaults.c | 13 +++++++++++++
>> man/auto.master.5.in | 7 +++++++
>> redhat/autofs.sysconfig.in | 9 +++++++++
>> samples/autofs.conf.default.in | 9 +++++++++
>> 6 files changed, 42 insertions(+), 1 deletion(-)
>>
>>
>> --- autofs-5.0.1.orig/daemon/spawn.c
>> +++ autofs-5.0.1/daemon/spawn.c
>> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
>> unsigned int options;
>> unsigned int retries = MTAB_LOCK_RETRIES;
>> int update_mtab = 1, ret, printed = 0;
>> + unsigned int wait = defaults_get_mount_wait();
>> char buf[PATH_MAX];
>>
>> /* If we use mount locking we can't validate the location */
>> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
>> va_end(arg);
>>
>> while (retries--) {
>> - ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
>> + ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
>> if (ret & MTAB_NOTUPDATED) {
>> struct timespec tm = {3, 0};
>>
>> --- autofs-5.0.1.orig/include/defaults.h
>> +++ autofs-5.0.1/include/defaults.h
>> @@ -24,6 +24,7 @@
>>
>> #define DEFAULT_TIMEOUT 600
>> #define DEFAULT_NEGATIVE_TIMEOUT 60
>> +#define DEFAULT_MOUNT_WAIT -1
>> #define DEFAULT_UMOUNT_WAIT 12
>> #define DEFAULT_BROWSE_MODE 1
>> #define DEFAULT_LOGGING 0
>> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
>> struct ldap_searchdn *defaults_get_searchdns(void);
>> void defaults_free_searchdns(struct ldap_searchdn *);
>> unsigned int defaults_get_append_options(void);
>> +unsigned int defaults_get_mount_wait(void);
>> unsigned int defaults_get_umount_wait(void);
>> const char *defaults_get_auth_conf_file(void);
>> unsigned int defaults_get_map_hash_table_size(void);
>> --- autofs-5.0.1.orig/lib/defaults.c
>> +++ autofs-5.0.1/lib/defaults.c
>> @@ -45,6 +45,7 @@
>> #define ENV_NAME_VALUE_ATTR "VALUE_ATTRIBUTE"
>>
>> #define ENV_APPEND_OPTIONS "APPEND_OPTIONS"
>> +#define ENV_MOUNT_WAIT "MOUNT_WAIT"
>> #define ENV_UMOUNT_WAIT "UMOUNT_WAIT"
>> #define ENV_AUTH_CONF_FILE "AUTH_CONF_FILE"
>>
>> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
>> check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
>> check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
>> check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
>> + check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
>> check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
>> check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
>> check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
>> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
>> return res;
>> }
>>
>> +unsigned int defaults_get_mount_wait(void)
>> +{
>> + long wait;
>> +
>> + wait = get_env_number(ENV_MOUNT_WAIT);
>> + if (wait < 0)
>> + wait = DEFAULT_MOUNT_WAIT;
>> +
>> + return (unsigned int) wait;
>> +}
>> +
>> unsigned int defaults_get_umount_wait(void)
>> {
>> long wait;
>> --- autofs-5.0.1.orig/man/auto.master.5.in
>> +++ autofs-5.0.1/man/auto.master.5.in
>> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
>> 60). If the equivalent command line option is given it will override this
>> setting.
>> .TP
>> +.B MOUNT_WAIT
>> +Set the default time to wait for a response from a spawned mount(8)
>> +before sending it a SIGTERM. Note that we still need to wait for the
>> +RPC layer to timeout before the sub-process exits so this isn't ideal
>> +but it is the best we can do. The default is to wait until mount(8)
>> +returns without intervention.
>> +.TP
>> .B UMOUNT_WAIT
>> Set the default time to wait for a response from a spawned umount(8)
>> before sending it a SIGTERM. Note that we still need to wait for the
e
>> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
>> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
>> @@ -14,6 +14,15 @@ TIMEOUT=300
>> #
>> #NEGATIVE_TIMEOUT=60
>> #
>> +# MOUNT_WAIT - time to wait for a response from mount(8).
>> +# Setting this timeout can cause problems when
>> +# mount would otherwise wait for a server that
>> +# is temporarily unavailable, such as when it's
>> +# restarting. The default of waiting for mount(8)
>> +# usually results in a wait of around 3 minutes.
>> +#
>> +#MOUNT_WAIT=-1
>> +#
>> # UMOUNT_WAIT - time to wait for a response from umount(8).
>> #
>> #UMOUNT_WAIT=12
>> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
>> +++ autofs-5.0.1/samples/autofs.conf.default.in
>> @@ -14,6 +14,15 @@ TIMEOUT=300
>> #
>> #NEGATIVE_TIMEOUT=60
>> #
>> +# MOUNT_WAIT - time to wait for a response from mount(8).
>> +# Setting this timeout can cause problems when
>> +# mount would otherwise wait for a server that
>> +# is temporarily unavailable, such as when it's
>> +# restarting. The default of waiting for mount(8)
>> +# usually results in a wait of around 3 minutes.
>> +#
>> +#MOUNT_WAIT=-1
>> +#
>> # UMOUNT_WAIT - time to wait for a response from umount(8).
>> #
>> #UMOUNT_WAIT=12
>>
>>
>>> Thanks!
>>>
>>> 2009/8/13 Carlos André <[email protected]>:
>>>> 2009/8/13 Ian Kent <[email protected]>:
>>>>> Carlos André wrote:
>>>>>> Today (2009-08-12) I'm using:
>>>>>> kernel-2.6.18-128.2.1.el5
>>>>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>>>> Thanks,
>>>>>
>>>>> My mistake, the wait time I was referring to is used for umounts during
>>>>> expires and is present in rev rc2.102.
>>>>>
>>>>> It shouldn't be hard to add this for mount as well.
>>>>> Would you like me to put something together?
>>>> Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
>>>>
>>>>> Probably would be good to test something out to see if we can make a
>>>>> difference with the killing mount after some configured timeout but, if
>>>>> we make progress, probably the best way to deal with it is for you to
>>>>> log a bug against rhel-5 so I can get it committed to the rhel package.
>>>>> The possible issue is that I'm not sure if the RPC subsystem in the
>>>>> above rhel kernel will respond well to process death with potential
>>>>> outstanding requests. But we'll see.
>>>> Ok, on my way :)
>>>>
>>>> Thanks a lot!
>>>>
>>>>>>
>>>>>> Look my last test:
>>>>>> --------------------------------------------------------------
>>>>>> [root@KSTATION areas]# time ls testdown
>>>>>> ls: testdown: No such file or directory
>>>>>>
>>>>>> real 3m9.025s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.002s
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>>>>> mounting root /misc/areas, mountpoint testdown, what
>>>>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>>>>> acl,sec=krb5p,proto=tcp,retry=0
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>>>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>>>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> calling mkdir_path /misc/areas/testdown
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>> 3078093712 path /misc
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>> submounts remaining in /misc
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>> 3078093712 path /misc stat 3
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>> = 2 path /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>> 3078093712 path /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>> submounts remaining in /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>> 3078093712 path /misc stat 3
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>> = 2 path /misc
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>>>>> server '1.2.3.4' failed: timed out (giving up).
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
>>>>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>> --------------------------------------------------------------
>>>>>>
>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>> Carlos André wrote:
>>>>>>>> Hi Ian,
>>>>>>>> I'm getting crazy trying put "retry=" to work on mount... this option
>>>>>>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
>>>>>>>> like you can see on my previous emails...
>>>>>>> Right, my mistake for not looking closely enough at post.
>>>>>>>
>>>>>>> Maybe this is related to the same sort of problem we had with mount in
>>>>>>> the past, before the options parsing went into the kernel, where other
>>>>>>> services, like portmapper (or rpcbind), were being done with different
>>>>>>> timeout parameters before the RPC calls for mounting. That's just an
>>>>>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>>>>>>
>>>>>>> But what version of autofs and kernel did you say you were using?
>>>>>>>
>>>>>>>> I appreciate any help.
>>>>>>>>
>>>>>>>> Carlos.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>>> Chuck Lever wrote:
>>>>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>>>>> This long timeout is good if workstation need mount a critical
>>>>>>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>>>>>>> But in my case, using this loooong timeout doesnt make any sense,
>>>>>>>>>>> since autofs retry mount directory on-access. This in fact gives me
>>>>>>>>>>> alot of headaches, coz user login 'll just hangs if one server goes
>>>>>>>>>>> down for any reason, and will again hangs if user try access directory
>>>>>>>>>>> pointing to a NFS down server...
>>>>>>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>>>>>>> mount(2) system call fails. When you set SYN retries to 1, this means
>>>>>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>>>>>>>> call to fail.
>>>>>>>>>>
>>>>>>>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>>>>>>>> for automounter as well as other cases. Ian, is there something else we
>>>>>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>>>>>>>> mount points handled via automounter? How should mount.nfs wait so we
>>>>>>>>>> don't make other use cases worse? (Looks like most of the history is
>>>>>>>>>> intact below).
>>>>>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>>>>>>>> mount.nfs in particular). This has always been a difficult situation for
>>>>>>>>> the automounter because interactive mount invocations should wait. But I
>>>>>>>>> believe automount mounts should always time out quickly, but that leads
>>>>>>>>> to its own set of problems, especially when home directories are concerned.
>>>>>>>>>
>>>>>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>>>>>>> certain that will work as we expect. I'll have to do some experimentation.
>>>>>>>>>
>>>>>>>>>> How long do you think is appropriate for the automounter to wait if the
>>>>>>>>>> server is down, in your case, Carlos?
>>>>>>>>>>
>>>>>>>>>>> Am losing something or there have was something weirdo...!?
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>> user 0m0.002s
>>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 3m9.001s
>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>> sys 0m0.003s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 3m9.001s
>>>>>>>>>>> user 0m0.002s
>>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>>
>>>>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>>>>
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 1m3.002s
>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 2m6.000s
>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 0m9.003s
>>>>>>>>>>> user 0m0.001s
>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real 2m6.001s
>>>>>>>>>>> user 0m0.001s
>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>> [root@KSTATION ~]#
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>>>>>>>> using retry=0 without kerberos I got only 9s...
>>>>>>>>>>>
>>>>>>>>>>> *sigh*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>>>>> Right. Normally the RPC client calls the kernel's socket connect
>>>>>>>>>>>> function, which does 6 SYN retries. That one call usually takes longer
>>>>>>>>>>>> than the RPC client's connect timeout, so it only makes one connect
>>>>>>>>>>>> call, and then fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>>>>>>>> client to retry the connect call until its connect timeout expires.
>>>>>>>>>>>> Each connect call resets the SYN timeout to 3 seconds.
>>>>>>>>>>>>
>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>> sec=krb5p,proto=tcp
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 2m6.004s
>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>> sys 0m0.004s
>>>>>>>>>>>>>
>>>>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>>>>>>>> 2049...
>>>>>>>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>>>>>>>>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>>>>>>>>>>> and 9 seconds...
>>>>>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to give up
>>>>>>>>>>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>>>>>>>>>>> exponential retries, or something like that). For stock CentOS 5.3, I think
>>>>>>>>>>>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the kernel just
>>>>>>>>>>>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>>>>>>>>>>>> request.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS components
>>>>>>>>>>>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>>>>>>>>>>> Kerberos and AutoFS, but i got a problem: If NFS server goes down i
>>>>>>>>>>>>>>>>>>>> get a LOOOOOOONG mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if mount
>>>>>>>>>>>>>>>>>>>> hangs, user logon hangs. Then i want configure it to timeout (if server
>>>>>>>>>>>>>>>>>>>> down) after 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my findings
>>>>>>>>>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>>>>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>>>>>>>>>>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>>>>>>>>>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Chuck Lever
>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>



2009-08-27 08:54:11

by Ian Kent

Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Ian Kent wrote:
> Carlos André wrote:
>> Hi Ian,
>>
>> Thanks for patch and sorry for delay (i'm expecting receive u reply on
>> bug track, not here) :)
>>
>> But, this patch doesnt worked to me like expected... :(
>>
>>
>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>> and later changed "10" to "2" with same results...
>> (always restarting service, of course :)
>>
>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
>> same results again.
>>
>> Or i'm doing something wrong?
>>
>>
>> [root@KSTATION areas]# automount -V
>>
>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>> [...]
>>
>> [root@KSTATION areas]# time ls -la testdown
>> ls: testedown: No such file or directory
>>
>> real 3m9.006s
>> user 0m0.002s
>> sys 0m0.000s
>
> OK, that isn't behaving the way I expect, I'll have a look.
>
>>
>> LOGGING:
>> -----------------------------------------
>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
>> /misc/areas/testdown
>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
>> -----------------------------------------

Having a look at this, I suspect the reason it doesn't work as expected
is that the waitpid(2) we do after sending the TERM signal to the mount
process (which we have to do) is not returning. This is likely because
the mount process isn't giving up as quickly as it used to. We
could send a KILL signal to the mount process but that does seem to
cause problems later on since there are still outstanding RPC requests.

I suspect that the early termination of blocked umount requests will also
be broken now.

Not sure what to do next here.
Anyone want to volunteer some in-depth detail on kernel RPC request
termination on the issuing process receiving a TERM signal?

>>
>>
>>
>>
>>
>> 2009/8/17 Ian Kent <[email protected]>:
>>> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>>>> Filled bug report:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>>> Hi Carlos,
>>>
>>> I have a patched source rpm to add a mount wait parameter to autofs
>>> located at:
>>> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>>>
>>> Could you build it and see if it works.
>>> I haven't tested it at all but it is fairly straight forward.
>>> It is still unclear if this is the right way to do this and what the
>>> consequences are in sending a term signal to mount. This mount request
>>> will likely be followed by other requests for the same mount causing an
>>> accumulation of mount(8) processes waiting for RPC timeouts before they
>>> can answer the TERM signal.
>>>
>>> Anyway, for information the patch included in the source rpm above is:
>>>
>>> autofs-5.0.4 - add mount wait parameter
>>>
>>> From: Ian Kent <[email protected]>
>>>
>>> Often delays when trying to mount from a server that is not responding
>>> for some reason are undesirable. To try and prevent these delays we
>>> provide a configuration setting to limit the time that we wait for
>>> our spawned mount(8) process to complete before sending it a SIGTERM
>>> signal. This patch adds a configuration parameter to allow us to
>>> request we limit the time we wait for mount(8) to complete before
>>> sending it a TERM signal.
>>> ---
>>>
>>> daemon/spawn.c | 3 ++-
>>> include/defaults.h | 2 ++
>>> lib/defaults.c | 13 +++++++++++++
>>> man/auto.master.5.in | 7 +++++++
>>> redhat/autofs.sysconfig.in | 9 +++++++++
>>> samples/autofs.conf.default.in | 9 +++++++++
>>> 6 files changed, 42 insertions(+), 1 deletion(-)
>>>
>>>
>>> --- autofs-5.0.1.orig/daemon/spawn.c
>>> +++ autofs-5.0.1/daemon/spawn.c
>>> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
>>> unsigned int options;
>>> unsigned int retries = MTAB_LOCK_RETRIES;
>>> int update_mtab = 1, ret, printed = 0;
>>> + unsigned int wait = defaults_get_mount_wait();
>>> char buf[PATH_MAX];
>>>
>>> /* If we use mount locking we can't validate the location */
>>> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
>>> va_end(arg);
>>>
>>> while (retries--) {
>>> - ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
>>> + ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
>>> if (ret & MTAB_NOTUPDATED) {
>>> struct timespec tm = {3, 0};
>>>
>>> --- autofs-5.0.1.orig/include/defaults.h
>>> +++ autofs-5.0.1/include/defaults.h
>>> @@ -24,6 +24,7 @@
>>>
>>> #define DEFAULT_TIMEOUT 600
>>> #define DEFAULT_NEGATIVE_TIMEOUT 60
>>> +#define DEFAULT_MOUNT_WAIT -1
>>> #define DEFAULT_UMOUNT_WAIT 12
>>> #define DEFAULT_BROWSE_MODE 1
>>> #define DEFAULT_LOGGING 0
>>> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
>>> struct ldap_searchdn *defaults_get_searchdns(void);
>>> void defaults_free_searchdns(struct ldap_searchdn *);
>>> unsigned int defaults_get_append_options(void);
>>> +unsigned int defaults_get_mount_wait(void);
>>> unsigned int defaults_get_umount_wait(void);
>>> const char *defaults_get_auth_conf_file(void);
>>> unsigned int defaults_get_map_hash_table_size(void);
>>> --- autofs-5.0.1.orig/lib/defaults.c
>>> +++ autofs-5.0.1/lib/defaults.c
>>> @@ -45,6 +45,7 @@
>>> #define ENV_NAME_VALUE_ATTR "VALUE_ATTRIBUTE"
>>>
>>> #define ENV_APPEND_OPTIONS "APPEND_OPTIONS"
>>> +#define ENV_MOUNT_WAIT "MOUNT_WAIT"
>>> #define ENV_UMOUNT_WAIT "UMOUNT_WAIT"
>>> #define ENV_AUTH_CONF_FILE "AUTH_CONF_FILE"
>>>
>>> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
>>> check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
>>> check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
>>> check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
>>> + check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
>>> check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
>>> check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
>>> check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
>>> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
>>> return res;
>>> }
>>>
>>> +unsigned int defaults_get_mount_wait(void)
>>> +{
>>> + long wait;
>>> +
>>> + wait = get_env_number(ENV_MOUNT_WAIT);
>>> + if (wait < 0)
>>> + wait = DEFAULT_MOUNT_WAIT;
>>> +
>>> + return (unsigned int) wait;
>>> +}
>>> +
>>> unsigned int defaults_get_umount_wait(void)
>>> {
>>> long wait;
>>> --- autofs-5.0.1.orig/man/auto.master.5.in
>>> +++ autofs-5.0.1/man/auto.master.5.in
>>> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
>>> 60). If the equivalent command line option is given it will override this
>>> setting.
>>> .TP
>>> +.B MOUNT_WAIT
>>> +Set the default time to wait for a response from a spawned mount(8)
>>> +before sending it a SIGTERM. Note that we still need to wait for the
>>> +RPC layer to timeout before the sub-process exits so this isn't ideal
>>> +but it is the best we can do. The default is to wait until mount(8)
>>> +returns without intervention.
>>> +.TP
>>> .B UMOUNT_WAIT
>>> Set the default time to wait for a response from a spawned umount(8)
>>> before sending it a SIGTERM. Note that we still need to wait for the
>>> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
>>> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
>>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>> #
>>> #NEGATIVE_TIMEOUT=60
>>> #
>>> +# MOUNT_WAIT - time to wait for a response from umount(8).
>>> +# Setting this timeout can cause problems when
>>> +# mount would otherwise wait for a server that
>>> +# is temporarily unavailable, such as when it's
>>> +# restarting. The defailt of waiting for mount(8)
>>> +# usually results in a wait of around 3 minutes.
>>> +#
>>> +#MOUNT_WAIT=-1
>>> +#
>>> # UMOUNT_WAIT - time to wait for a response from umount(8).
>>> #
>>> #UMOUNT_WAIT=12
>>> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
>>> +++ autofs-5.0.1/samples/autofs.conf.default.in
>>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>> #
>>> #NEGATIVE_TIMEOUT=60
>>> #
>>> +# MOUNT_WAIT - time to wait for a response from umount(8).
>>> +# Setting this timeout can cause problems when
>>> +# mount would otherwise wait for a server that
>>> +# is temporarily unavailable, such as when it's
>>> +# restarting. The defailt of waiting for mount(8)
>>> +# usually results in a wait of around 3 minutes.
>>> +#
>>> +#MOUNT_WAIT=-1
>>> +#
>>> # UMOUNT_WAIT - time to wait for a response from umount(8).
>>> #
>>> #UMOUNT_WAIT=12
>>>
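
For illustration, a minimal standalone sketch of how the MOUNT_WAIT
lookup in the patch above behaves. The helper env_number() is
hypothetical and is only assumed to mimic get_env_number() (returning
-1 when the variable is unset or does not start with a digit); it is
not the autofs implementation. The sketch also shows that returning
DEFAULT_MOUNT_WAIT (-1) through an unsigned int yields UINT_MAX, so a
caller would need to compare against (unsigned int) -1 to recognise
"no limit".

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define DEFAULT_MOUNT_WAIT -1
#define ENV_MOUNT_WAIT "MOUNT_WAIT"

/* Hypothetical stand-in for get_env_number(): -1 if unset or not numeric. */
long env_number(const char *name)
{
    const char *val;

    assert(name != NULL);
    val = getenv(name);
    if (val == NULL || !isdigit((unsigned char) *val))
        return -1;
    return strtol(val, NULL, 10);
}

/* Same shape as defaults_get_mount_wait() in the patch. */
unsigned int mount_wait_from_env(void)
{
    long wait = env_number(ENV_MOUNT_WAIT);

    if (wait < 0)
        wait = DEFAULT_MOUNT_WAIT;

    /* -1 converts to UINT_MAX here, standing for "wait without limit". */
    return (unsigned int) wait;
}
```

With MOUNT_WAIT=10 in the environment this returns 10; with the
variable unset or set to a negative value it returns (unsigned int) -1.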
>>>
>>>> Thanks!
>>>>
>>>> 2009/8/13 Carlos Andr=E9 <[email protected]>:
>>>>> 2009/8/13 Ian Kent <[email protected]>:
>>>>>> Carlos Andr=E9 wrote:
>>>>>>> Today (2009-08-12) I'm using:
>>>>>>> kernel-2.6.18-128.2.1.el5
>>>>>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>>>>> Thanks,
>>>>>>
>>>>>> My mistake, the wait time I was referring to is used for umounts during
>>>>>> expires and is present in rev rc2.102.
>>>>>>
>>>>>> It shouldn't be hard to add this for mount as well.
>>>>>> Would you like me to put something together?
>>>>> Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
>>>>>
>>>>>> Probably would be good to test something out to see if we can make a
>>>>>> difference with the killing mount after some configured timeout but, if
>>>>>> we make progress, probably the best way to deal with it is for you to
>>>>>> log a bug against rhel-5 so I can get it committed to the rhel package.
>>>>>> The possible issue is that I'm not sure if the RPC subsystem in the
>>>>>> above rhel kernel will respond well to process death with potential
>>>>>> outstanding requests. But we'll see.
>>>>> Ok, on my way :)
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>>> Look my last test:
>>>>>>> --------------------------------------------------------------
>>>>>>> [root@KSTATION areas]# time ls testdown
>>>>>>> ls: testdown: No such file or directory
>>>>>>>
>>>>>>> real 3m9.025s
>>>>>>> user 0m0.000s
>>>>>>> sys 0m0.002s
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>>>>>> mounting root /misc/areas, mountpoint testdown, what
>>>>>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>>>>>> acl,sec=krb5p,proto=tcp,retry=0
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>>>>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>>>>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> calling mkdir_path /misc/areas/testdown
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>>> 3078093712 path /misc
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>>> submounts remaining in /misc
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>>> 3078093712 path /misc stat 3
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>>> = 2 path /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>>> 3078093712 path /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>>> submounts remaining in /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>>> 3078093712 path /misc stat 3
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>>> = 2 path /misc
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>>>>>> server '1.2.3.4' failed: timed out (giving up).
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
>>>>>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>>> --------------------------------------------------------------
>>>>>>>
>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>> Carlos André wrote:
>>>>>>>>> Hi Ian,
>>>>>>>>> I'm getting crazy trying put "retry=" to work on mount... this option
>>>>>>>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
>>>>>>>>> like you can see on my previous emails...
>>>>>>>> Right, my mistake for not looking closely enough at post.
>>>>>>>>
>>>>>>>> Maybe this is related to the same sort of problem we had with mount in
>>>>>>>> the past, before the options parsing went into the kernel, where other
>>>>>>>> services, like portmapper (or rpcbind), were being done with different
>>>>>>>> timeout parameters before the RPC calls for mounting. That's just an
>>>>>>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>>>>>>>
>>>>>>>> But what version of autofs and kernel did you say you were using?
>>>>>>>>
>>>>>>>>> I appreciate any help.
>>>>>>>>>
>>>>>>>>> Carlos.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>>>> Chuck Lever wrote:
>>>>>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>>>>>> This long timeout is good if workstation need mount a critical
>>>>>>>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>>>>>>>> But in my case, using this loooong timeout doesnt make any sense,
>>>>>>>>>>>> since autofs retry mount directory on-access. This in fact gives me
>>>>>>>>>>>> alot of headaches, coz user login 'll just hangs if one server goes
>>>>>>>>>>>> down for any reason, and will again hangs if user try access directory
>>>>>>>>>>>> pointing to a NFS down server...
>>>>>>>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>>>>>>>> mount(2) system call fails. When you set SYN retries to 1, this means
>>>>>>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>>>>>>>>> call to fail.
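
The 9 second and 189 second figures quoted in this thread are
consistent with a simple doubling model of the SYN timeout. A minimal
sketch, assuming the first timeout is 3 seconds and doubles on every
retransmission (the 3, 6, 12, 24, 48, 96 pattern reported from
tcpdump); the function name is made up for illustration and this is
not kernel code:

```c
#include <assert.h>

/*
 * Seconds until a blocking connect() gives up under the assumed model:
 * one initial SYN plus syn_retries retransmissions, with the timeout
 * starting at initial_rto seconds and doubling after each attempt.
 * syn_retries plays the role of /proc/sys/net/ipv4/tcp_syn_retries.
 */
unsigned int syn_connect_timeout(unsigned int initial_rto,
                                 unsigned int syn_retries)
{
    unsigned int total = 0;
    unsigned int rto = initial_rto;
    unsigned int attempt;

    assert(initial_rto > 0);

    for (attempt = 0; attempt <= syn_retries; attempt++) {
        total += rto;   /* wait out this attempt's timeout */
        rto *= 2;       /* exponential backoff for the next attempt */
    }
    return total;
}
```

Under these assumptions syn_connect_timeout(3, 5) is 189 and
syn_connect_timeout(3, 1) is 9, which lines up with the 3m9s and 9s
timings reported in this thread.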
>>>>>>>>>>>
>>>>>>>>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>>>>>>>>> for automounter as well as other cases. Ian, is there something else we
>>>>>>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>>>>>>>>> mount points handled via automounter? How should mount.nfs wait so we
>>>>>>>>>>> don't make other use cases worse? (Looks like most of the history is
>>>>>>>>>>> intact below).
>>>>>>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>>>>>>>>> mount.nfs in particular). This has always been a difficult situation for
>>>>>>>>>> the automounter because interactive mount invocations should wait. But I
>>>>>>>>>> believe automount mounts should always time out quickly, but that leads
>>>>>>>>>> to its own set of problems, especially when home directories are concerned.
>>>>>>>>>>
>>>>>>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>>>>>>>> certain that will work as we expect. I'll have to do some experimentation.
>>>>>>>>>>
>>>>>>>>>>> How long do you think is appropriate for the automounter to wait if the
>>>>>>>>>>> server is down, in your case, Carlos?
>>>>>>>>>>>
>>>>>>>>>>>> Am losing something or there have was something weirdo...!?
>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>> user 0m0.002s
>>>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 3m9.001s
>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>> sys 0m0.003s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 3m9.001s
>>>>>>>>>>>> user 0m0.002s
>>>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>>>
>>>>>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>>>>>
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 1m3.002s
>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 2m6.000s
>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 0m9.003s
>>>>>>>>>>>> user 0m0.001s
>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 2m6.001s
>>>>>>>>>>>> user 0m0.001s
>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]#
>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>>>>>>>>> using retry=0 without kerberos I got only 9s...
>>>>>>>>>>>>
>>>>>>>>>>>> *sigh*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>>>>>> Right. Normally the RPC client calls the kernel's socket connect
>>>>>>>>>>>>> function, which does 6 SYN retries. That one call usually takes longer
>>>>>>>>>>>>> than the RPC client's connect timeout, so it only makes one connect
>>>>>>>>>>>>> call, and then fails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>>>>>>>>> client to retry the connect call until its connect timeout expires.
>>>>>>>>>>>>> Each connect call resets the SYN timeout to 3 seconds.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>>> sec=krb5p,proto=tcp
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real 2m6.004s
>>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>>> sys 0m0.004s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>>>>>>>>> 2049...
>>>>>>>>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation where my
>>>>>>>>>>>>>>>>> server died... i need mount fail faster (10 or 15 secs max) than
>>>>>>>>>>>>>>>>> 3 minutes and 9 seconds...
>>>>>>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>>>>>>>>>>>> give up trying to connect a TCP socket to the server (6 SYN attempts
>>>>>>>>>>>>>>>> with exponential retries, or something like that). For stock CentOS
>>>>>>>>>>>>>>>> 5.3, I think user space does only a DNS lookup for normal NFSv4
>>>>>>>>>>>>>>>> mounts -- the kernel just tries to connect a TCP socket to port 2049,
>>>>>>>>>>>>>>>> with no preceding rpcbind request.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>>>>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy w=
rote:
>>>>>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos Andr=E9 <candrecn@=
gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2009/7/29 Carlos Andr=E9 <[email protected]>:
>>>>>>>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
>>>>>>>>>>>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
>>>>>>>>>>>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
>>>>>>>>>>>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server down) after
>>>>>>>>>>>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my findings
>>>>>>>>>>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>>>>>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>>>>>>>>>>>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>>>>>>>>>>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>>>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>>>>>>>>> linux-nfs" in
>>>>>>>>>>>>>>>>> the body of a message to [email protected]
>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-=
info.html
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Chuck Lever
>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
2009-08-27 14:38:15

by Chuck Lever

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
> Ian Kent wrote:
>> Carlos André wrote:
>>> Hi Ian,
>>>
>>> Thanks for patch and sorry for delay (i'm expecting receive u reply on
>>> bug track, not here) :)
>>>
>>> But, this patch doesnt worked to me like expected... :(
>>>
>>>
>>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>>> and later changed "10" to "2" with same results...
>>> (always restarting service, of course :)
>>>
>>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
>>> same results again.
>>>
>>> Or i'm doing something wrong?
>>>
>>>
>>> [root@KSTATION areas]# automount -V
>>>
>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>>> [...]
>>>
>>> [root@KSTATION areas]# time ls -la testdown
>>> ls: testedown: No such file or directory
>>>
>>> real 3m9.006s
>>> user 0m0.002s
>>> sys 0m0.000s
>>
>> OK, that isn't behaving the way I expect, I'll have a look.
>>
>>>
>>> LOGGING:
>>> -----------------------------------------
>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
>>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
>>> /misc/areas/testdown
>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
>>> -----------------------------------------
>
> Having a look at this I suspect the reason it doesn't work as expected
> is the waitpid(2) we do after sending the TERM signal to the mount
> process (which we have to do) is not returning. This is likely because
> the mount process isn't giving up in a shorter time as it used to.

You're thinking maybe mount(2) should be as interruptible as the
socket calls that the mount command used to do? That might be
reasonable, and I can take a look at that.

In the kernel, if the rpcbind for the MNT request is async, that would
be done by rpciod. That's a different process, so the signal wouldn't
have any effect on the mount. I have a patch that converts the MNT
client to use rpcb_getport_sync() which might help in this case.
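
For readers following along, the pattern under discussion (wait a
bounded time, send SIGTERM, then reap the child) can be sketched
roughly as below. This is only an illustration under stated
assumptions: wait_or_term() is a made-up name, it polls with WNOHANG
rather than using whatever mechanism do_spawn() really uses, and it is
not the autofs code. The final blocking waitpid() corresponds to the
call suspected of not returning while mount is stuck inside mount(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait up to wait_secs for child to exit. On timeout send SIGTERM and
 * block until the child is reaped.
 * Returns 0 if the child exited by itself, 1 if it had to be
 * signalled, -1 on error.
 */
int wait_or_term(pid_t child, unsigned int wait_secs, int *status)
{
    const struct timespec tick = {0, 100 * 1000 * 1000}; /* 100 ms */
    unsigned int polls = wait_secs * 10;
    pid_t r;

    assert(child > 0 && status != NULL);

    while (polls-- > 0) {
        r = waitpid(child, status, WNOHANG);
        if (r == child)
            return 0;           /* finished within the wait time */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    if (kill(child, SIGTERM) < 0)
        return -1;

    /*
     * Blocks until the child acts on the signal. A child that cannot
     * be interrupted in the kernel keeps this call from returning.
     */
    if (waitpid(child, status, 0) != child)
        return -1;
    return 1;
}
```

A child that merely sleeps in user space is reaped right after the
SIGTERM; the interesting case in this thread is a child for which that
last waitpid() only returns once the kernel side gives up.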

> We could send a KILL signal to the mount process but that does seem to
> cause problems later on since there are still outstanding RPC requests.
>
> I suspect that the early termination of blocked umount requests will
> also be broken now.

The network part of umount.nfs is still done in user space, just like
it used to be. Worth checking, but I can't see that being a problem.

> Not sure what to do next here.
> Anyone want to volunteer some in-depth detail on kernel RPC request
> termination on the issuing process receiving a TERM signal?
>
>>> 2009/8/17 Ian Kent <[email protected]>:
>>>> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>>>>> Filled bug report:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>>>> Hi Carlos,
>>>>
>>>> I have a patched source rpm to add a mount wait parameter to autofs
>>>> located at:
>>>> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>>>>
>>>> Could you build it and see if it works.
>>>> I haven't tested it at all but it is fairly straight forward.
>>>> It is still unclear if this is the right way to do this and what the
>>>> consequences are in sending a term signal to mount. This mount request
>>>> will likely be followed by other requests for the same mount causing an
>>>> accumulation of mount(8) processes waiting for RPC timeouts before they
>>>> can answer the TERM signal.
>>>>
>>>> Anyway, for information the patch included in the source rpm above is:
>>>>
>>>> autofs-5.0.4 - add mount wait parameter
>>>>
>>>> From: Ian Kent <[email protected]>
>>>>
>>>> Often delays when trying to mount from a server that is not responding
>>>> for some reason are undesirable. To try and prevent these delays we
>>>> provide a configuration setting to limit the time that we wait for
>>>> our spawned mount(8) process to complete before sending it a SIGTERM
>>>> signal. This patch adds a configuration parameter to allow us to
>>>> request we limit the time we wait for mount(8) to complete before
>>>> sending it a TERM signal.
>>>> ---
>>>>
>>>> daemon/spawn.c | 3 ++-
>>>> include/defaults.h | 2 ++
>>>> lib/defaults.c | 13 +++++++++++++
>>>> man/auto.master.5.in | 7 +++++++
>>>> redhat/autofs.sysconfig.in | 9 +++++++++
>>>> samples/autofs.conf.default.in | 9 +++++++++
>>>> 6 files changed, 42 insertions(+), 1 deletion(-)
>>>>
>>>>
>>>> --- autofs-5.0.1.orig/daemon/spawn.c
>>>> +++ autofs-5.0.1/daemon/spawn.c
>>>> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
>>>> unsigned int options;
>>>> unsigned int retries = MTAB_LOCK_RETRIES;
>>>> int update_mtab = 1, ret, printed = 0;
>>>> + unsigned int wait = defaults_get_mount_wait();
>>>> char buf[PATH_MAX];
>>>>
>>>> /* If we use mount locking we can't validate the location */
>>>> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
>>>> va_end(arg);
>>>>
>>>> while (retries--) {
>>>> - ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
>>>> + ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
>>>> if (ret & MTAB_NOTUPDATED) {
>>>> struct timespec tm = {3, 0};
>>>>
>>>> --- autofs-5.0.1.orig/include/defaults.h
>>>> +++ autofs-5.0.1/include/defaults.h
>>>> @@ -24,6 +24,7 @@
>>>>
>>>> #define DEFAULT_TIMEOUT 600
>>>> #define DEFAULT_NEGATIVE_TIMEOUT 60
>>>> +#define DEFAULT_MOUNT_WAIT -1
>>>> #define DEFAULT_UMOUNT_WAIT 12
>>>> #define DEFAULT_BROWSE_MODE 1
>>>> #define DEFAULT_LOGGING 0
>>>> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
>>>> struct ldap_searchdn *defaults_get_searchdns(void);
>>>> void defaults_free_searchdns(struct ldap_searchdn *);
>>>> unsigned int defaults_get_append_options(void);
>>>> +unsigned int defaults_get_mount_wait(void);
>>>> unsigned int defaults_get_umount_wait(void);
>>>> const char *defaults_get_auth_conf_file(void);
>>>> unsigned int defaults_get_map_hash_table_size(void);
>>>> --- autofs-5.0.1.orig/lib/defaults.c
>>>> +++ autofs-5.0.1/lib/defaults.c
>>>> @@ -45,6 +45,7 @@
>>>> #define ENV_NAME_VALUE_ATTR "VALUE_ATTRIBUTE"
>>>>
>>>> #define ENV_APPEND_OPTIONS "APPEND_OPTIONS"
>>>> +#define ENV_MOUNT_WAIT "MOUNT_WAIT"
>>>> #define ENV_UMOUNT_WAIT "UMOUNT_WAIT"
>>>> #define ENV_AUTH_CONF_FILE "AUTH_CONF_FILE"
>>>>
>>>> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
>>>> check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
>>>> check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
>>>> check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
>>>> + check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
>>>> check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
>>>> check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
>>>> check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
>>>> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
>>>> return res;
>>>> }
>>>>
>>>> +unsigned int defaults_get_mount_wait(void)
>>>> +{
>>>> + long wait;
>>>> +
>>>> + wait = get_env_number(ENV_MOUNT_WAIT);
>>>> + if (wait < 0)
>>>> + wait = DEFAULT_MOUNT_WAIT;
>>>> +
>>>> + return (unsigned int) wait;
>>>> +}
>>>> +
>>>> unsigned int defaults_get_umount_wait(void)
>>>> {
>>>> long wait;
>>>> --- autofs-5.0.1.orig/man/auto.master.5.in
>>>> +++ autofs-5.0.1/man/auto.master.5.in
>>>> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
>>>> 60). If the equivalent command line option is given it will override this
>>>> setting.
>>>> .TP
>>>> +.B MOUNT_WAIT
>>>> +Set the default time to wait for a response from a spawned mount(8)
>>>> +before sending it a SIGTERM. Note that we still need to wait for the
>>>> +RPC layer to timeout before the sub-process exits so this isn't ideal
>>>> +but it is the best we can do. The default is to wait until mount(8)
>>>> +returns without intervention.
>>>> +.TP
>>>> .B UMOUNT_WAIT
>>>> Set the default time to wait for a response from a spawned umount(8)
>>>> before sending it a SIGTERM. Note that we still need to wait for the
>>>> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
>>>> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
>>>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>>> #
>>>> #NEGATIVE_TIMEOUT=60
>>>> #
>>>> +# MOUNT_WAIT - time to wait for a response from umount(8).
>>>> +# Setting this timeout can cause problems when
>>>> +# mount would otherwise wait for a server that
>>>> +# is temporarily unavailable, such as when it's
>>>> +# restarting. The defailt of waiting for mount(8)
>>>> +# usually results in a wait of around 3 minutes.
>>>> +#
>>>> +#MOUNT_WAIT=-1
>>>> +#
>>>> # UMOUNT_WAIT - time to wait for a response from umount(8).
>>>> #
>>>> #UMOUNT_WAIT=12
>>>> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
>>>> +++ autofs-5.0.1/samples/autofs.conf.default.in
>>>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>>> #
>>>> #NEGATIVE_TIMEOUT=60
>>>> #
>>>> +# MOUNT_WAIT - time to wait for a response from umount(8).
>>>> +# Setting this timeout can cause problems when
>>>> +# mount would otherwise wait for a server that
>>>> +# is temporarily unavailable, such as when it's
>>>> +# restarting. The defailt of waiting for mount(8)
>>>> +# usually results in a wait of around 3 minutes.
>>>> +#
>>>> +#MOUNT_WAIT=-1
>>>> +#
>>>> # UMOUNT_WAIT - time to wait for a response from umount(8).
>>>> #
>>>> #UMOUNT_WAIT=12
>>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>> 2009/8/13 Carlos André <[email protected]>:
>>>>>> 2009/8/13 Ian Kent <[email protected]>:
>>>>>>> Carlos André wrote:
>>>>>>>> Today (2009-08-12) I'm using:
>>>>>>>> kernel-2.6.18-128.2.1.el5
>>>>>>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>>>>>> Thanks,
>>>>>>>
>>>>>>> My mistake, the wait time I was referring to is used for umounts during
>>>>>>> expires and is present in rev rc2.102.
>>>>>>>
>>>>>>> It shouldn't be hard to add this for mount as well.
>>>>>>> Would you like me to put something together?
>>>>>> Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
>>>>>>
>>>>>>> Probably would be good to test something out to see if we can make a
>>>>>>> difference with the killing mount after some configured timeout but, if
>>>>>>> we make progress, probably the best way to deal with it is for you to
>>>>>>> log a bug against rhel-5 so I can get it committed to the rhel package.
>>>>>>> The possible issue is that I'm not sure if the RPC subsystem in the
>>>>>>> above rhel kernel will respond well to process death with potential
>>>>>>> outstanding requests. But we'll see.
>>>>>> Ok, on my way :)
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>>>> Look my last test:
>>>>>>>> --------------------------------------------------------------
>>>>>>>> [root@KSTATION areas]# time ls testdown
>>>>>>>> ls: testdown: No such file or directory
>>>>>>>>
>>>>>>>> real 3m9.025s
>>>>>>>> user 0m0.000s
>>>>>>>> sys 0m0.002s
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>>>>>>> mounting root /misc/areas, mountpoint testdown, what
>>>>>>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>>>>>>> acl,sec=krb5p,proto=tcp,retry=0
>>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>>>>>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>>>>>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>>> calling mkdir_path /misc/areas/testdown
>>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1
>>>>>>>> path /misc
>>>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>>>> 3078093712 path /misc
>>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>>>> submounts remaining in /misc
>>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>>>> 3078093712 path /misc stat 3
>>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready():
>>>>>>>> state = 2 path /misc
>>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1
>>>>>>>> path /misc
>>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>>>> 3078093712 path /misc
>>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>>>> submounts remaining in /misc
>>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>>>> 3078093712 path /misc stat 3
>>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready():
>>>>>>>> state = 2 path /misc
>>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>>>>>>> server '1.2.3.4' failed: timed out (giving up).
>>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount
>>>>>>>> /misc/areas/testdown
>>>>>>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1
>>>>>>>> path /misc
>>>>>>>> --------------------------------------------------------------
>>>>>>>>
>>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>>> Carlos André wrote:
>>>>>>>>>> Hi Ian,
>>>>>>>>>> I'm getting crazy trying put "retry=" to work on mount... this
>>>>>>>>>> option just DONT WORK if use proto=tcp and/OR kerberos
>>>>>>>>>> (sec=krb5/krb5i/krb5p) like you can see on my previous emails...
>>>>>>>>> Right, my mistake for not looking closely enough at post.
>>>>>>>>>
>>>>>>>>> Maybe this is related to the same sort of problem we had with
>>>>>>>>> mount in the past, before the options parsing went into the
>>>>>>>>> kernel, where other services, like portmapper (or rpcbind), were
>>>>>>>>> being done with different timeout parameters before the RPC calls
>>>>>>>>> for mounting. That's just an example as NFSv4 shouldn't be
>>>>>>>>> sensitive to portmapper anyway.
>>>>>>>>>
>>>>>>>>> But what version of autofs and kernel did you say you were using?
>>>>>>>>>
>>>>>>>>>> I appreciate any help.
>>>>>>>>>>
>>>>>>>>>> Carlos.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>>>>> Chuck Lever wrote:
>>>>>>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>>>>>>> This long timeout is good if workstation need mount a critical
>>>>>>>>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>>>>>>>>> But in my case, using this loooong timeout doesnt make any
>>>>>>>>>>>>> sense, since autofs retry mount directory on-access. This in
>>>>>>>>>>>>> fact gives me alot of headaches, coz user login 'll just hangs
>>>>>>>>>>>>> if one server goes down for any reason, and will again hangs
>>>>>>>>>>>>> if user try access directory pointing to a NFS down server...
>>>>>>>>>>>> "retry=0" means the mount command will fail as soon as the
>>>>>>>>>>>> first mount(2) system call fails. When you set SYN retries to
>>>>>>>>>>>> 1, this means after 9 seconds, the connect fails, and that
>>>>>>>>>>>> causes the mount(2) system call to fail.
>>>>>>>>>>>>
>>>>>>>>>>>> Recent conversations with Ian suggested that a long timeout was
>>>>>>>>>>>> desired for automounter as well as other cases. Ian, is there
>>>>>>>>>>>> something else we need to consider to determine the correct
>>>>>>>>>>>> retry timeout for NFS/TCP mount points handled via automounter?
>>>>>>>>>>>> How should mount.nfs wait so we don't make other use cases
>>>>>>>>>>>> worse? (Looks like most of the history is intact below).
>>>>>>>>>>> Of course we know that autofs is entirely at the mercy of
>>>>>>>>>>> mount(8) (and mount.nfs in particular). This has always been a
>>>>>>>>>>> difficult situation for the automounter because interactive
>>>>>>>>>>> mount invocations should wait. But I believe automount mounts
>>>>>>>>>>> should always time out quickly, but that leads to its own set of
>>>>>>>>>>> problems, especially when home directories are concerned.
>>>>>>>>>>>
>>>>>>>>>>> I think adding "retry=0" is the right thing to do myself but I'm
>>>>>>>>>>> not certain that will work as we expect. I'll have to do some
>>>>>>>>>>> experimentation.
>>>>>>>>>>>
>>>>>>>>>>>> How long do you think is appropriate for the automounter to
>>>>>>>>>>>> wait if the server is down, in your case, Carlos?
>>>>>>>>>>>>
>>>>>>>>>>>>> Am losing something or there have was something weirdo...!?
>>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> proto=tcp,retry=1
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>>> user 0m0.002s
>>>>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> proto=tcp,retry=0
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 3m9.001s
>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>> sys 0m0.003s
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 3m9.001s
>>>>>>>>>>>>> user 0m0.002s
>>>>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> proto=tcp,retry=1
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 1m3.002s
>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 2m6.000s
>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> proto=tcp,retry=0
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 0m9.003s
>>>>>>>>>>>>> user 0m0.001s
>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real 2m6.001s
>>>>>>>>>>>>> user 0m0.001s
>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>> [root@KSTATION ~]#
>>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1...
>>>>>>>>>>>>> and using retry=0 without kerberos I got only 9s...
>>>>>>>>>>>>>
>>>>>>>>>>>>> *sigh*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change
>>>>>>>>>>>>>>> tcp_syn_retries to 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>>>>>>> Right. Normally the RPC client calls the kernel's socket
>>>>>>>>>>>>>> connect function, which does 6 SYN retries. That one call
>>>>>>>>>>>>>> usually takes longer than the RPC client's connect timeout,
>>>>>>>>>>>>>> so it only makes one connect call, and then fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Reducing the number of SYN retries per connect attempt causes
>>>>>>>>>>>>>> the RPC client to retry the connect call until its connect
>>>>>>>>>>>>>> timeout expires. Each connect call resets the SYN timeout to
>>>>>>>>>>>>>> 3 seconds.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>>>> sec=krb5p,proto=tcp
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> real 2m6.004s
>>>>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>>>>> sys 0m0.004s
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And u're right about expo retries... with tcpdump i've
>>>>>>>>>>>>>>>> monitored traffic and i got SYN retries in 3, 6, 12, 24,
>>>>>>>>>>>>>>>> 48, 96 secs on port 2049...
>>>>>>>>>>>>>>>> I tried use "retry=1" option on mount without any change...
>>>>>>>>>>>>>>>> I dont want change source or tcp timers... just NFSv4
>>>>>>>>>>>>>>>> client.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation
>>>>>>>>>>>>>>>>>> where my server died... i need mount fail faster (10 or
>>>>>>>>>>>>>>>>>> 15 secs max) than 3 minutes and 9 seconds...
>>>>>>>>>>>>>>>>> The 189 second timeout is likely how long it takes the
>>>>>>>>>>>>>>>>> kernel to give up trying to connect a TCP socket to the
>>>>>>>>>>>>>>>>> server (6 SYN attempts with exponential retries, or
>>>>>>>>>>>>>>>>> something like that). For stock CentOS 5.3, I think user
>>>>>>>>>>>>>>>>> space does only a DNS lookup for normal NFSv4 mounts --
>>>>>>>>>>>>>>>>> the kernel just tries to connect a TCP socket to port
>>>>>>>>>>>>>>>>> 2049, with no preceding rpcbind request.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related
>>>>>>>>>>>>>>>>> CentOS components (kernel, nfs-utils) with something
>>>>>>>>>>>>>>>>> you've built yourself.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to
>>>>>>>>>>>>>>>>>>>>>> work with Kerberos and AutoFS, but i got a problem: If
>>>>>>>>>>>>>>>>>>>>>> NFS server goes down i get a LOOOOOOONG mount timeout
>>>>>>>>>>>>>>>>>>>>>> on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon
>>>>>>>>>>>>>>>>>>>>>> process, if mount hangs, user logon hangs. Then i want
>>>>>>>>>>>>>>>>>>>>>> configure it to timeout (if server down) after 10-15
>>>>>>>>>>>>>>>>>>>>>> secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations,
>>>>>>>>>>>>>>>>>>>>>> there my findings (server DOWN IP: 172.16.0.10 /
>>>>>>>>>>>>>>>>>>>>>> client IP: 172.16.1.10) using basic command
>>>>>>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp
>>>>>>>>>>>>>>>>>>>>>> OR proto=udp) it hangs for 189 secs (3m9s: real
>>>>>>>>>>>>>>>>>>>>>> 3m9.001s) until show error (mount: mount to NFS server
>>>>>>>>>>>>>>>>>>>>>> '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering
>>>>>>>>>>>>>>>>>>> how to make the mount fail faster. But I may be
>>>>>>>>>>>>>>>>>>> misunderstanding.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>
>>
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
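
The timings reported in the quoted history can be checked with a little arithmetic. The following is a minimal sketch, assuming the first SYN waits 3 seconds and every retransmission doubles the wait, which matches the 3, 6, 12, 24, 48, 96 second intervals Carlos observed with tcpdump. The function name and the 3-second default are illustrative only and are not taken from the kernel source.

```python
def syn_connect_timeout(syn_retries, initial_wait=3.0):
    """Return (waits, total) for one connect() attempt.

    Assumption: the initial SYN plus `syn_retries` retransmissions are
    sent, and the wait after each one is double the previous wait.
    """
    waits = [initial_wait * 2 ** i for i in range(syn_retries + 1)]
    return waits, sum(waits)


if __name__ == "__main__":
    for retries in (5, 1):
        waits, total = syn_connect_timeout(retries)
        print(f"tcp_syn_retries={retries}: waits={waits} total={total}s")
```

Under these assumptions tcp_syn_retries=5 gives 189 seconds (the 3m9s runs) and tcp_syn_retries=1 gives 9 seconds per connect attempt. The 2m6s runs are then consistent with 14 attempts of 9 seconds each (13 "retrying" messages plus the final "giving up"), and the 1m3s run with 7 attempts.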

2009-08-27 14:52:19

by Trond Myklebust

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
> > Ian Kent wrote:
> >> Carlos André wrote:
> >>> Hi Ian,
> >>>
> >>> Thanks for patch and sorry for delay (i'm expecting receive u
> >>> reply on
> >>> bug track, not here) :)
> >>>
> >>> But, this patch doesnt worked to me like expected... :(
> >>>
> >>>
> >>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
> >>> and later changed "10" to "2" with same results...
> >>> (always restarting service, of course :)
> >>>
> >>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
> >>> same results again.
> >>>
> >>> Or i'm doing something wrong?
> >>>
> >>>
> >>> [root@KSTATION areas]# automount -V
> >>>
> >>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
> >>> [...]
> >>>
> >>> [root@KSTATION areas]# time ls -la testdown
> >>> ls: testedown: No such file or directory
> >>>
> >>> real    3m9.006s
> >>> user    0m0.002s
> >>> sys     0m0.000s
> >>
> >> OK, that isn't behaving the way I expect, I'll have a look.
> >>
> >>>
> >>> LOGGING:
> >>> -----------------------------------------
> >>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
> >>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
> >>> /misc/areas/testdown
> >>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
> >>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> >>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token
> >>> = 91
> >>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/
> >>> areas/testdown
> >>> -----------------------------------------
> >
> > Having a look at this I suspect the reason it doesn't work as expected
> > is the waitpid(2) we do after sending the TERM signal to the mount
> > process (which we have to do) is not returning. This is likely because
> > the mount process isn't giving up in a shorter time as it used to.
>
> You're thinking maybe mount(2) should be as interruptible as the
> socket calls that the mount command used to do?  That might be
> reasonable, and I can take a look at that.

In recent kernels, all those RPC calls should be using TASK_KILLABLE
sleep states. SIGTERM should cause them to abort, provided that some
process isn't blocking it.

Perhaps TASK_KILLABLE could be backported to RHEL-5?

> In the kernel, if the rpcbind for the MNT request is async, that would
> be done by rpciod.  That's a different process, so the signal wouldn't
> have any effect on the mount.  I have a patch that converts the MNT
> client to use rpcb_getport_sync() which might help in this case.

The client shouldn't be using rpcbind at all when doing a NFSv4 mount.

Cheers
  Trond

2009-08-27 14:54:27

by Chuck Lever

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:
> On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
>> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
>>> Ian Kent wrote:
>>>> Carlos André wrote:
>>>>> Hi Ian,
>>>>>
>>>>> Thanks for patch and sorry for delay (i'm expecting receive u
>>>>> reply on
>>>>> bug track, not here) :)
>>>>>
>>>>> But, this patch doesnt worked to me like expected... :(
>>>>>
>>>>>
>>>>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>>>>> and later changed "10" to "2" with same results...
>>>>> (always restarting service, of course :)
>>>>>
>>>>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
>>>>> same results again.
>>>>>
>>>>> Or i'm doing something wrong?
>>>>>
>>>>>
>>>>> [root@KSTATION areas]# automount -V
>>>>>
>>>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>>>>> [...]
>>>>>
>>>>> [root@KSTATION areas]# time ls -la testdown
>>>>> ls: testedown: No such file or directory
>>>>>
>>>>> real 3m9.006s
>>>>> user 0m0.002s
>>>>> sys 0m0.000s
>>>>
>>>> OK, that isn't behaving the way I expect, I'll have a look.
>>>>
>>>>>
>>>>> LOGGING:
>>>>> -----------------------------------------
>>>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
>>>>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
>>>>> /misc/areas/testdown
>>>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token
>>>>> = 91
>>>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount
>>>>> /misc/areas/testdown
>>>>> -----------------------------------------
>>>
>>> Having a look at this I suspect the reason it doesn't work as expected
>>> is the waitpid(2) we do after sending the TERM signal to the mount
>>> process (which we have to do) is not returning. This is likely because
>>> the mount process isn't giving up in a shorter time as it used to.
>>
>> You're thinking maybe mount(2) should be as interruptible as the
>> socket calls that the mount command used to do? That might be
>> reasonable, and I can take a look at that.
>
> In recent kernels, all those RPC calls should be using TASK_KILLABLE
> sleep states. SIGTERM should cause them to abort, provided that some
> process isn't blocking it.
>
> Perhaps TASK_KILLABLE could be backported to RHEL-5?

That's pretty extensive, with hooks in the page cache. I doubt RH
would go for that.

>> In the kernel, if the rpcbind for the MNT request is async, that would
>> be done by rpciod. That's a different process, so the signal wouldn't
>> have any effect on the mount. I have a patch that converts the MNT
>> client to use rpcb_getport_sync() which might help in this case.
>
> The client shouldn't be using rpcbind at all when doing a NFSv4 mount.

Yep, forgot this was NFSv4.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
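
The behaviour Ian suspects in the quoted text (SIGTERM is sent, but the wait that follows never returns) can be reproduced in user space. This is a minimal sketch, not the autofs implementation: `run_with_wait`, `COOPERATIVE` and `STUBBORN` are hypothetical names, and a child process that ignores SIGTERM stands in for a mount(8) that cannot act on the signal. The second wait is bounded so the caller never blocks indefinitely.

```python
import signal
import subprocess
import sys


def run_with_wait(argv, wait, grace=1.0):
    """Run argv and give it `wait` seconds to finish (hypothetical helper).

    On timeout send SIGTERM, then allow `grace` more seconds for the
    child to exit. Returns (child, outcome), where outcome is one of
    "exited", "terminated" or "stuck".
    """
    child = subprocess.Popen(argv)
    try:
        child.wait(timeout=wait)
        return child, "exited"
    except subprocess.TimeoutExpired:
        child.send_signal(signal.SIGTERM)
    try:
        # An unbounded child.wait() here corresponds to a waitpid(2)
        # that never returns if the child cannot act on the signal.
        child.wait(timeout=grace)
        return child, "terminated"
    except subprocess.TimeoutExpired:
        return child, "stuck"


# Stand-ins for mount(8): one dies on SIGTERM, the other ignores it.
COOPERATIVE = [sys.executable, "-c", "import time; time.sleep(30)"]
STUBBORN = [sys.executable, "-c",
            "import signal, time; "
            "signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(30)"]

if __name__ == "__main__":
    for argv in (COOPERATIVE, STUBBORN):
        child, outcome = run_with_wait(argv, wait=1.0)
        print(outcome, child.returncode)
        if outcome == "stuck":
            child.kill()  # clean up the demo child
            child.wait()
```

The demo child can still be removed with SIGKILL. A process sleeping uninterruptibly in the kernel cannot, which is why the discussion turns to making the mount path itself killable.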

2009-08-27 15:00:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Thu, 2009-08-27 at 10:54 -0400, Chuck Lever wrote:
> On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:
> > On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
> >> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
> >>> Ian Kent wrote:
> > >>>> Carlos André wrote:
> >>>>> Hi Ian,
> >>>>>
> >>>>> Thanks for patch and sorry for delay (i'm expecting receive u
> >>>>> reply on
> >>>>> bug track, not here) :)
> >>>>>
> >>>>> But, this patch doesnt worked to me like expected... :(
> >>>>>
> >>>>>
> > >>>>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
> >>>>> and later changed "10" to "2" with same results...
> >>>>> (always restarting service, of course :)
> >>>>>
> > >>>>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
> >>>>> same results again.
> >>>>>
> >>>>> Or i'm doing something wrong?
> >>>>>
> >>>>>
> >>>>> [root@KSTATION areas]# automount -V
> >>>>>
> >>>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
> >>>>> [...]
> >>>>>
> >>>>> [root@KSTATION areas]# time ls -la testdown
> >>>>> ls: testedown: No such file or directory
> >>>>>
> >>>>> real 3m9.006s
> >>>>> user 0m0.002s
> >>>>> sys 0m0.000s
> >>>>
> >>>> OK, that isn't behaving the way I expect, I'll have a look.
> >>>>
> >>>>>
> >>>>> LOGGING:
> >>>>> -----------------------------------------
> > >>>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
> > >>>>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
> > >>>>> /misc/areas/testdown
> > >>>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
> > >>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> > >>>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token
> > >>>>> = 91
> > >>>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount
> > >>>>> /misc/areas/testdown
> >>>>> -----------------------------------------
> >>>
> > >>> Having a look at this I suspect the reason it doesn't work as
> > >>> expected
> > >>> is the waitpid(2) we do after sending the TERM signal to the mount
> > >>> process (which we have to do) is not returning. This is likely
> > >>> because
> > >>> the mount process isn't giving up in a shorter time as it used to.
> >>
> >> You're thinking maybe mount(2) should be as interruptible as the
> >> socket calls that the mount command used to do? That might be
> >> reasonable, and I can take a look at that.
> >
> > > In recent kernels, all those RPC calls should be using TASK_KILLABLE
> > > sleep states. SIGTERM should cause them to abort, provided that some
> > > process isn't blocking it.
> >
> > Perhaps TASK_KILLABLE could be backported to RHEL-5?
>
> That's pretty extensive, with hooks in the page cache. I doubt RH
> would go for that.

You don't have to add the hooks in the page cache in order to make mount
interruptible. You just need to replace the sigmask-manipulation in
net/sunrpc and fs/nfs (a.k.a. rpc_clnt_sigmask()/rpc_clnt_sigunmask())
with TASK_KILLABLE.

Alternatively, it might suffice to just turn on the 'intr' flag
temporarily while doing the mount path walk, and then switch it to
whatever default the user actually specified afterwards.

Trond
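
As a user-space analogy only (this is not the kernel's rpc_clnt_sigmask() code), the effect of masking a signal around an operation can be sketched as follows. While SIGTERM is blocked it stays pending and nothing is interrupted; it is delivered only once the mask is restored. The function name `blocked_then_delivered` is hypothetical, and the sketch assumes it runs in the main thread on a POSIX system with Python 3.8 or later.

```python
import signal


def blocked_then_delivered(signum=signal.SIGTERM):
    """Raise `signum` while it is blocked, then unblock it.

    Returns (pending_while_blocked, handled_while_blocked,
    handled_after_unblock).
    """
    seen = []
    old_handler = signal.signal(signum, lambda s, frame: seen.append(s))
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        signal.raise_signal(signum)
        pending = signum in signal.sigpending()
        handled_while_blocked = bool(seen)
    finally:
        # Restoring the old mask lets the pending signal through.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    handled_after_unblock = bool(seen)
    if old_handler is not None:
        signal.signal(signum, old_handler)
    return pending, handled_while_blocked, handled_after_unblock


if __name__ == "__main__":
    print(blocked_then_delivered())
```

A sleep that honours fatal signals regardless of such a mask is roughly what the TASK_KILLABLE suggestion above is after.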


2009-08-27 15:12:22

by Chuck Lever

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 27, 2009, at 11:00 AM, Trond Myklebust wrote:
> On Thu, 2009-08-27 at 10:54 -0400, Chuck Lever wrote:
>> On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:
>>> On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
>>>> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
>>>>> Ian Kent wrote:
>>>>>> Carlos André wrote:
>>>>>>> Hi Ian,
>>>>>>>
>>>>>>> Thanks for patch and sorry for delay (i'm expecting receive u
>>>>>>> reply on
>>>>>>> bug track, not here) :)
>>>>>>>
>>>>>>> But, this patch doesnt worked to me like expected... :(
>>>>>>>
>>>>>>>
>>>>>>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>>>>>>> and later changed "10" to "2" with same results...
>>>>>>> (always restarting service, of course :)
>>>>>>>
>>>>>>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i
>>>>>>> got
>>>>>>> same results again.
>>>>>>>
>>>>>>> Or i'm doing something wrong?
>>>>>>>
>>>>>>>
>>>>>>> [root@KSTATION areas]# automount -V
>>>>>>>
>>>>>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>>>>>>> [...]
>>>>>>>
>>>>>>> [root@KSTATION areas]# time ls -la testdown
>>>>>>> ls: testedown: No such file or directory
>>>>>>>
>>>>>>> real 3m9.006s
>>>>>>> user 0m0.002s
>>>>>>> sys 0m0.000s
>>>>>>
>>>>>> OK, that isn't behaving the way I expect, I'll have a look.
>>>>>>
>>>>>>>
>>>>>>> LOGGING:
>>>>>>> -----------------------------------------
>>>>>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
>>>>>>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
>>>>>>> /misc/areas/testdown
>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
>>>>>>> -----------------------------------------
>>>>>
>>>>> Having a look at this I suspect the reason it doesn't work as
>>>>> expected
>>>>> is the waitpid(2) we do after sending the TERM signal to the mount
>>>>> process (which we have to do) is not returning. This is likely
>>>>> because
>>>>> the mount process isn't giving up in a shorter time as it used to.
>>>>
>>>> You're thinking maybe mount(2) should be as interruptible as the
>>>> socket calls that the mount command used to do? That might be
>>>> reasonable, and I can take a look at that.
>>>
>>> In recent kernels, all those RPC calls should be using TASK_KILLABLE
>>> sleep states. SIGTERM should cause them to abort, provided that some
>>> process isn't blocking it.
>>>
>>> Perhaps TASK_KILLABLE could be backported to RHEL-5?
>>
>> That's pretty extensive, with hooks in the page cache. I doubt RH
>> would go for that.
>
> You don't have to add the hooks in the page cache in order to make mount
> interruptible. You just need to replace the sigmask-manipulation in
> net/sunrpc and fs/nfs (a.k.a. rpc_clnt_sigmask()/rpc_clnt_sigunmask())
> with TASK_KILLABLE.

That sounds like a schlep.

> Alternatively, it might suffice to just turn on the 'intr' flag
> temporarily while doing the mount path walk, and then switch it to
> whatever default the user actually specified afterwards.

That sounds easy, especially for an EL5 kernel. Maybe "soft" too for
the first few requests?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
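
The failure mode Ian describes in the quoted text (the waitpid(2) after the TERM signal not returning) follows from the usual spawn-and-wait pattern. A minimal user-space sketch of that pattern is below, assuming a child that simply sleeps in place of mount(8). wait_or_term() and spawn_sleeper() are hypothetical names used for illustration only; they are not autofs functions, and the real do_spawn() may well be structured differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not autofs code): poll for up to wait_secs for the
 * child to exit on its own, then send SIGTERM and reap it.
 * Returns the waitpid() status word, or -1 on error.
 */
int wait_or_term(pid_t pid, unsigned int wait_secs)
{
    const struct timespec tick = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    unsigned int ticks = wait_secs * 10;
    int status;

    assert(pid > 0);

    while (ticks--) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;          /* child finished within the limit */
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }

    if (kill(pid, SIGTERM) < 0)
        return -1;

    /*
     * Blocking wait. If the child cannot act on the signal (for example
     * because it is stuck in an uninterruptible system call), this is
     * where the caller stalls until the child finally exits.
     */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}

/* Hypothetical stand-in for spawning mount(8): a child that just sleeps. */
pid_t spawn_sleeper(unsigned int secs)
{
    pid_t pid = fork();

    if (pid == 0) {
        signal(SIGTERM, SIG_DFL);   /* make sure TERM is not ignored */
        sleep(secs);
        _exit(0);
    }
    return pid;
}
```

With a child that honours SIGTERM, wait_or_term(spawn_sleeper(30), 1) comes back after roughly one second and WIFSIGNALED() is true for the returned status. If the child were instead blocked inside a mount(2) call that does not respond to the signal, the final waitpid() would not return until the kernel gave up, which would be consistent with the 3m9s in the logs above regardless of the MOUNT_WAIT value.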

2009-08-24 18:07:10

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Ian,
Thanks for Support/Help :)


2009/8/24 Ian Kent <[email protected]>:
> Carlos André wrote:
>> Hi Ian,
>>
>> Thanks for patch and sorry for delay (i'm expecting receive u reply on
>> bug track, not here) :)
>>
>> But, this patch doesnt worked to me like expected... :(
>>
>>
>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>> and later changed "10" to "2" with same results...
>> (always restarting service, of course :)
>>
>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
>> same results again.
>>
>> Or i'm doing something wrong?
>>
>>
>> [root@KSTATION areas]# automount -V
>>
>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>> [...]
>>
>> [root@KSTATION areas]# time ls -la testdown
>> ls: testedown: No such file or directory
>>
>> real    3m9.006s
>> user    0m0.002s
>> sys     0m0.000s
>
> OK, that isn't behaving the way I expect, I'll have a look.
>
>>
>>
>> LOGGING:
>> -----------------------------------------
>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
>> /misc/areas/testdown
>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
>> -----------------------------------------
>>
>>
>>
>>
>>
>> 2009/8/17 Ian Kent <[email protected]>:
>>> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>>>> Filled bug report:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>>> Hi Carlos,
>>>
>>> I have a patched source rpm to add a mount wait parameter to autofs
>>> located at:
>>> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>>>
>>> Could you build it and see if it works.
>>> I haven't tested it at all but it is fairly straight forward.
>>> It is still unclear if this is the right way to do this and what the
>>> consequences are in sending a term signal to mount. This mount request
>>> will likely be followed by other requests for the same mount causing an
>>> accumulation of mount(8) processes waiting for RPC timeouts before they
>>> can answer the TERM signal.
>>>
>>> Anyway, for information the patch included in the source rpm above is:
>>>
>>> autofs-5.0.4 - add mount wait parameter
>>>
>>> From: Ian Kent <[email protected]>
>>>
>>> Often delays when trying to mount from a server that is not reponding
>>> for some reason are undesirable. To try and prevent these delays we
>>> provide a configuration setting to limit the time that we wait for
>>> our spawned mount(8) process to complete before sending it a SIGTERM
>>> signal. This patch adds a configuration parameter to allow us to
>>> request we limit the time we wait for mount(8) to complete before
>>> send it a TERM signal.
>>> ---
>>>
>>>  daemon/spawn.c                 |    3 ++-
>>>  include/defaults.h             |    2 ++
>>>  lib/defaults.c                 |   13 +++++++++++++
>>>  man/auto.master.5.in           |    7 +++++++
>>>  redhat/autofs.sysconfig.in     |    9 +++++++++
>>>  samples/autofs.conf.default.in |    9 +++++++++
>>>  6 files changed, 42 insertions(+), 1 deletion(-)
>>>
>>>
>>> --- autofs-5.0.1.orig/daemon/spawn.c
>>> +++ autofs-5.0.1/daemon/spawn.c
>>> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
>>>         unsigned int options;
>>>         unsigned int retries = MTAB_LOCK_RETRIES;
>>>         int update_mtab = 1, ret, printed = 0;
>>> +       unsigned int wait = defaults_get_mount_wait();
>>>         char buf[PATH_MAX];
>>>
>>>         /* If we use mount locking we can't validate the location */
>>> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
>>>         va_end(arg);
>>>
>>>         while (retries--) {
>>> -               ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
>>> +               ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
>>>                 if (ret & MTAB_NOTUPDATED) {
>>>                         struct timespec tm = {3, 0};
>>>
>>> --- autofs-5.0.1.orig/include/defaults.h
>>> +++ autofs-5.0.1/include/defaults.h
>>> @@ -24,6 +24,7 @@
>>>
>>>  #define DEFAULT_TIMEOUT                 600
>>>  #define DEFAULT_NEGATIVE_TIMEOUT        60
>>> +#define DEFAULT_MOUNT_WAIT              -1
>>>  #define DEFAULT_UMOUNT_WAIT             12
>>>  #define DEFAULT_BROWSE_MODE             1
>>>  #define DEFAULT_LOGGING                 0
>>> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
>>>  struct ldap_searchdn *defaults_get_searchdns(void);
>>>  void defaults_free_searchdns(struct ldap_searchdn *);
>>>  unsigned int defaults_get_append_options(void);
>>> +unsigned int defaults_get_mount_wait(void);
>>>  unsigned int defaults_get_umount_wait(void);
>>>  const char *defaults_get_auth_conf_file(void);
>>>  unsigned int defaults_get_map_hash_table_size(void);
>>> --- autofs-5.0.1.orig/lib/defaults.c
>>> +++ autofs-5.0.1/lib/defaults.c
>>> @@ -45,6 +45,7 @@
>>>  #define ENV_NAME_VALUE_ATTR             "VALUE_ATTRIBUTE"
>>>
>>>  #define ENV_APPEND_OPTIONS              "APPEND_OPTIONS"
>>> +#define ENV_MOUNT_WAIT                  "MOUNT_WAIT"
>>>  #define ENV_UMOUNT_WAIT                 "UMOUNT_WAIT"
>>>  #define ENV_AUTH_CONF_FILE              "AUTH_CONF_FILE"
>>>
>>> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
>>>                     check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
>>>                     check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
>>>                     check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
>>> +                   check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
>>>                     check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
>>>                     check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
>>>                     check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
>>> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
>>>         return res;
>>>  }
>>>
>>> +unsigned int defaults_get_mount_wait(void)
>>> +{
>>> +       long wait;
>>> +
>>> +       wait = get_env_number(ENV_MOUNT_WAIT);
>>> +       if (wait < 0)
>>> +               wait = DEFAULT_MOUNT_WAIT;
>>> +
>>> +       return (unsigned int) wait;
>>> +}
>>> +
>>>  unsigned int defaults_get_umount_wait(void)
>>>  {
>>>         long wait;
>>> --- autofs-5.0.1.orig/man/auto.master.5.in
>>> +++ autofs-5.0.1/man/auto.master.5.in
>>> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
>>>  60). If the equivalent command line option is given it will override this
>>>  setting.
>>>  .TP
>>> +.B MOUNT_WAIT
>>> +Set the default time to wait for a response from a spawned mount(8)
>>> +before sending it a SIGTERM. Note that we still need to wait for the
>>> +RPC layer to timeout before the sub-process exits so this isn't ideal
>>> +but it is the best we can do. The default is to wait until mount(8)
>>> +returns without intervention.
>>> +.TP
>>>  .B UMOUNT_WAIT
>>>  Set the default time to wait for a response from a spawned umount(8)
>>>  before sending it a SIGTERM. Note that we still need to wait for the
>>> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
>>> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
>>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>>  #
>>>  #NEGATIVE_TIMEOUT=60
>>>  #
>>> +# MOUNT_WAIT - time to wait for a response from umount(8).
>>> +#              Setting this timeout can cause problems when
>>> +#              mount would otherwise wait for a server that
>>> +#              is temporarily unavailable, such as when it's
>>> +#              restarting. The defailt of waiting for mount(8)
>>> +#              usually results in a wait of around 3 minutes.
>>> +#
>>> +#MOUNT_WAIT=-1
>>> +#
>>>  # UMOUNT_WAIT - time to wait for a response from umount(8).
>>>  #
>>>  #UMOUNT_WAIT=12
>>> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
>>> +++ autofs-5.0.1/samples/autofs.conf.default.in
>>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>>  #
>>>  #NEGATIVE_TIMEOUT=60
>>>  #
>>> +# MOUNT_WAIT - time to wait for a response from umount(8).
>>> +#              Setting this timeout can cause problems when
>>> +#              mount would otherwise wait for a server that
>>> +#              is temporarily unavailable, such as when it's
>>> +#              restarting. The defailt of waiting for mount(8)
>>> +#              usually results in a wait of around 3 minutes.
>>> +#
>>> +#MOUNT_WAIT=-1
>>> +#
>>>  # UMOUNT_WAIT - time to wait for a response from umount(8).
>>>  #
>>>  #UMOUNT_WAIT=12
>>>
>>>
>>>> Thanks!
>>>>
>>>> 2009/8/13 Carlos André <[email protected]>:
>>>>> 2009/8/13 Ian Kent <[email protected]>:
>>>>>> Carlos André wrote:
>>>>>>> Today (2009-08-12) I'm using:
>>>>>>> kernel-2.6.18-128.2.1.el5
>>>>>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>>>>> Thanks,
>>>>>>
>>>>>> My mistake, the wait time I was referring to is used for umounts during
>>>>>> expires and is present in rev rc2.102.
>>>>>>
>>>>>> It shouldn't be hard to add this for mount as well.
>>>>>> Would you like me to put something together?
>>>>> Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
>>>>>
>>>>>> Probably would be good to test something out to see if we can make a
>>>>>> difference with the killing mount after some configured timeout but, if
>>>>>> we make progress, probably the best way to deal with it is for you to
>>>>>> log a bug against rhel-5 so I can get it committed to the rhel package.
>>>>>> The possible issue is that I'm not sure if the RPC subsystem in the
>>>>>> above rhel kernel will respond well to process death with potential
>>>>>> outstanding requests. But we'll see.
>>>>> Ok, on my way :)
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>>>
>>>>>>> Look my last test:
>>>>>>> --------------------------------------------------------------
>>>>>>> [root@KSTATION areas]# time ls testdown
>>>>>>> ls: testdown: No such file or directory
>>>>>>>
>>>>>>> real    3m9.025s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.002s
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>>>>>> mounting root /misc/areas, mountpoint testdown, what
>>>>>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>>>>>> acl,sec=krb5p,proto=tcp,retry=0
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>>>>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>>>>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> calling mkdir_path /misc/areas/testdown
>>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>>> 3078093712 path /misc
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>>> submounts remaining in /misc
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>>> 3078093712 path /misc stat 3
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>>> = 2 path /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>>> 3078093712 path /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>>> submounts remaining in /misc
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>>> 3078093712 path /misc stat 3
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>>> = 2 path /misc
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>>>>>> server '1.2.3.4' failed: timed out (giving up).
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
>>>>>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>>> --------------------------------------------------------------
>>>>>>>
>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>> Carlos André wrote:
>>>>>>>>> Hi Ian,
>>>>>>>>> I'm getting crazy trying put "retry=" to work on mount... this option
>>>>>>>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
>>>>>>>>> like you can see on my previous emails...
>>>>>>>> Right, my mistake for not looking closely enough at post.
>>>>>>>>
>>>>>>>> Maybe this is related to the same sort of problem we had with mount in
>>>>>>>> the past, before the options parsing went into the kernel, where other
>>>>>>>> services, like portmapper (or rpcbind), were being done with different
>>>>>>>> timeout parameters before the RPC calls for mounting. That's just an
>>>>>>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>>>>>>>
>>>>>>>> But what version of autofs and kernel did you say you were using?
>>>>>>>>
>>>>>>>>> I appreciate any help.
>>>>>>>>>
>>>>>>>>> Carlos.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>>>> Chuck Lever wrote:
>>>>>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>>>>>> This long timeout is good if workstation need mount a critical
>>>>>>>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>>>>>>>> But in my case, using this loooong timeout doesnt make any sense,
>>>>>>>>>>>> since autofs retry mount directory on-access. This in fact gives me
>>>>>>>>>>>> alot of headaches, coz user login 'll just hangs if one server goes
>>>>>>>>>>>> down for any reason, and will again hangs if user try access directory
>>>>>>>>>>>> pointing to a NFS down server...
>>>>>>>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>>>>>>>> mount(2) system call fails. When you set SYN retries to 1, this means
>>>>>>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>>>>>>>>> call to fail.
>>>>>>>>>>>
>>>>>>>>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>>>>>>>>> for automounter as well as other cases. Ian, is there something else we
>>>>>>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>>>>>>>>> mount points handled via automounter? How should mount.nfs wait so we
>>>>>>>>>>> don't make other use cases worse? (Looks like most of the history is
>>>>>>>>>>> intact below).
>>>>>>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>>>>>>>>> mount.nfs in particular). This has always been a difficult situation for
>>>>>>>>>> the automounter because interactive mount invocations should wait. But I
>>>>>>>>>> believe automount mounts should always time out quickly, but that leads
>>>>>>>>>> to its own set of problems, especially when home directories are concerned.
>>>>>>>>>>
>>>>>>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>>>>>>>> certain that will work as we expect. I'll have to do some experimentation.
>>>>>>>>>>
>>>>>>>>>>> How long do you think is appropriate for the automounter to wait if the
>>>>>>>>>>> server is down, in your case, Carlos?
>>>>>>>>>>>
>>>>>>>>>>>> Am losing something or there have was something weirdo...!?
>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    3m9.000s
>>>>>>>>>>>> user    0m0.002s
>>>>>>>>>>>> sys     0m0.001s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    3m9.000s
>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    3m9.001s
>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>> sys     0m0.003s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    3m9.001s
>>>>>>>>>>>> user    0m0.002s
>>>>>>>>>>>> sys     0m0.001s
>>>>>>>>>>>>
>>>>>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>>>>>
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    1m3.002s
>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    2m6.000s
>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    0m9.003s
>>>>>>>>>>>> user    0m0.001s
>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real    2m6.001s
>>>>>>>>>>>> user    0m0.001s
>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>> [root@KSTATION ~]#
>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>>>>>>>>> using retry=0 without kerberos I got only 9s...
>>>>>>>>>>>>
>>>>>>>>>>>> *sigh*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>>>>>> Right. Normally the RPC client calls the kernel's socket connect
>>>>>>>>>>>>> function, which does 6 SYN retries. That one call usually takes longer
>>>>>>>>>>>>> than the RPC client's connect timeout, so it only makes one connect
>>>>>>>>>>>>> call, and then fails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>>>>>>>>> client to retry the connect call until its connect timeout expires.
>>>>>>>>>>>>> Each connect call resets the SYN timeout to 3 seconds.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>>> sec=krb5p,proto=tcp
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real    3m9.000s
>>>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>>>>>> sec=krb5p,proto=tcp  ("retry=1" = no change)
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real    2m6.004s
>>>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>>>> sys     0m0.004s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>>>>>>>>> 2049...
>>>>>>>>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation where my
>>>>>>>>>>>>>>>>> server died... i need mount fail faster (10 or 15 secs max) than
>>>>>>>>>>>>>>>>> 3 minutes and 9 seconds...
>>>>>>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>>>>>>>>>>>> give up trying to connect a TCP socket to the server (6 SYN attempts
>>>>>>>>>>>>>>>> with exponential retries, or something like that). For stock CentOS
>>>>>>>>>>>>>>>> 5.3, I think user space does only a DNS lookup for normal NFSv4
>>>>>>>>>>>>>>>> mounts -- the kernel just tries to connect a TCP socket to port 2049,
>>>>>>>>>>>>>>>> with no preceding rpcbind request.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>>>>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>>>>>>>>>>>> Kerberos and AutoFS, but i got a problem: If NFS server goes down
>>>>>>>>>>>>>>>>>>>>> i get a LOOOOOOONG mount timeout on CentOS 5.3 (updated) NFSv4
>>>>>>>>>>>>>>>>>>>>> client...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>>>>>>>>>>>>> mount hangs, user logon hangs. Then i want configure it to timeout
>>>>>>>>>>>>>>>>>>>>> (if server down) after 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>>>>>>>>>>>>> findings (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10)
>>>>>>>>>>>>>>>>>>>>> using basic command
>>>>>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>>>>>>>>>>>>> proto=udp) it hangs for 189 secs (3m9s: real 3m9.001s) until show
>>>>>>>>>>>>>>>>>>>>> error (mount: mount to NFS server '172.16.0.10' failed: timed out
>>>>>>>>>>>>>>>>>>>>> (giving up))
>>>>>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>>>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>>>>>>>>> linux-nfs" in
>>>>>>>>>>>>>>>>> the body of a message to [email protected]
>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Chuck Lever
>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>>>
>
>
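
The 189-second figure that recurs in the quoted messages above is consistent with simple arithmetic on the SYN backoff Carlos observed with tcpdump (3, 6, 12, 24, 48, 96 seconds). A minimal sketch of that sum follows. It assumes an initial retransmission timeout of 3 seconds that doubles after every unanswered SYN, which matches the intervals reported in this thread but is an assumption about the kernel in use; syn_connect_timeout() is a hypothetical helper name, not a kernel or nfs-utils function.

```c
#include <assert.h>

/*
 * Hypothetical helper: seconds until a connect() attempt is abandoned,
 * assuming one initial SYN plus `syn_retries` retransmissions, with the
 * wait after each SYN starting at `initial_rto` seconds and doubling.
 */
unsigned int syn_connect_timeout(unsigned int initial_rto,
                                 unsigned int syn_retries)
{
    unsigned int total = 0;
    unsigned int rto = initial_rto;
    unsigned int i;

    assert(syn_retries < 16);       /* keep the doubling well within range */

    for (i = 0; i <= syn_retries; i++) {
        total += rto;               /* wait for a reply to this SYN */
        rto *= 2;                   /* exponential backoff */
    }
    return total;
}
```

Under those assumptions, syn_connect_timeout(3, 5) gives 189 (the 3m9s seen with the default tcp_syn_retries of 5) and syn_connect_timeout(3, 1) gives 9. Fourteen 9-second connect attempts (13 "retrying" messages plus the final "giving up") come to 126 seconds, which is in line with the 2m6s runs.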

2009-09-17 12:58:45

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Hi ppl,

any news about this problem? :)

Thanks.

2009/8/27 Chuck Lever <[email protected]>:
> On Aug 27, 2009, at 11:00 AM, Trond Myklebust wrote:
>>
>> On Thu, 2009-08-27 at 10:54 -0400, Chuck Lever wrote:
>>>
>>> On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:
>>>>
>>>> On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
>>>>>
>>>>> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
>>>>>>
>>>>>> Ian Kent wrote:
>>>>>>>
>>>>>>> Carlos André wrote:
>>>>>>>>
>>>>>>>> Hi Ian,
>>>>>>>>
>>>>>>>> Thanks for patch and sorry for delay (i'm expecting receive u
>>>>>>>> reply on
>>>>>>>> bug track, not here) :)
>>>>>>>>
>>>>>>>> But, this patch doesnt worked to me like expected... :(
>>>>>>>>
>>>>>>>>
>>>>>>>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>>>>>>>> and later changed "10" to "2" with same results...
>>>>>>>> (always restarting service, of course :)
>>>>>>>>
>>>>>>>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
>>>>>>>> same results again.
>>>>>>>>
>>>>>>>> Or i'm doing something wrong?
>>>>>>>>
>>>>>>>>
>>>>>>>> [root@KSTATION areas]# automount -V
>>>>>>>>
>>>>>>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> [root@KSTATION areas]# time ls -la testdown
>>>>>>>> ls: testedown: No such file or directory
>>>>>>>>
>>>>>>>> real    3m9.006s
>>>>>>>> user    0m0.002s
>>>>>>>> sys     0m0.000s
>>>>>>>
>>>>>>> OK, that isn't behaving the way I expect, I'll have a look.
>>>>>>>
>>>>>>>>
>>>>>>>> LOGGING:
>>>>>>>> -----------------------------------------
>>>>>>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount:
>>>>>>>> mount(nfs):
>>>>>>>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/
>>>>>>>> testdown
>>>>>>>> /misc/areas/testdown
>>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>>>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token
>>>>>>>> = 91
>>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/
>>>>>>>> areas/testdown
>>>>>>>> -----------------------------------------
>>>>>>
>>>>>> Having a look at this I suspect the reason it doesn't work as
>>>>>> expected
>>>>>> is the waitpid(2) we do after sending the TERM signal to the mount
>>>>>> process (which we have to do) is not returning. This is likely
>>>>>> because
>>>>>> the mount process isn't giving up in a shorter time as it used to.
>>>>>
>>>>> You're thinking maybe mount(2) should be as interruptible as the
>>>>> socket calls that the mount command used to do?  That might be
>>>>> reasonable, and I can take a look at that.
>>>>
>>>> In recent kernels, all those RPC calls should be using TASK_KILLABLE
>>>> sleep states. SIGTERM should cause them to abort, provided that some
>>>> process isn't blocking it.
>>>>
>>>> Perhaps TASK_KILLABLE could be backported to RHEL-5?
>>>
>>> That's pretty extensive, with hooks in the page cache.  I doubt RH
>>> would go for that.
>>
>> You don't have to add the hooks in the page cache in order to make mount
>> interruptible. You just need to replace the sigmask-manipulation in
>> net/sunrpc and fs/nfs (a.k.a. rpc_clnt_sigmask()/rpc_clnt_sigunmask())
>> with TASK_KILLABLE.
>
> That sounds like a schlep.
>
>> Alternatively, it might suffice to just turn on the 'intr' flag
>> temporarily while doing the mount path walk, and then switch it to
>> whatever default the user actually specified afterwards.
>
> That sounds easy, especially for an EL5 kernel.  Maybe "soft" too for the
> first few requests?
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>
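
For readers following the thread, here is a minimal sketch of the pattern Ian describes above: wait a bounded time for the mount helper, send it TERM, then wait for it to exit. It is written in Python for brevity and is not the autofs code. The name run_with_wait and its parameters are hypothetical, and the bounded second wait with a KILL fallback is an assumption of this sketch, added because an unbounded waitpid(2) after TERM is exactly what appears to hang here.

```python
import signal
import subprocess


def run_with_wait(cmd, wait_seconds, term_grace=2.0):
    """Run cmd, giving it wait_seconds to finish on its own.

    Returns (returncode, timed_out). All names here are illustrative only.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the child finishes within the configured wait.
        return proc.wait(timeout=wait_seconds), False
    except subprocess.TimeoutExpired:
        pass

    # The wait expired: ask the child to stop.
    proc.send_signal(signal.SIGTERM)
    try:
        # Bound this wait as well; an unbounded wait here blocks for as
        # long as the child ignores or cannot act on SIGTERM.
        return proc.wait(timeout=term_grace), True
    except subprocess.TimeoutExpired:
        # Assumption of this sketch: escalate to SIGKILL and reap the child.
        proc.kill()
        return proc.wait(), True
```

Whether the child actually exits on TERM still depends on the kernel behaviour discussed above; the sketch only keeps the caller from waiting indefinitely on the first signal.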



--
Regards,
Carlos André
LPIC-1 / LPIC-2 / CCNA / CCNP

candrecn.at.gmail.dot.com

2009-09-17 13:12:01

by Ondrej Valousek

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

https://bugzilla.redhat.com/show_bug.cgi?id=517349
no news.....

Carlos André wrote:
> Hi ppl,
>
> any news about this problem? :)
>
> Thanks.
>
> 2009/8/27 Chuck Lever <[email protected]>:
>
>> On Aug 27, 2009, at 11:00 AM, Trond Myklebust wrote:
>>
>>> On Thu, 2009-08-27 at 10:54 -0400, Chuck Lever wrote:
>>>
>>>> On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:
>>>>
>>>>> On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
>>>>>
>>>>>> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
>>>>>>
>>>>>>> Ian Kent wrote:
>>>>>>>
>>>>>>>> Carlos André wrote:
>>>>>>>>
>>>>>>>>> Hi Ian,
>>>>>>>>>
>>>>>>>>> Thanks for patch and sorry for delay (i'm expecting receive u
>>>>>>>>> reply on
>>>>>>>>> bug track, not here) :)
>>>>>>>>>
>>>>>>>>> But, this patch doesnt worked to me like expected...  :(
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Firstly I've changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10"
>>>>>>>>> and later changed "10" to "2" with same results...
>>>>>>>>> (always restarting service, of course :)
>>>>>>>>>
>>>>>>>>> Then, tried remove "sec=krb5p", and later removed "nfs4" but i got
>>>>>>>>> same results again.
>>>>>>>>>
>>>>>>>>> Or i'm doing something wrong?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [root@KSTATION areas]# automount -V
>>>>>>>>>
>>>>>>>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> [root@KSTATION areas]# time ls -la testdown
>>>>>>>>> ls: testedown: No such file or directory
>>>>>>>>>
>>>>>>>>> real    3m9.006s
>>>>>>>>> user    0m0.002s
>>>>>>>>> sys     0m0.000s
>>>>>>>>>
>>>>>>>> OK, that isn't behaving the way I expect, I'll have a look.
>>>>>>>>
>>>>>>>>
>>>>>>>>> LOGGING:
>>>>>>>>> -----------------------------------------
>>>>>>>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount:
>>>>>>>>> mount(nfs):
>>>>>>>>> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/
>>>>>>>>> testdown
>>>>>>>>> /misc/areas/testdown
>>>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
>>>>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token
>>>>>>>>> = 91
>>>>>>>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/
>>>>>>>>> areas/testdown
>>>>>>>>> -----------------------------------------
>>>>>>>>>
>>>>>>> Having a look at this I suspect the reason it doesn't work as
>>>>>>> expected
>>>>>>> is the waitpid(2) we do after sending the TERM signal to the mount
>>>>>>> process (which we have to do) is not returning. This is likely
>>>>>>> because
>>>>>>> the mount process isn't giving up in a shorter time as it used to.
>>>>>>>
>>>>>> You're thinking maybe mount(2) should be as interruptible as the
>>>>>> socket calls that the mount command used to do?  That might be
>>>>>> reasonable, and I can take a look at that.
>>>>>>
>>>>> In recent kernels, all those RPC calls should be using TASK_KILLABLE
>>>>> sleep states. SIGTERM should cause them to abort, provided that some
>>>>> process isn't blocking it.
>>>>>
>>>>> Perhaps TASK_KILLABLE could be backported to RHEL-5?
>>>>>
>>>> That's pretty extensive, with hooks in the page cache.  I doubt RH
>>>> would go for that.
>>>>
>>> You don't have to add the hooks in the page cache in order to make mount
>>> interruptible. You just need to replace the sigmask-manipulation in
>>> net/sunrpc and fs/nfs (a.k.a. rpc_clnt_sigmask()/rpc_clnt_sigunmask())
>>> with TASK_KILLABLE.
>>>
>> That sounds like a schlep.
>>
>>
>>> Alternatively, it might suffice to just turn on the 'intr' flag
>>> temporarily while doing the mount path walk, and then switch it to
>>> whatever default the user actually specified afterwards.
>>>
>> That sounds easy, especially for an EL5 kernel.  Maybe "soft" too for the
>> first few requests?
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>
>>
>
>
>
>

_______________________________________________
NFSv4 mailing list
[email protected]
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4